The assignment was simple: A report is not sorting correctly. Fix it. Easy, right?
Little did I know that by taking this assignment I was about to pull out the pebble that would open up the dam. A deluge of trial and error later, the report was fixed, and I was a more knowledgeable developer.
This is a story that happens every day. Somewhere right now there is a developer (maybe you?) drowning in a scenario that is threatening to rob time, money, and reputation from him or her. How do you get through it? What does it look like to work through a problem until it’s solved?
As an example, this post will attempt to describe what it looks like when I brainstorm a problem and try to innovate or at least stretch my own boundaries as a developer. I hope this story inspires you to not give up, but to keep researching, asking, and discovering for yourself. There is much innovation to be had, and you could be the one who discovers the “Next Big Thing” that we talk about at DevCon! Those opportunities come through perseverance through tough problems.
I think often times we (me included) view the more “famous” developers as if they’re immune to working hard or being stumped. The reason they make things look easy is because they don’t give up when things get difficult. In fact, I propose that if you want to innovate, you need to embrace the more difficult problems – invite them, in fact. Taunt them with trash talk before you dive in. The hard problems are fertile ground for the very thing we desire most: innovation. If you decide in advance that you’re going to solve this problem and solve it well, then all that’s left is to do it.
This story is also about being honest about where innovation comes from. Throughout this adventure, I’m going to document where the work of others directly influenced the outcome of this project so you can see that I am only as brilliant as the developer next to me. I’m constantly cheating off someone else’s paper! Except, in the FileMaker community, it’s not cheating. It’s expected. This particular community seems to understand that all human innovation is built on prior art. I for one am not ashamed to say it out loud, so I will always try to happily give credit where it’s due.
Here’s the background to the report I needed to build:
- The information comes from up to six different relationships, each with at least three predicates, the main relationship having four.
- Four of these relationships existed as portals in a tab object. One of these relationships was on a developer layout.
- The other was header information for the parent record on both of these layouts.
- The sort order for the child records comes from two places: one of the fields in the child table, and a related value list with over 200 items.
- The user must be able to sort the parent records in any order, and have the resulting child records underneath them.
- The parent table has over 750,000 parent records.
- The child table has well over 5,000,000 records. (Yes, that’s millions.)
- This is not my solution. It’s been around for a while and I have little say in some things, a lot of say in others.
- I try to avoid things like large indexes if possible.
The first attempt (made by another developer before he wisely handed it off) was with sliding portals in the parent table. The process was roughly:
- Create a layout using the parent table as the source of the layout. (This would be in opposition to the standard report building technique of having the child records as the source of the layout.)
- Add the children via four portals in the body element, underneath the fields from the parent record.
- Make each of the four portals as tall as possible.
- Set the portals and fields to slide up and reduce the size of the enclosing part. (Empty portal rows won’t be shown.)
- Sort the portals as necessary.
This ultimately didn’t work, because something in portal 4 might need to sort above something in portal 1. Even if this was not the case now, the user could change the value list, and it might be the case in the future.
This is where I picked up the project, trashed the portal-based report and started from scratch.
During discovery, I looked for a key/ID field that linked the parent record to all of the child records in the different portals. There was a field that was labeled appropriately enough, but when I threw it out on the layout and compared it to the key fields in the different portals, it only matched some of the line items. It also had a pipe character at the end of it. It looked like this particular key that was supposed to link parent and child records was parsed by some other process, and I didn’t want to break it. So I had no parent/child linking field to work with. It is what it is, and, there was nothing I could do about it.
Defining a new key field over 750,000 records didn’t sound appealing, much less over the 5,000,000 child records. So I started looking for a way to extract the data while keeping the records together and putting the extracted information into a merge table of some kind. After all, the ultimate goal was simply display. All that mattered was that the data was in the right order.
Tap Into DevCon Knowledge
Coming fresh from FileMaker DevCon 2011, I decided to try a virtual list for this report. The routine was something like this:
- Loop though each portal in each record, and place the field values for the report into variables.
- The report itself will be built of empty records. Each record will contain unstored calculations that parse the data out of the different variables.
- Send the variable values from the interface file to the data file were the report resided. This is done by calling a script in the data file with a parameter of the variable that needs to be sent. The script in the data file sets a global variable to the value of Get(ScriptParameter), thus transferring the variable from one file to another. (I learned this trick from Ross Mackintosh here at MightyData and love it!)
- Lastly, find the number of report records equal to the number of values in the variable.
This technique worked over a small found set, but once the combination of parent and child records exceed 1,000, I started to see the CPU usage go way up. This would never work with 100 users in the system, and it would be devastating if two users ran the report at the same time!
Using the script debugger, I started isolating sections of the report script and watching the CPU meters on the server. I ended up finding two problems: 1) simply navigating from record to record and stepping through the portals was costing me dearly and 2) the report was made up of six fields that were all unstored calculations that were trying to find their values within a thousand lines (or more) in each of six variables. That’s a lot of calculating, and a lot of CPU usage.
So, the virtual list idea as a whole was down, but not quite out. I had the follow-up idea that I should just collect the keys (IDs) for the records in variables, instead of all the field values themselves. This would be far less hassle for both data collection and calculating.
First, I tested to make sure that virtual keys were acceptable to FileMaker and reliable. You never know what kind of mess you might step into when you’re working unconventionally. Sure, an unstored calc field might work on one side of a relationship, but that doesn’t mean it’s reliable in this case. Ultimately it was.
Then I built a single key field and related it to the parent and child tables. (How efficient of me, right?) Then I used the relationship to determine if the ID represented a “header” record in the report, which in pseudocode would look like this:
If [ IsValid (HeaderRelationship) ; ChangeFieldColor ; Don’tChangeAnything ]
The script looped through the records, gathered the keys into a variable, passed the variable to the target file, and the field definitions in that file parsed the keys out into their records.
Again, this setup worked mostly. But I was getting some weird results (like header records that were also non-header records), and I still hadn’t found a replacement for looping through records when grabbing all the keys to begin with.
Problems With the Data = Good
Well, this is where I found out that the system is not using universally unique keys. In other words, the IDs were not unique from table to table. So when my key field definition went looking for a match in the parent table, it often found one, even though it was a child key that had been parsed out. That’s what was causing my weird results!
However, this was a wondrous accident. While investigating the key fields, I found that the field in the parent table (that supposedly linked to the child table but didn’t) actually had two values in it, not just one, thus accounting for all the children records! Breakthrough!
What had happened is that when I threw the key field on the layout before, I never resized it, or clicked into it because I could see the full value of it. When I saw the pipe character at the end of it, I recognized that from elsewhere in the system as being a parsed value, rather than a divider between two different values.
When I discovered this, I thought I might be able to just take the parent keys off to the side in my report table and walk through them there. I could now build a relationship to all the child records of a given parent key, and somehow work with that.
So I built the six relationships off of the Report table, and tried again:
- Loop though each parent record, and capture their linking keys into a variable.
- Pass the variable to the report file.
- Loop through the keys one by one, putting the parent and children records together as I go.
Replacing gray schema with the orange schema
This time the problem was that I needed to commit the record each time I put in a new key field so that it would forget the relationship this record had last time and display the new data. I tried building in a constant predicate into the relationship so it would refresh by setting a single field to “1” – which I learned from Todd Geist via his post-DevCon video about Commit Record, which you should see if you haven’t.
Unfortunately, that didn’t help. It seemed to be as slow as replacing the field contents itself. So I found myself somewhat back at square one. I decided to shift my thinking to the CPU usage associated with navigating from record to record in the parent table.
My first thought was to compare looping through a list view instead of looping through form view where the child portals lived. Looping through records in list view did help, but not enough for me to tell the client, “It’s fast.”
Next, I tried blank layouts based on the parent table, so that no other fields but mine were being queried while I navigated record to record. That did help, in fact. I got both an increase in speed, and a decrease in CPU usage. But it was still less than awesome. I knew I could do better.
Call for (Virtual) Backup
At this point it was time to sit back and rethink the problem with a clear head. One family board game and a good night’s sleep later, I decided to take a trip through Google to make sure I wasn’t missing anything related to FileMaker speed. I was looking for two things: collecting data as quickly as possible without looping, and best practices for when you absolutely had to loop.
First, I looked for a custom function for grabbing records. And if you’ve ever looked for a custom function, there’s one place Google always takes you: Brian Dunning’s custom function list. A quick search for GetNthRecord yielded a whole host of solutions looking for a problem. But the one that caught my eye was GetNthRecordSet( ). This function allowed me to grab all the values of a given field in a found set of records without looping. This was perfect.
Next, I stumbled across an article from HOnza Koudelka called FileMaker Script Execution Time Cut From 5 Hours to 6 Seconds. That sounded like he had written that article just for me. The main takeaway from that article for me was this: “…the fix often is as easy as replacing the stupid solution with the normal one.”
While I was all fired up about the virtual list and innovating, I still didn’t have a workable solution. What if all I needed to do, as HOnza suggested, was something more traditional? At this point it was worth a try. I was running out of time!