A common issue in all databases is dealing with duplicate records. In FileMaker I tend to use two different methods to identify duplicate records, each serving its own purpose. One method uses a self-join relationship with a calculation that marks the duplicate records; the other uses a script that sorts and loops through the records.
On the surface, both methods are fairly simple, and once memorized they will come to your aid time and time again. If you maintain a database template or a technique file, you can set up both methods there and refer to them as needed.
The first method, removing duplicates via a self-join, is drawn directly from FileMaker’s Help manual. It is a technique that has existed for years: simple and elegant. It requires two additional fields in your table (or three, if you do not yet have a field with your “unique criteria”). Once you determine which field makes your record unique (which may require a calculation; the example uses name + phone number), you create a self-join relationship. Do this by creating a new table occurrence; I usually call it “MyTableName__Self” (with MyTableName switched to the actual table with duplicate records, of course). Make the relationship key UniqueField = UniqueField.
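If no single field is unique on its own, the “unique criteria” field can be a simple concatenation calculation. A sketch, assuming name plus phone number as in the example (field names are placeholders):

```
// UniqueField — calculation field with a text result.
// Two records with the same name and phone number yield the same key,
// so the self-join will relate them to each other.
FullName & " " & PhoneNumber
```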
With the self join in place, create your two new fields, the Counter and the DuplicateFlag (or whatever you want to call them). Set aside the counter field for now. Make your DuplicateFlag field a calculation. This is a simple IF statement:
If( MyTable::Counter = MyTable__Self::Counter ; "Unique" ; "Duplicate" )
Make the Counter field an auto-enter serial number. Then exit the Manage Database window and, on your layout, populate the field: with all records showing, click in the Counter field and choose Records > Replace Field Contents from the menu, then select “Replace with serial numbers.” Every record gets a serial value, and the DuplicateFlag updates immediately. You can then search the DuplicateFlag field for duplicates. Because the calculation updates live, finding new duplicates later is just a matter of running the search again.
The second method loops through the found set and checks each record against the previous one. There are more steps involved, but the benefit is that you control exactly what you check. This is useful for reports, or for displaying certain data to a user by finding or omitting records as needed. Unlike the self-join, this method requires no schema changes.
Your script sorts the records by the unique identifier, in this case a phone number. You set a variable to the first record’s phone number, then step through the remaining records in a loop. For each record, you compare the variable to the current record’s phone number. If they match, you set your flag and/or omit the record. If they don’t match, you update the variable and move to the next record. The loop continues until every record in the found set has been checked. In this example, I showed only the duplicate record(s) to make sure the flag works; usually you would keep the duplicates omitted.
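Put together as FileMaker script steps, the loop described above might look like this. This is a sketch, not the article’s exact script: the table, field, and variable names are placeholders, and the second Exit Loop guard is my addition to cover the edge case where the final record omitted is the last one in the found set.

```
# Sketch: find duplicates by sorting and looping (names are placeholders).
Show All Records
Sort Records [ Restore ; With dialog: Off ]        # sorted by PhoneNumber
Go to Record/Request/Page [ First ]
Set Variable [ $phone ; Value: MyTable::PhoneNumber ]
Go to Record/Request/Page [ Next ; Exit after last: On ]
Loop
    Exit Loop If [ Get ( FoundCount ) = 0 ]        # nothing left to check
    If [ MyTable::PhoneNumber = $phone ]
        # (optionally set a flag field here before omitting)
        Set Variable [ $pos ; Value: Get ( RecordNumber ) ]
        Omit Record                                # next record becomes current
        # If the omitted record was the last one, FileMaker makes the
        # previous (already-checked) record current; exit so it is not
        # compared against its own value and treated as a duplicate.
        Exit Loop If [ $pos > Get ( FoundCount ) ]
    Else
        Set Variable [ $phone ; Value: MyTable::PhoneNumber ]
        Go to Record/Request/Page [ Next ; Exit after last: On ]
    End If
End Loop
Show Omitted Only                                  # view just the duplicates
```

Note the design choice: Omit Record advances the record pointer on its own, so the script only issues Go to Record [ Next ] in the non-matching branch.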
Both these methods are quick and simple techniques. There is nothing fancy involved, although the concept of “self-join” may sound obscure to beginners. Either will work to clean up duplicate data in a flash.
Michael Larson says
In my experience with this looping code, if by chance the final two records of the set are duplicates of each other, they will both be marked as “Duplicate”. Replacing the Omit Record step with Go to Record [ Next ; Exit after last ] resolves this. The Show Omitted step then needs to be removed as well.
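Expressed as script steps, the substitution described here would look roughly like this (a sketch of the comment’s suggestion, not the original script):

```
# Inside the duplicate branch, instead of:
Omit Record
# use:
Go to Record/Request/Page [ Next ; Exit after last: On ]
# ...and remove the trailing Show Omitted step. Since nothing is
# omitted, find on the flag field afterward to see the duplicates.
```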
In addition, I’m unclear on the purpose of the Get ( FoundCount ) test in the loop. It may be useful when using Omit Record; otherwise the count would never reach zero. A similar test before the loop would be useful to catch an empty found set.
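The empty-set guard suggested here would be a short block before the loop starts (step syntax approximate):

```
# Bail out before looping if the found set is empty.
If [ Get ( FoundCount ) = 0 ]
    Exit Script [ Text Result: "" ]
End If
```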
Thank you Anders,
Your script idea worked perfectly for me. I just wanted to count all the records without the duplicates, and your looping approach let me do exactly that.
Thanks for sharing.