by Ray M. Thompson <raymack@onramp.net>
Converting Your Flat file Databases
Like many of you, I am really enjoying FileMaker 3.0's new relational capabilities. I have converted several simple flat-file databases and have found setting up the required relationships for displaying the related fields is quite easy to do. However, I have discovered that getting the original database ready for conversion is important, and a little more involved that I first thought. The following are some of the things I have learned - and a general technique that should work with any relational database. No big "breakthrough" here, just what I have learned from actually making several conversions.
The larger the database you are converting, the more worthwhile these techniques will be. You will need to be fairly familiar with FM scripting to follow the following explanation.
To explain what I have learned I will use a simple example that you can relate to your own needs. Records in my Contacts.FDB (flat file database) contain unique data in the Contacts data fields, but many of the Company data fields are duplicated since many of the contacts work for the same companies.
Fields used are:
Contacts Data Fields
Company Data Fields
Contact Name
Company Name
Contact Phone(s)
Company Address
Company City/Company State/Company ZIP
Company Phone(s)
I wanted to switch to a relational database, keeping the Contacts data fields in one file and Company data fields in a related file - a many-to-one relationship.
The problem, which quickly presents itself, is that many of the Company fields are for the same company, but the data fields do not match! This is simply the results of many different people entering data from many different sources. There will likely be a wide range of such differences, but the easiest place to resolve them is in the original flat file. After that, you are almost home free when the time comes to create the relational files.
To find and identify all records with duplicate company data I have found the COMPANY NAME and COMPANY ADDRESS fields to be the key. Several scripts can be used to automate this task.
(Note: In the following procedure I am ignoring records that are duplicates because the company has changed addresses. Obviously these have to be dealt with as part of data maintenance. Looking at a Date Entered field may be helpful.)
As always, work on a copy of your data.
Work in the Contacts Flat File
Creating Duping Layout and Fields
1. Create a new layout for use in the following duping routines. Fix a list view with a MARK RECORD field, and the COMPANY NAME and COMPANY ADDRESS fields.
2. Create a new calculated field: COMPANY NAME + COMPANY ADDRESS = NAMEADDRESS. (This field will be used by our scripts, but is not needed on the duping layout.)
[Ed - the process of adding two pieces of text information together, using a calculation, is done with the ampersand character "&". Example: FirstName & " " & LastName - will make the output look like a full name. You are concatenating the first name to the last name plus a space character in between the first and last.]
Creating Duping Scripts
Create the four scripts described below to aid our duping process:
SCRIPT A - This script gets rid of those bad characters, the "return" which you will probably find all too often in both the COMPANY NAME and COMPANY ADDRESS fields. (To create this find, open your word processor, type a single return, select all, copy the invisible return character to your database find field and enclose it in quote marks.)
[Ed-In FileMaker it is the "¶" character that is the return value. Example: You want to concatenate two pieces of text together with a return between them. FirstName & "¶" & LastName - will give you the first name on one line and the last name on a second line. However, in this example Ray is stripping out the carriage returns]
SCRIPT B - This script sorts Contacts.FDB by NAMEADDRESS and then uses a loop and global variable to compare each record, mark duplicates, and then find the marked records. (You will find such a script on page 5-32 of the FM manual.) Running this script will identify and mark records with Company NAMEADDRESS data that are an exact match.
SCRIPT C - This script USES the FM standard find dupes in the NAMEADDRESS field, using the "!" character. Running this script and then sorting on NAMEADDRESS will display a lot more records that are really the same company but were not picked up by Script B because they are not exact matches. Differences will be in spelling, spacing and punctuation, which can be edited out.
SCRIPT D - Duplicate Script C and change it to find/sort dups in the ADDRESS field only. The purpose of this script is a little different; it will find a number of records that are not duplicates! For example, the same address for different companies housed in the same building.
Put buttons for these four scripts on the duping layout.
The Duping Process
1. Duplicate Contacts.FDB at this point and name it Contacts.FBU1our first backup file just in case things go wrong!
2. Now clone Contacts.FDB to create an empty file; name it Archive.TDB (this file will be used as a temporary database file only).
3. Go to the duping layout.
4. Run Script A to find records that contains a Return character. Click in each field of all records in the found set to check for the troublesome character; the field box will drop down when you insert the cursor. Edit to remove all of them.
5. Run Script B to automatically find and mark all records with identical NAMEADDRESS fields.
6. Import the duplicate records into Contacts.FDB into Archive.TDB, then delete them from Contacts.FDB.
7. Run Script C to find records that are really the same company/address but were not picked up by Script B because they are not exact matches.
8. Go through the found set of records and edit out differences in both the NAME and ADDRESS fields. You can use cut and paste to quickly make these fields identical. (A chore, but better done now than later!)
9. Now run Script B again, and again import duplicate records Archive.TDB, deleting them from Contacts.FDB.
10. Run Script D to find duplicates in the ADDRESS field only. We are now down to the last bit of editing. Remember, in this instance some of the records will not be duplicates. Again, edit out differences for those records that are duplicates. Obviously, if there is some doubt about records being for the same company, don't change them!
11. Finally, import all records in Archive.TDB back into ContactsFDB. Note: you should have the same number of records that you started with. (Unless perhaps you happened to find duplicate contact names and deleted them).
Thus far we have accomplished two important things:
Systematically and carefully identified and resolved most Company data duplicates. (Well, not all of them, as we shall see.....)
Prepared a new field, NAMEADDRESS, that can be used as an temporary "key field" to relate our new files.
Creating the Relational Files
Duplicate the edited Contacts.FDB file twice and name the tree files as follows:
Contacts.FBU2 ( our second backup file)
Contacts.RDB (our new relational Contacts file)
Company.RDB (our new relational Company file)
Work in the Company.RDB File
In this new file, do the following:
1. Create a new numeric field: #KEYCOMPANYRDB.
2. We now have another problem to deal with before we finally shave this file down to a unique set of Company records. Note that up to now we have looked only at the company NAME and ADDRESS fields. Even if these fields now match exactly between two or more records, there may be differences between the matched records in other data fields. A good example is the Company PHONE(s) fields; phone data may be different or may be missing. Ideally, we need to identify the record in each matched set that has the correct and most complete data, and delete the others. (A messy little task, and I leave you decide how important this final edit is to your data.)
4. After resolving this problem, delete duplicates from this file, leaving a unique set of Company records.
5. Sort the file by company NAME.
6. Go to the #KEYCOMPANYRDB field and fill with serial no.'s using the Replace command.
Work in the Contacts.RDB File
In this new file, do the following:
1. Create a new numeric field: #KEYCONTACTSRDB and add to the layout.
2. Set up two relationships with Company.RDB:
A. ContactsCompany: #KEYCONTACTSRDB = #KEYCOMPANYRDB.
B. ContactsCompanyTEMP: NAMEADDRESS = NAMEADDRESS.
3. Duplicate an existing layout that displays all of the original Contacts and Company data fields. Add #KEYCONTACTSRDB to this layout
4. Now add all Company data fields from Company.RDB via relationship A (A many-to-one relationship, so a "portal" is not required.)
5. Next, define #KEYCONTACTSRDB as a lookup field, based on relationship B; select "If not found do nothing".
6. Finally, the "magic step"...click in the lookup field #KEYCONTACTSRDB and do a Relookup. If everything has been setup correctly, two actions take place, almost simultaneously:
A. The correct key field serial no. from the Company.RDB will appear in the key field of Contacts.RDB.
B. The Contacts Key Field will then bring in the correct Company data for each Contacts record. Check this by scrolling through a number of records and comparing the "old" Company data fields with the "new" Company data fields from the related file.
7. Do a Find for blanks in the #KEYCONTACTSRDB to see if there are any records did not match. Hopefully there will be very few, which you can fix manually.
[Ed-For those of you who are still 2.1 users and didn't understand a lot of what was covered here they it would do you some good to take a look at the relational areas of the FileMaker manual. If you need even more help understanding relational capabilities then there are many good books on the shelves of your local books store about FileMaker 3. The technique here was really quite simple. In a nutshell the one file that you had was copied so that you had two files. The duplicates were then taken out and the data was matched up between the two files of Company and Contacts. This process will vary depending on your situation and your database. Don't become discouraged if you cannot get this process to work for your situation. Sometimes the best way to learn is to experiment - and give yourself the time to do it with becoming frustrated. Always work on a copy because FileMaker will save the changes after you make them.]
## END ##