FileMaker FUNdamentals: Parse the FileMaker, Please
by Geoff Wells <geoff@filemakermagazine.com>

RATING: Intermediate
PLATFORM: Macintosh & Windows
VERSION: FileMaker 4/5
BONUS FILE: Parse.FP5, FixedSample.txt

In computer terminology, to "parse" a block of text means to look through it and separate it into chunks of useful strings.

FileMaker includes a rich set of text functions that we can use to look for and break out paragraphs, words and letters. There are many reasons we need to parse a block of text. A few examples are:

a) Separating the field data from the reply strings sent to the database from web pages that use a CGI instead of the Web Companion.

b) A letter writing engine that includes boilerplate text and style tags.

c) Demonstration record data used to populate a new database or restore it to a default condition.

d) Reading legacy data from mainframe systems that has been saved in a fixed length format.

Parsing also refers to the action of separating something like a full name field into its component parts. This can get very complex because you are dealing with strings of words with various lengths and components. Consider a file that contains the names Homer Simpson and Thurston J. Howell III. How do you parse these out to the 5 fields Honorific, First Name, Middle Name, Last Name, Suffix.

For this month's column we'll stick to the simple types that contain delimiters.

Delimiters

A delimiter can be any character or string of characters that will not be included in the result. You are probably familiar with comma and tab delimited files but if you're creating your own format you can use pretty much what you like.

In the case of fixed length files, the length itself is the delimiter. This type of file contains data blocks that have a set number of positions within each record. All you have to do is place the trimmed data blocks in the appropriate fields.

Parsing Fixed Length Fields

Separating out this type of file is very easy. You know beforehand which fields are contained in the string and how many character positions are assigned to each field. Each record is separated by a carriage return, so you can import the file into a single text field and FileMaker will create a new record for each record in the imported file. These records will then become the current found set so your script can continue and use a series of calculated replace steps to populate the fields with data parsed out using the left and middle functions.

If you import the file into a text field cleverly called "ImportedText" you will create a new record for each return separated line in the file. The contents of three ImportedText fields will look something like this:

DDDDDDxxxxDDDxxxDDDDDxxxDDDDxxDDDDDDDDxxxx
DDDDxxxxxxDDDDxxDDDDDDxxDDDDDDDDDDDDDxxxxx
DDDDDDDxxxDDDDDxDDDDxxxxDDDxxxDDDDDDxxxxxx

"D" represents Data "x" represents a space. The example shows three records from the file. Each record contains five data blocks. The blocks contain 10, 6, 8, 6, and 12 characters. We only want to put the data into the fields and not the spaces so we will use the "Trim" function to strip off the unwanted data. Our script will use the "Left" and "Middle" functions together with the "Replace" function to grab each block and put it in the appropriate field. A regular replace puts the same data in each designated field of each record of the found set but a calculated replace treats each record individually. Don't forget that when using the replace function the target field must be on the current layout.

Script: Parse Fixed Length Fields

Enter Browse Mode

Import Records [ Filename: "FixedSample.txt"; Import Order: ImportedText (Text) ]
[ Restore import order, No dialog ]

Replace [ FieldOne , Replace data: Calculation: , Trim(Left(ImportedText, 10)) ]
[ No dialog ]

Replace [ FieldTwo , Replace data: Calculation: , Trim(Middle(ImportedText, 11, 6)) ]
[ No dialog ]

Replace [ FieldThree , Replace data: Calculation: , Trim(Middle(ImportedText, 17, 8)) ]
[ No dialog ]

Replace [ FieldFour , Replace data: Calculation: , Trim(Middle(ImportedText, 25, 6)) ]
    [ No dialog ]

Replace [ FieldFive , Replace data: Calculation: , Trim(Middle(ImportedText, 31, 12)) ]
    [ No dialog ]

Click the [Run] button on the first layout of the example file to see this script in action.

Parsing Variable Length Field Records

Parsing variable length fields is obviously more difficult because you don't know where one field ends and another begins. If you know that the records you are importing have data in each field and the number of words in each field is consistent you can use the "words" functions to break them into its consistent parts. The word functions, of course, are using the space character as a delimiter.

I recently had a situation where I needed to parse out the contents of a value list. The value list was created using the "Use values from field" option. This meant I had no knowledge of the contents of the list. It could be of any length and each entry could contain any number of words.

You will find this technique useful in many similar situations where you need to separate each line of text from a paragraph.

The "Design" function "ValueListItems (dbname, valuelist)" is used to put the contents of the value list into a text global. You then use the Middle function - Middle (text, start, size) - in a loop to separate each line of text. The delimiter for each line is the invisible carriage return character "¶". If you visualize the continuous string of characters that is actually stored it will be easier to see how this algorithm works.

"Text of line one¶Text of line two¶Text of line three¶Text of line four"

    Author's note: In reality each character is stored as an ASCII coded number but just try to work with me on this one.

As you can see, except for the first and last line, each line is surrounded by a carriage return. We can use the "Position" function to look for each carriage return and use that value in the "start" parameter of the Middle function. Because we will use a loop to parse out each line of the paragraph the script steps have to be the same for each line. This means we must adjust the contents of the global containing the text paragraph. If the global is called TheValueList then we use Set TheValueList to "¶" & TheValueList & "¶" which would give us:

"¶Text of line one¶Text of line two¶Text of line three¶Text of line four¶"

Let's take a look at the script and see how it works.

Script: Parse Paragraph

1) Set Field [ DisplayRun, 0 ]

2) Set Field [ DisplayContinue, 1]

3) # "Get the contents of the value list"

4) Set Field [TheValueList, ValueListItems("ParseList.fp5", "Colors")]

5) # "Count the number of return characters and add 1 to get the number of lines"

6) Set Field [ NumberOfLines, PatternCount(TheValueList, "¶") + 1]

7) # "Add delimeters at each end to make all the lines consistent"

8) Set Field [ TheValueList, "¶" & TheValueList & "¶" ]

9) # "Set the variables to default values"

10) Set Field [ StartPosition, 0 ]

11) Set Field [ EndPosition, 0 ]

12) Set Field [ LoopCounter, 0 ]

13) Loop

14)    # "Increment the loop counter"

15)    Set Field [ LoopCounter, LoopCounter + 1 ]

16)    # "Each time through the loop the StartPosition is moved to the next delimiter"

17)    Set Field [ StartPosition, Position(TheValueList, "¶", StartPosition, 1) ]

18)    # "Start one position past the start character and look for the next delimiter"

19)    Set Field [ EndPosition, Position(TheValueList, "¶", StartPosition + 1, 1) ]

20)    # "Grab the text between the delimeters"

21)    Set Field [ LineText, Middle(TheValueList, StartPosition + 1, (EndPosition - StartPosition) - 1) ]

22)    # "Stop each time through the loop to check the results"

23)    # "Replace the Pause with the steps to process your results"

24)    Pause/Resume Script [ ]

25)    # "Exit the loop if we are at the last item of the list "

26)    Exit Loop If [ LoopCounter = NumberOfLines ]

27)    # "If not set the new start to the old end and loop again"

28)    Set Field [ StartPosition, EndPosition ]

29) End Loop

30) Perform Script ["Reset"]
[Sub-scripts]

I've numbered each line to make it easier to identify each line. The first two lines set the globals that control the portals surrounding the [Run] and [Continue] buttons. This is what makes them appear and dissappear.

Line 4 uses the design function to get the contents of the value list and put it into the "TheValueList" global.

Line 6 counts the number of carriage returns in the file to get the number of lines. Since the last line of the file doesn't have a carriage return, we add 1 to the result.

Line 8 adds the delimeters as I explained above and lines 10-12 zero the variables we'll use in the loop below.

The rest of the script is in a loop which runs for each line of the paragraph. We begin by incrementing the loop counter and in line 17 setting the variable "StartPosition" to the first occurrence of a carriage return beyond the current StartPosition.

In line 19 we start 1 position beyond the current StartPosition and set the variable EndPosition to the next occurrence of a carriage return.

Line 21 uses the variables we just calculated in the Middle function. Middle (text, start, size) - text is the string in TheValusList. Start at one position past the StartPosition which is actually the position of the carriage return. Size is the number of characters to copy or the difference between the StartPosition and the EndPosition - less the carriage return.

We pause in line 24 so you can see the results on screen. In a real situation this is where you would process the text string.

Next we check to see if the list is complete and if not the old EndPosition becomes the new StartPosition and we run through the loop again.

The example uses one word colors; try changing one of them to "Purple Polka Dots" and see what happens.

Conclusion

Using text functions to parse data is a very powerful technique familiar to all FileMaker developer veterans. Above I have used Scripts to parse data from fixed length fields and variable length fields. At first the length of the scripts and the calculations used may seem daunting. I encourage you to follow the logic of the above scripts while running them in the included technique file. With a little imagination, you will easily be able to harness the power of parsing data via text functions in each of your databases.

If you have any ideas for topics you would like me to cover in future articles, I would love to hear from you.

Happy FileMaking!

Geoff Wells is the author of the shareware program MIXOLOGY available from <http://fmfiles.com/newfiles>. Search the database of close to 1500 drink recipes by name, liquor or mix. Geoff is a Senior Editor at ISO FileMaker Magazine. Email him at geoff@filemakermagazine.com.