[EpiData-list] How to: Select observations based on string variables

epidata-list at lists.umanitoba.ca epidata-list at lists.umanitoba.ca
Fri Nov 18 02:10:50 CST 2005


To work efficiently with string variables a few techniques are needed:

To work with string variables some functions are needed:
Length of a string:    length("Abcde") = 5
Convert to upper:    upper("Abcde") = "ABCDE"
Remove blanks:       trim("Abcde ") = "Abcde"
Look for a given text in a string:  pos("Abcde","cd") = 3

A. How do I select those with empty contents of a variable S (string).

In person language this would mean:
1. Remove all blanks/spaces from my variable
2. Is the trimmed length of the variable greater than zero ?
   Yes: The string is NOT empty
   No: The string is empty

So in analysis language this is:
1:  trim(S)
2: length(trim(S))
and to use this as select :
        "YES" from above:  Select if (length(trim(S)) > 0)
        "NO" from above:  Select if  (length(trim(S)) = 0)

alternative:
        "YES" from above:  Select if ( trim(S) <> "")
        "NO" from above:  Select if  ( trim(S) = "")

The user will notice the many parenthesis ( ). The reason I add an outer 
parenthesis is
to be sure that the whole sentence is handled correctly.

B. How to select observations with a certain string value:

In person language this would mean:
1. Make sure not to confuse lower and upper case entry of the text
2. Identify the string I wish to search on
3. Does the variable contain the string  ?

And in analysis language this is:
1:  upper(S)
2: e.g. "DOG"
3: pos(upper(S),"DOG")
and to use this as select :
        "YES" from above:  Select if (pos(upper(S),"DOG") > 0 )
        "NO" from above:  Select if  (pos(upper(S),"DOG") = 0 )


C. Complicated string values:
Assume you wish to select all text strings with content DOG, but not if 
these also contained the word "WHITE"
- that is find all dogs, but exclude the white ones.
  Select if  ( (pos(upper(S),"DOG") > 0 ) and (pos(upper(S),"WHITE") = 0 ))

If you wished to find all DOGs or CATs:
  Select if  ( (pos(upper(S),"DOG") > 0 ) or (pos(upper(S),"CAT") = 0 ) )

It is often difficult to get the correct number of parenthesis. But 
start with the innermost one and build
from there. Handling all types of strings with different national 
characters is difficult, therefore
it is good practice to use upper() and many parenthesis to assist the 
programme in working correctly.

Jens Lauritsen
EpiData Association



More information about the EpiData-list mailing list