[EpiData-list] Random selection of records

epidata-list at lists.umanitoba.ca
Tue Mar 4 16:47:09 CST 2008

This is an excellent question!

1. For random sampling, say you want to get 20% of the records (1 in 5). 
With a large number of records, do this:

gen i pick = ran(5) // pick will be an integer from 0 to 4
select pick=0 // choose those with pick=0 (approximately 1 in 5)
savedata sample

You will not always get one fifth of the records, since pick is 
determined at random.
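To see why the count varies, here is a rough Python sketch of the same logic. Python is used only for illustration; this is not EpiData syntax. It assumes, as the comment above says, that ran(5) returns a uniform integer from 0 to 4. random.randrange stands in for ran(), and the name bernoulli_sample is hypothetical.

```python
import random


def bernoulli_sample(records, k=5, seed=None):
    """Keep each record independently with probability 1/k.

    Mirrors `gen i pick = ran(5)` followed by `select pick=0`,
    assuming ran(k) returns a uniform integer from 0 to k-1.
    """
    rng = random.Random(seed)
    # randrange(k) draws an integer in 0..k-1 (stand-in for ran(k))
    return [rec for rec in records if rng.randrange(k) == 0]


records = list(range(1, 1001))
sample = bernoulli_sample(records, seed=2008)
print(len(sample))  # near 200, but not guaranteed to be exactly 200
```

Fixing the seed makes a run reproducible, but the sample size still depends on the individual draws, which is the point made above.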

For systematic sampling (every fifth record) do this:
gen i pick = recnumber - 5*(recnumber div 5) // effectively pick = recnumber mod 5
select pick = 0 // or 1,2,3,4 - it's your choice
savedata sample
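The same arithmetic as a Python sketch, for illustration only (systematic_sample is a hypothetical name). Python's // plays the role of div for positive record numbers, and records are numbered from 1, like recnumber.

```python
def systematic_sample(records, k=5, keep=0):
    """Keep every k-th record by its 1-based record number.

    Mirrors `gen i pick = recnumber - 5*(recnumber div 5)`
    followed by `select pick = 0`.
    """
    selected = []
    for recnumber, rec in enumerate(records, start=1):
        # recnumber - k*(recnumber // k) is recnumber mod k
        pick = recnumber - k * (recnumber // k)
        if pick == keep:
            selected.append(rec)
    return selected


records = [f"rec{i}" for i in range(1, 13)]
print(systematic_sample(records))          # ['rec5', 'rec10']
print(systematic_sample(records, keep=1))  # ['rec1', 'rec6', 'rec11']
```

Note that pick = 0 keeps records 5, 10, 15 and so on, while pick = 1 starts with record 1.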

The best way to randomly sample records, especially when there are not a 
lot of records (<500, say), is this, since it always selects exactly one fifth (rounded down):

gen pick = rnd(1) // pick will be a float between 0 and 1
sort pick
describe pick /q // this is only done to get the number of records into $obs1
select recnumber <= ($obs1 div 5) // select the first fifth of records after sorting
savedata sample

If you are just typing in the commands, there is no need to run the 
describe command. You will know how many records to select after the sort.
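A Python sketch of the sort-on-a-random-key approach, for illustration only. The name shuffle_sort_sample is hypothetical, random.random stands in for rnd(1), and len(records) stands in for the record count that describe pick /q leaves in $obs1. Unlike the first method, it always returns exactly n div 5 records.

```python
import random


def shuffle_sort_sample(records, k=5, seed=None):
    """Return exactly len(records) // k records chosen at random.

    Mirrors `gen pick = rnd(1)`, `sort pick` and
    `select recnumber <= ($obs1 div 5)`.
    """
    rng = random.Random(seed)
    # Attach a uniform random key in [0, 1) to each record (stand-in for rnd(1))
    keyed = [(rng.random(), rec) for rec in records]
    # Sort on the key only; afterwards position 1, 2, ... acts as recnumber
    keyed.sort(key=lambda pair: pair[0])
    obs1 = len(records)  # stand-in for the record count in $obs1
    cutoff = obs1 // k
    return [rec for recnumber, (_, rec) in enumerate(keyed, start=1)
            if recnumber <= cutoff]


records = list(range(1, 24))  # 23 records
sample = shuffle_sort_sample(records, seed=1)
print(len(sample))  # 4, since 23 div 5 = 4
```

Because the selection is made on position after sorting, the sample size is fixed; only which records land in it is random.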

I encourage everyone to explore the result variables. After any freq, 
tab, means or other analytic command, type

var result

to see which result variables are created temporarily. These are VERY 
useful. All of the functions are described in the HELP that comes with 
Analysis. Jens and I spent some time getting the help file into shape, 
since this is your key technical reference for Analysis.

The sort-and-select method works because recnumber is always the record 
number in the current record set, in its current order, after any sort and select.

2. In Analysis, LIST shows record numbers unless you add /no to 
the command line, e.g.

LIST a b c d e /no

Shavinder wrote:
> Dear Jamie,
> Thank you so much for the solution to the query “INCOMPATIBLE KEY VARIABLES”. It worked well.
> Can you please help with two other situations too?
> 1. There is a file, say “try.rec”. Is it possible to read it and write another file with 5 randomly selected records, or selecting every 4th record (as done in systematic sampling)? The documentation of commands such as SET RANDOM SEED and RANDOM SIMULATIONS is not clear in the help file. I am attaching a sample file, try.zip.
> 2. What is the equivalent of SET LISTREC=ON/OFF (EPI6) in Epi Analysis ver 2.03?
