This is an excellent question!
1. For random sampling, say you want to get 20% of the records (1 in 5). With a large number of records, do this:
gen i pick = ran(5) // pick will be an integer from 0 to 4
select pick = 0 // choose those with pick=0 (approx. 1 in 5)
savedata sample
You will not always get exactly one fifth of the records, since pick is determined at random for each record.
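To illustrate why the sample size varies, here is a minimal sketch in Python (not Analysis commands). The function name `bernoulli_sample` and the `seed` parameter are hypothetical, introduced only for this illustration; `rng.randrange(k)` stands in for the random integer stored in pick.

```python
import random

def bernoulli_sample(records, k=5, seed=None):
    """Keep each record independently with probability 1/k.

    Hypothetical helper: mirrors 'pick = random integer 0..k-1, select pick = 0'.
    """
    rng = random.Random(seed)
    # randrange(k) returns an integer from 0 to k-1, like pick above
    return [rec for rec in records if rng.randrange(k) == 0]

records = list(range(1, 101))  # 100 made-up record numbers
# Draw five samples with different seeds and compare their sizes
sizes = [len(bernoulli_sample(records, seed=s)) for s in range(5)]
print(sizes)  # sizes scatter around 20 rather than being exactly 20
```

Each record is kept or dropped on its own, so the count is only about one fifth of the total.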
For systematic sampling (every fifth record), do this:
gen i pick = recnumber - 5*(recnumber div 5) // effectively pick = recnumber mod 5
select pick = 0 // or 1, 2, 3, 4 - it's your choice
savedata sample
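The arithmetic behind the systematic selection can be checked with a short Python sketch (again, not Analysis commands). The function name `systematic_sample` and the `remainder` parameter are hypothetical; Python's `//` plays the role of div, and record numbers are assumed to start at 1.

```python
def systematic_sample(records, k=5, remainder=0):
    """Keep every k-th record, using the same formula as pick above."""
    return [
        rec
        for recnumber, rec in enumerate(records, start=1)  # 1-based record numbers
        # recnumber - k*(recnumber div k) is recnumber mod k
        if recnumber - k * (recnumber // k) == remainder
    ]

records = list(range(1, 21))  # 20 made-up records, numbered 1 to 20
print(systematic_sample(records))               # [5, 10, 15, 20]
print(systematic_sample(records, remainder=1))  # [1, 6, 11, 16]
```

Changing the remainder shifts the starting record but keeps the spacing of five.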
The best way to randomly sample records, especially when there are not a lot of records (<500 say) is this:
gen pick = rnd(1) // pick will be a float between 0 and 1
sort pick
describe pick /q // done only to get the number of records into $obs1
select recnumber <= ($obs1 div 5) // select the first fifth of the records after sorting
savedata sample
If you are just typing in the commands, there is no need to do the describe command. You will know how many records to select after the sort.
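The sort-based method gives a fixed sample size because it shuffles the records and then cuts off the first fifth. A minimal Python sketch of the same idea follows (not Analysis commands; `fixed_size_sample` and `seed` are hypothetical names used only here).

```python
import random

def fixed_size_sample(records, k=5, seed=None):
    """Attach a random key to each record, sort by it, keep the first 1/k."""
    rng = random.Random(seed)
    # One random float in [0, 1) per record, like pick above
    keyed = [(rng.random(), rec) for rec in records]
    keyed.sort(key=lambda pair: pair[0])  # corresponds to 'sort pick'
    n_keep = len(records) // k            # corresponds to '$obs1 div 5'
    return [rec for _, rec in keyed[:n_keep]]

records = list(range(1, 101))  # 100 made-up records
sample = fixed_size_sample(records, seed=42)
print(len(sample))  # 20
```

Unlike the first method, the sample size here is always the record count divided by five, rounded down.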
I encourage everyone to explore the results variables. After any freq, tab, means or other analytic command, type
var result
to see what results variables are created temporarily. These are VERY useful. All of the functions are described in the HELP that comes with Analysis. Jens and I spent some time getting the help file into shape, since this is your key technical reference for Analysis.
The recnumber-based selections above work because recnumber is always the record number within the current record set, in its current order, after any sort and select.
2. In Analysis, LIST will show record numbers unless you include /no on the command line - e.g.
LIST a b c d e /no
Jamie

Shavinder wrote:
Dear Jamie, Thank you so much for the solution for the query “INCOMPATIBLE KEY VARIABLES”. It worked well.
Can you please help with two other situations as well?
1. There is a file, say “try.rec”. Is it possible to read it and write another file containing 5 randomly selected records, or every 4th record (as done in systematic sampling)? The documentation of commands such as SET RANDOM SEED and RANDOM SIMULATIONS is not clear in the help file. I am attaching a sample file, try.zip.
2. What is the equivalent of SET LISTREC=ON/OFF (EPI6) in Epi Analysis ver 2.03?