AW: [EpiData-list] Getting values from other records (was: createreduced dataset with EpiData Analysis): long to wide conversion

epidata-list at lists.umanitoba.ca epidata-list at lists.umanitoba.ca
Sun Apr 29 12:57:36 CDT 2007


This is an elegant solution.
Is there a way to make sure that the data set is sorted to be sure that 
"* We assume that the order of visits in set12 is chronological, but wish to have an identifier:"
holds true
cheers
marcel

***********
Marcel Zwahlen, PhD
Department of Social and Preventive Medicine
University Berne


-----Ursprüngliche Nachricht-----
Von: epidata-list at lists.umanitoba.ca [mailto:epidata-list at lists.umanitoba.ca] 
Gesendet: Sonntag, 29. April 2007 09:12
An: epidata-list at lists.umanitoba.ca
Betreff: [EpiData-list] Getting values from other records (was: createreduced dataset with EpiData Analysis): long to wide conversion

Hans wishes to get from
SET12.REC  with variables:  ID AGE SEX VAR1 VAR2
- where each person (ID) have four records which we each call one visit to NEWSET.REC with variables: ID AGE SEX VAR11 VAR21 VAR12 VAR22 VAR13
VAR23 VAR14 VAR24
where each person (ID) have one record

This conversion can be named "From long to wide format" and is a conversion we often wish to work on:
In the example it is quite easy, since it seems that there are exactly four visits per person.

One way of doing this is:   (principle must be tested on actual data)
read set12.rec
freq id        // here I would look to make sure that all id's actually 
had four records

* We assume that the order of visits in set12 is chronological, but wish to have an identifier:
gen i visit = 1        // integer variable with value 1 for all
if id = id[_n-1] then visit = visit[_n-1] + 1    // now the value of 
visit is correct 1 2 3 4
* the [_n] indicates recnumber and by subtracting one "_n-1" we get the previous record
browse       // here I would look to control that the calculation was 
correct

* generate the 8 variables: (you could also use define followed by let ) gen var11 = -1 ..... etc until gen var24 = -1
* the reason for the value -1 is that we wish to know if the variable is changed later.

Sort id visit
* now we can move the values to the correct variable: (shown for the
var11..var14)
if visit = 1 then var11 = var1
if visit = 1 then var12 = var1[_n+1]
if visit = 1 then var13 = var1[_n+2]
if visit = 1 then var14 = var1[_n+3]
* and similarly for the var2x variables
browse   // to see visually that it works

* all values have now been copied from four records to the first record for each person select visit = 1 labeldata "My combined file in wide format"
savedata newfile

The principle above can be used for many conversions.  E.q. for first and follow up visit:
* assume bmi indicates body mass index. And we wish to test for bmi change by sex:
gen bmidif = .
if (visit = 1) and (id = id[_n+1]) then  bmidif = bmi[_n+1] - bmi
means bmidif sex               // for testing in version 2 add /t
 
The next question could be "How to get from wide to long format" - The reverse of what Hans wanted. This would not be quite as easy, but something like generate 400  // an empty file first, e.g. 100 observations with four records
gen i s = _n       // instead of _n recnumber can be used
gen i id = 1
if (s mod 4) = 0 and (recnumber > 4) then id = id[_n-4] + 1
* plus some more logic to get the id numbers right.
* then the data could be added by "relate" and subsequently moved to the correct variables

regards

Jens Lauritsen





More information about the EpiData-list mailing list