[EpiData-list] Re: long to wide conversion

epidata-list at lists.umanitoba.ca epidata-list at lists.umanitoba.ca
Mon Apr 30 16:19:49 CDT 2007

Dear Jens:

As Marcel says, this is a very elegant solution, and I greatly 
appreciate your help.  I just went through the entire exercise that had 
made me ask this in the first place:

1. From Excel with several rows per individual (Long) to two related 
EpiData files (PARENT and CHILD)
2. Making a single EpiData file from PARENT and CHILD
3. Making a Long-to-wide from the above.

All works fine and perfect.  There is no problem in the order, Marcel, 
it works, however you want it.

Thanks again a lot, Jens.  This is a common problem and a wonderful 


epidata-list at lists.umanitoba.ca wrote:
> This is an elegant solution.
> Is there a way to make sure that the data set is sorted to be sure that 
> "* We assume that the order of visits in set12 is chronological, but wish to have an identifier:"
> holds true
> cheers
> marcel
> ***********
> Marcel Zwahlen, PhD
> Department of Social and Preventive Medicine
> University Berne
> Getting values from other records (was: createreduced dataset with EpiData Analysis): long to wide conversion
> Hans wishes to get from
> SET12.REC  with variables:  ID AGE SEX VAR1 VAR2
> - where each person (ID) have four records which we each call one visit to NEWSET.REC with variables: ID AGE SEX VAR11 VAR21 VAR12 VAR22 VAR13
> VAR23 VAR14 VAR24
> where each person (ID) have one record
> This conversion can be named "From long to wide format" and is a conversion we often wish to work on:
> In the example it is quite easy, since it seems that there are exactly four visits per person.
> One way of doing this is:   (principle must be tested on actual data)
> read set12.rec
> freq id        // here I would look to make sure that all id's actually 
> had four records
> * We assume that the order of visits in set12 is chronological, but wish to have an identifier:
> gen i visit =        // integer variable with value 1 for all
> if id =d[_n-1] then visit = visit[_n-1] + 1    // now the value of 
> visit is correct 1 2 3 4
> * the [_n] indicates recnumber and by subtracting one "_n-1" we get the previous record
> browse       // here I would look to control that the calculation was 
> correct
> * generate the 8 variables: (you could also use define followed by let ) gen var11 =1 ..... etc until gen var24 = -1
> * the reason for the value -1 is that we wish to know if the variable is changed later.
> Sort id visit
> * now we can move the values to the correct variable: (shown for the
> var11..var14)
> if visit = then var11 = var1
> if visit = then var12 = var1[_n+1]
> if visit = then var13 = var1[_n+2]
> if visit = then var14 = var1[_n+3]
> * and similarly for the var2x variables
> browse   // to see visually that it works
> * all values have now been copied from four records to the first record for each person select visit = labeldata "My combined file in wide format"
> savedata newfile
> The principle above can be used for many conversions.  E.q. for first and follow up visit:
> * assume bmi indicates body mass index. And we wish to test for bmi change by sex:
> gen bmidif =
> if (visit =) and (id = id[_n+1]) then  bmidif = bmi[_n+1] - bmi
> means bmidif sex               // for testing in version 2 add /t
> The next question could be "How to get from wide to long format" - The reverse of what Hans wanted. This would not be quite as easy, but something like generate 400  // an empty file first, e.g. 100 observations with four records
> gen i s =n       // instead of _n recnumber can be used
> gen i id =
> if (s mod 4) = and (recnumber > 4) then id = id[_n-4] + 1
> * plus some more logic to get the id numbers right.
> * then the data could be added by "relate" and subsequently moved to the correct variables
> regards
> Jens Lauritsen
Hans L Rieder, MD, MPH
Jetzikofenstr. 12
3038 Kirchlindach

Tel: +41 31 829 4577
Mob: +41 79 321 9122
Web: http://www.tbrieder.org

More information about the EpiData-list mailing list