Getting values from other records (was: create reduced dataset with EpiData Analysis): long to wide conversion
Hans wishes to get from SET12.REC with variables: ID AGE SEX VAR1 VAR2  where each person (ID) have four records which we each call one visit to NEWSET.REC with variables: ID AGE SEX VAR11 VAR21 VAR12 VAR22 VAR13 VAR23 VAR14 VAR24 where each person (ID) have one record
This conversion can be named "From long to wide format" and is a conversion we often wish to work on: In the example it is quite easy, since it seems that there are exactly four visits per person.
One way of doing this is: (principle must be tested on actual data) read set12.rec freq id // here I would look to make sure that all id's actually had four records
* We assume that the order of visits in set12 is chronological, but wish to have an identifier: gen i visit = 1 // integer variable with value 1 for all if id = id[_n1] then visit = visit[_n1] + 1 // now the value of visit is correct 1 2 3 4 * the [_n] indicates recnumber and by subtracting one "_n1" we get the previous record browse // here I would look to control that the calculation was correct
* generate the 8 variables: (you could also use define followed by let ) gen var11 = 1 ..... etc until gen var24 = 1 * the reason for the value 1 is that we wish to know if the variable is changed later.
Sort id visit * now we can move the values to the correct variable: (shown for the var11..var14) if visit = 1 then var11 = var1 if visit = 1 then var12 = var1[_n+1] if visit = 1 then var13 = var1[_n+2] if visit = 1 then var14 = var1[_n+3] * and similarly for the var2x variables browse // to see visually that it works
* all values have now been copied from four records to the first record for each person select visit = 1 labeldata "My combined file in wide format" savedata newfile
The principle above can be used for many conversions. E.q. for first and follow up visit: * assume bmi indicates body mass index. And we wish to test for bmi change by sex: gen bmidif = . if (visit = 1) and (id = id[_n+1]) then bmidif = bmi[_n+1]  bmi means bmidif sex // for testing in version 2 add /t
The next question could be "How to get from wide to long format"  The reverse of what Hans wanted. This would not be quite as easy, but something like generate 400 // an empty file first, e.g. 100 observations with four records gen i s = _n // instead of _n recnumber can be used gen i id = 1 if (s mod 4) = 0 and (recnumber > 4) then id = id[_n4] + 1 * plus some more logic to get the id numbers right. * then the data could be added by "relate" and subsequently moved to the correct variables
regards
Jens Lauritsen
This is an elegant solution. Is there a way to make sure that the data set is sorted to be sure that "* We assume that the order of visits in set12 is chronological, but wish to have an identifier:" holds true cheers marcel
*********** Marcel Zwahlen, PhD Department of Social and Preventive Medicine University Berne
Ursprüngliche Nachricht Von: epidatalist@lists.umanitoba.ca [mailto:epidatalist@lists.umanitoba.ca] Gesendet: Sonntag, 29. April 2007 09:12 An: epidatalist@lists.umanitoba.ca Betreff: [EpiDatalist] Getting values from other records (was: createreduced dataset with EpiData Analysis): long to wide conversion
Hans wishes to get from SET12.REC with variables: ID AGE SEX VAR1 VAR2  where each person (ID) have four records which we each call one visit to NEWSET.REC with variables: ID AGE SEX VAR11 VAR21 VAR12 VAR22 VAR13 VAR23 VAR14 VAR24 where each person (ID) have one record
This conversion can be named "From long to wide format" and is a conversion we often wish to work on: In the example it is quite easy, since it seems that there are exactly four visits per person.
One way of doing this is: (principle must be tested on actual data) read set12.rec freq id // here I would look to make sure that all id's actually had four records
* We assume that the order of visits in set12 is chronological, but wish to have an identifier: gen i visit = 1 // integer variable with value 1 for all if id = id[_n1] then visit = visit[_n1] + 1 // now the value of visit is correct 1 2 3 4 * the [_n] indicates recnumber and by subtracting one "_n1" we get the previous record browse // here I would look to control that the calculation was correct
* generate the 8 variables: (you could also use define followed by let ) gen var11 = 1 ..... etc until gen var24 = 1 * the reason for the value 1 is that we wish to know if the variable is changed later.
Sort id visit * now we can move the values to the correct variable: (shown for the var11..var14) if visit = 1 then var11 = var1 if visit = 1 then var12 = var1[_n+1] if visit = 1 then var13 = var1[_n+2] if visit = 1 then var14 = var1[_n+3] * and similarly for the var2x variables browse // to see visually that it works
* all values have now been copied from four records to the first record for each person select visit = 1 labeldata "My combined file in wide format" savedata newfile
The principle above can be used for many conversions. E.q. for first and follow up visit: * assume bmi indicates body mass index. And we wish to test for bmi change by sex: gen bmidif = . if (visit = 1) and (id = id[_n+1]) then bmidif = bmi[_n+1]  bmi means bmidif sex // for testing in version 2 add /t
The next question could be "How to get from wide to long format"  The reverse of what Hans wanted. This would not be quite as easy, but something like generate 400 // an empty file first, e.g. 100 observations with four records gen i s = _n // instead of _n recnumber can be used gen i id = 1 if (s mod 4) = 0 and (recnumber > 4) then id = id[_n4] + 1 * plus some more logic to get the id numbers right. * then the data could be added by "relate" and subsequently moved to the correct variables
regards
Jens Lauritsen
Dear Jens:
As Marcel says, this is a very elegant solution, and I greatly appreciate your help. I just went through the entire exercise that had made me ask this in the first place:
1. From Excel with several rows per individual (Long) to two related EpiData files (PARENT and CHILD) 2. Making a single EpiData file from PARENT and CHILD 3. Making a Longtowide from the above.
All works fine and perfect. There is no problem in the order, Marcel, it works, however you want it.
Thanks again a lot, Jens. This is a common problem and a wonderful solution.
Hans
epidatalist@lists.umanitoba.ca wrote:
This is an elegant solution. Is there a way to make sure that the data set is sorted to be sure that "* We assume that the order of visits in set12 is chronological, but wish to have an identifier:" holds true cheers marcel
Marcel Zwahlen, PhD Department of Social and Preventive Medicine University Berne
Getting values from other records (was: createreduced dataset with EpiData Analysis): long to wide conversion
Hans wishes to get from SET12.REC with variables: ID AGE SEX VAR1 VAR2
 where each person (ID) have four records which we each call one visit to NEWSET.REC with variables: ID AGE SEX VAR11 VAR21 VAR12 VAR22 VAR13
VAR23 VAR14 VAR24 where each person (ID) have one record
This conversion can be named "From long to wide format" and is a conversion we often wish to work on: In the example it is quite easy, since it seems that there are exactly four visits per person.
One way of doing this is: (principle must be tested on actual data) read set12.rec freq id // here I would look to make sure that all id's actually had four records
 We assume that the order of visits in set12 is chronological, but wish to have an identifier:
gen i visit = // integer variable with value 1 for all if id =d[_n1] then visit = visit[_n1] + 1 // now the value of visit is correct 1 2 3 4
 the [_n] indicates recnumber and by subtracting one "_n1" we get the previous record
browse // here I would look to control that the calculation was correct
 generate the 8 variables: (you could also use define followed by let ) gen var11 =1 ..... etc until gen var24 = 1
 the reason for the value 1 is that we wish to know if the variable is changed later.
Sort id visit
 now we can move the values to the correct variable: (shown for the
var11..var14) if visit = then var11 = var1 if visit = then var12 = var1[_n+1] if visit = then var13 = var1[_n+2] if visit = then var14 = var1[_n+3]
 and similarly for the var2x variables
browse // to see visually that it works
 all values have now been copied from four records to the first record for each person select visit = labeldata "My combined file in wide format"
savedata newfile
The principle above can be used for many conversions. E.q. for first and follow up visit:
 assume bmi indicates body mass index. And we wish to test for bmi change by sex:
gen bmidif = if (visit =) and (id = id[_n+1]) then bmidif = bmi[_n+1]  bmi means bmidif sex // for testing in version 2 add /t
The next question could be "How to get from wide to long format"  The reverse of what Hans wanted. This would not be quite as easy, but something like generate 400 // an empty file first, e.g. 100 observations with four records gen i s =n // instead of _n recnumber can be used gen i id = if (s mod 4) = and (recnumber > 4) then id = id[_n4] + 1
 plus some more logic to get the id numbers right.
 then the data could be added by "relate" and subsequently moved to the correct variables
regards
Jens Lauritsen
participants (1)

epidatalist＠lists.umanitoba.ca