Getting values from other records (was: create reduced dataset with EpiData Analysis): long to wide conversion
Hans wishes to get from SET12.REC with variables: ID AGE SEX VAR1 VAR2 - where each person (ID) have four records which we each call one visit to NEWSET.REC with variables: ID AGE SEX VAR11 VAR21 VAR12 VAR22 VAR13 VAR23 VAR14 VAR24 where each person (ID) have one record
This conversion can be named "From long to wide format" and is a conversion we often wish to work on: In the example it is quite easy, since it seems that there are exactly four visits per person.
One way of doing this is: (principle must be tested on actual data) read set12.rec freq id // here I would look to make sure that all id's actually had four records
* We assume that the order of visits in set12 is chronological, but wish to have an identifier: gen i visit = 1 // integer variable with value 1 for all if id = id[_n-1] then visit = visit[_n-1] + 1 // now the value of visit is correct 1 2 3 4 * the [_n] indicates recnumber and by subtracting one "_n-1" we get the previous record browse // here I would look to control that the calculation was correct
* generate the 8 variables: (you could also use define followed by let ) gen var11 = -1 ..... etc until gen var24 = -1 * the reason for the value -1 is that we wish to know if the variable is changed later.
Sort id visit * now we can move the values to the correct variable: (shown for the var11..var14) if visit = 1 then var11 = var1 if visit = 1 then var12 = var1[_n+1] if visit = 1 then var13 = var1[_n+2] if visit = 1 then var14 = var1[_n+3] * and similarly for the var2x variables browse // to see visually that it works
* all values have now been copied from four records to the first record for each person select visit = 1 labeldata "My combined file in wide format" savedata newfile
The principle above can be used for many conversions. E.q. for first and follow up visit: * assume bmi indicates body mass index. And we wish to test for bmi change by sex: gen bmidif = . if (visit = 1) and (id = id[_n+1]) then bmidif = bmi[_n+1] - bmi means bmidif sex // for testing in version 2 add /t
The next question could be "How to get from wide to long format" - The reverse of what Hans wanted. This would not be quite as easy, but something like generate 400 // an empty file first, e.g. 100 observations with four records gen i s = _n // instead of _n recnumber can be used gen i id = 1 if (s mod 4) = 0 and (recnumber > 4) then id = id[_n-4] + 1 * plus some more logic to get the id numbers right. * then the data could be added by "relate" and subsequently moved to the correct variables
regards
Jens Lauritsen
This is an elegant solution. Is there a way to make sure that the data set is sorted to be sure that "* We assume that the order of visits in set12 is chronological, but wish to have an identifier:" holds true cheers marcel
*********** Marcel Zwahlen, PhD Department of Social and Preventive Medicine University Berne
-----Ursprüngliche Nachricht----- Von: epidata-list@lists.umanitoba.ca [mailto:epidata-list@lists.umanitoba.ca] Gesendet: Sonntag, 29. April 2007 09:12 An: epidata-list@lists.umanitoba.ca Betreff: [EpiData-list] Getting values from other records (was: createreduced dataset with EpiData Analysis): long to wide conversion
Hans wishes to get from SET12.REC with variables: ID AGE SEX VAR1 VAR2 - where each person (ID) have four records which we each call one visit to NEWSET.REC with variables: ID AGE SEX VAR11 VAR21 VAR12 VAR22 VAR13 VAR23 VAR14 VAR24 where each person (ID) have one record
This conversion can be named "From long to wide format" and is a conversion we often wish to work on: In the example it is quite easy, since it seems that there are exactly four visits per person.
One way of doing this is: (principle must be tested on actual data) read set12.rec freq id // here I would look to make sure that all id's actually had four records
* We assume that the order of visits in set12 is chronological, but wish to have an identifier: gen i visit = 1 // integer variable with value 1 for all if id = id[_n-1] then visit = visit[_n-1] + 1 // now the value of visit is correct 1 2 3 4 * the [_n] indicates recnumber and by subtracting one "_n-1" we get the previous record browse // here I would look to control that the calculation was correct
* generate the 8 variables: (you could also use define followed by let ) gen var11 = -1 ..... etc until gen var24 = -1 * the reason for the value -1 is that we wish to know if the variable is changed later.
Sort id visit * now we can move the values to the correct variable: (shown for the var11..var14) if visit = 1 then var11 = var1 if visit = 1 then var12 = var1[_n+1] if visit = 1 then var13 = var1[_n+2] if visit = 1 then var14 = var1[_n+3] * and similarly for the var2x variables browse // to see visually that it works
* all values have now been copied from four records to the first record for each person select visit = 1 labeldata "My combined file in wide format" savedata newfile
The principle above can be used for many conversions. E.q. for first and follow up visit: * assume bmi indicates body mass index. And we wish to test for bmi change by sex: gen bmidif = . if (visit = 1) and (id = id[_n+1]) then bmidif = bmi[_n+1] - bmi means bmidif sex // for testing in version 2 add /t
The next question could be "How to get from wide to long format" - The reverse of what Hans wanted. This would not be quite as easy, but something like generate 400 // an empty file first, e.g. 100 observations with four records gen i s = _n // instead of _n recnumber can be used gen i id = 1 if (s mod 4) = 0 and (recnumber > 4) then id = id[_n-4] + 1 * plus some more logic to get the id numbers right. * then the data could be added by "relate" and subsequently moved to the correct variables
regards
Jens Lauritsen
Dear Jens:
As Marcel says, this is a very elegant solution, and I greatly appreciate your help. I just went through the entire exercise that had made me ask this in the first place:
1. From Excel with several rows per individual (Long) to two related EpiData files (PARENT and CHILD) 2. Making a single EpiData file from PARENT and CHILD 3. Making a Long-to-wide from the above.
All works fine and perfect. There is no problem in the order, Marcel, it works, however you want it.
Thanks again a lot, Jens. This is a common problem and a wonderful solution.
Hans
epidata-list@lists.umanitoba.ca wrote:
This is an elegant solution. Is there a way to make sure that the data set is sorted to be sure that "* We assume that the order of visits in set12 is chronological, but wish to have an identifier:" holds true cheers marcel
Marcel Zwahlen, PhD Department of Social and Preventive Medicine University Berne
Getting values from other records (was: createreduced dataset with EpiData Analysis): long to wide conversion
Hans wishes to get from SET12.REC with variables: ID AGE SEX VAR1 VAR2
- where each person (ID) have four records which we each call one visit to NEWSET.REC with variables: ID AGE SEX VAR11 VAR21 VAR12 VAR22 VAR13
VAR23 VAR14 VAR24 where each person (ID) have one record
This conversion can be named "From long to wide format" and is a conversion we often wish to work on: In the example it is quite easy, since it seems that there are exactly four visits per person.
One way of doing this is: (principle must be tested on actual data) read set12.rec freq id // here I would look to make sure that all id's actually had four records
- We assume that the order of visits in set12 is chronological, but wish to have an identifier:
gen i visit = // integer variable with value 1 for all if id =d[_n-1] then visit = visit[_n-1] + 1 // now the value of visit is correct 1 2 3 4
- the [_n] indicates recnumber and by subtracting one "_n-1" we get the previous record
browse // here I would look to control that the calculation was correct
- generate the 8 variables: (you could also use define followed by let ) gen var11 =1 ..... etc until gen var24 = -1
- the reason for the value -1 is that we wish to know if the variable is changed later.
Sort id visit
- now we can move the values to the correct variable: (shown for the
var11..var14) if visit = then var11 = var1 if visit = then var12 = var1[_n+1] if visit = then var13 = var1[_n+2] if visit = then var14 = var1[_n+3]
- and similarly for the var2x variables
browse // to see visually that it works
- all values have now been copied from four records to the first record for each person select visit = labeldata "My combined file in wide format"
savedata newfile
The principle above can be used for many conversions. E.q. for first and follow up visit:
- assume bmi indicates body mass index. And we wish to test for bmi change by sex:
gen bmidif = if (visit =) and (id = id[_n+1]) then bmidif = bmi[_n+1] - bmi means bmidif sex // for testing in version 2 add /t
The next question could be "How to get from wide to long format" - The reverse of what Hans wanted. This would not be quite as easy, but something like generate 400 // an empty file first, e.g. 100 observations with four records gen i s =n // instead of _n recnumber can be used gen i id = if (s mod 4) = and (recnumber > 4) then id = id[_n-4] + 1
- plus some more logic to get the id numbers right.
- then the data could be added by "relate" and subsequently moved to the correct variables
regards
Jens Lauritsen
participants (1)
-
epidata-list@lists.umanitoba.ca