[EpiData-list] Epidata XML to R: how to handle labels? - and missing values

epidata-list at lists.umanitoba.ca epidata-list at lists.umanitoba.ca
Tue Jun 14 13:30:59 CDT 2011


On Tue, Jun 14, 2011 at 07:49:36PM +0200, epidata-list at lists.umanitoba.ca wrote:
> When you create value labels in EpiData, you are also restricting the values that a field can hold. So value labels never apply to a continuous variable like systolic blood pressure. Value labels are exclusively for coded values.

OK, now I get it. I've just added some code to help deal with labels. Not completed yet, but now when an epx file is read in the labels are also added to the object. At the moment the object contains the data, a summary of the table structure and a summary of the labels, e.g.:


> x <- read.epidata.xml("sample.epx", dec.sep = ".")

> names(x)
[1] "datafile_id_0" "field.info"    "labels"       

> x$labels
$`Integet Set`
$`Integet Set`$type
[1] "1"

$`Integet Set`$labels
  value order     label missing
1    10     1   Value A   FALSE
2    20     2   Value B   FALSE
3    30     3   Value C   FALSE
4    99     4 A missing    TRUE


$`Float Set`
$`Float Set`$type
[1] "3"

$`Float Set`$labels
  value order               label missing
1  1,11     1       Float value A   FALSE
2  2,22     2       Float value B   FALSE
3  3,33     3       Float value C   FALSE
4  8,88     4 Second last missing    TRUE
5  9,99     5       Float missing    TRUE


$`String Set`
$`String Set`$type
[1] "12"

$`String Set`$labels
  value order                label missing
1   AAA     1   First string label   FALSE
2   BBB     2  Second string label   FALSE
3   CCC     3   Third string label   FALSE
4   ZZZ     4 Missing string label    TRUE


> x$field.info
    id  name type length decimals                     question
1  f19 VLAST    1      2        0                Last field...
2  f22    M3    1      5        0           The No-Enter field
3  f23    M2    1      4        0          Confirm-entry field
4  f24    M1    1      2        0                 Just a field
5  f14    A1    2      4        0 Auto Incrementing (4 digits)
6   f9    A2    7     10        0        Date Field Auto (dmy)
7  f10    A3    8     10        0        Date Field Auto (mdy)
8  f11    A4    9     10        0        Date Field Auto (ymd)
9  f13    A5   11      8        0              Time Field Auto
10  f0   VL1    1      2        0     Integer Field (2 digits)
11  f2   VL2    3      4        2     Float Field (1.2 digits)
12 f16   VL3   12      3        0  String Field (3 characters)
13  f8    S3   13     40        0                  Sample text
14  f4    S2   12     40        0                  Sample text
15 f15    S1   12     20        0                     Language
16  f1    I1    1      8        0     Integer Field (8 digits)
17  f3    F1    3      9        4     Float Field (4.4 digits)
18 f17    I2    1      3        0     Integer field (3 digits)
19  f5    D1    4     10        0      Date Field (DD/MM/YYYY)
20  f6    D2    5     10        0      Date Field (MM/DD/YYYY)
21 f12    T1   10      8        0        Time Field (HH:MM:SS)
22  f7    D3    6     10        0      Date Field (YYYY/MM/DD)
23 f18    J1    1      2        0                   Jump field
24 f20    J2    1      2        0              With valuelabel
25 f21    J3    3      4        2        Float with second max
26 f26    O2    1      3        0                Default value
27 f25    O1    1      3        0                 Repeat Value
28 f27    O3    1      3        0                Compare field
29 f32    O4    1      2        0                  Notes field
30 f28    C1    1      2        0                          Day
31 f29    C2    1      2        0                        Month
32 f30    C3    1      4        0                         Year
33 f31    C4    6     10        0      Result Date Field (YMD)



> summary(x$datafile_id_0)
      VL1              I1                    A3              A4    
 Min.   :10.00   Min.   :      42   01/05/2011:12   2011/01/05:12  
 1st Qu.:10.00   1st Qu.:  928037                                  
 Median :25.00   Median :29012340                                  
 Mean   :44.67   Mean   :36010019                                  
 3rd Qu.:99.00   3rd Qu.:62317068                                  
 Max.   :99.00   Max.   :90123456                                  
                                                                   
        T1           A5          A1               S1     VL3   
 01.00.00:1   14.22.55:1   Min.   :100.0   ภาษาไทย :1   AAA:3  
 02.00.00:1   14.25.36:1   1st Qu.:102.8   Dansk   :1   BBB:1  
 03.00.00:1   14.26.38:1   Median :105.5   English :1   CCC:4  
 04.00.00:1   14.27.40:1   Mean   :105.5   Español :1   ZZZ:4  
 05.00.00:1   14.28.23:1   3rd Qu.:108.2   Français:1          
 06.00.00:1   14.30.54:1   Max.   :111.0   Íslenska:1          
 (Other) :6   (Other) :6                   (Other) :6          
       I2               J1            VLAST             VL2       
 Min.   : 50.00   Min.   :0.000   Min.   : 1.000   Min.   :1.110  
 1st Qu.: 52.75   1st Qu.:0.000   1st Qu.: 3.500   1st Qu.:1.110  
 Median : 55.50   Median :1.500   Median : 7.000   Median :2.775  
 Mean   : 76.25   Mean   :1.583   Mean   : 6.636   Mean   :4.625  
 3rd Qu.: 58.25   3rd Qu.:3.000   3rd Qu.: 9.500   3rd Qu.:9.990  
 Max.   :250.00   Max.   :4.000   Max.   :12.000   Max.   :9.990  
                                  NA's   : 1.000                  
       J2             J3              M3            M2               M1        
 Min.   :10.0   Min.   :1.110   Min.   : NA   Min.   :   2.0   Min.   : 1.000  
 1st Qu.:10.0   1st Qu.:6.938   1st Qu.: NA   1st Qu.:   6.0   1st Qu.: 3.000  
 Median :54.5   Median :8.880   Median : NA   Median :   8.0   Median : 7.000  
 Mean   :54.5   Mean   :7.354   Mean   :NaN   Mean   : 143.6   Mean   : 6.556  
 3rd Qu.:99.0   3rd Qu.:9.990   3rd Qu.: NA   3rd Qu.:  11.0   3rd Qu.: 9.000  
 Max.   :99.0   Max.   :9.990   Max.   : NA   Max.   :1234.0   Max.   :12.000  
 NA's   : 6.0   NA's   :4.000   NA's   : 12   NA's   :   3.0   NA's   : 3.000  
       O1              O2               O3               C1       
 Min.   :2.000   Min.   :  1.00   Min.   :  2.00   Min.   : 1.00  
 1st Qu.:3.000   1st Qu.:  4.25   1st Qu.:  5.25   1st Qu.: 3.75  
 Median :4.500   Median :  5.50   Median :  6.50   Median : 6.50  
 Mean   :4.833   Mean   : 13.25   Mean   : 15.08   Mean   : 8.75  
 3rd Qu.:6.250   3rd Qu.:  8.25   3rd Qu.:  9.25   3rd Qu.:10.00  
 Max.   :9.000   Max.   :100.00   Max.   :111.00   Max.   :25.00  
                                                                  
       C2               F1                C3                   C4   
 Min.   : 1.000   Min.   :  25.02   Min.   :   3.00   1920/02/02:1  
 1st Qu.: 3.000   1st Qu.: 113.71   1st Qu.:   4.75   2003/03/03:1  
 Median : 4.500   Median :2901.23   Median :   7.50   2003/03/22:1  
 Mean   : 5.333   Mean   :3613.54   Mean   : 174.92   2004/04/04:1  
 3rd Qu.: 7.250   3rd Qu.:6231.71   3rd Qu.:  11.00   2005/05/05:1  
 Max.   :12.000   Max.   :9012.35   Max.   :2012.00   2006/06/06:1  
                                                      (Other)   :6  
       O4                                         S2       D1            
 Min.   :0.00   นี้คือข้อความตัวอย่าง              :1            Min.   :2001-01-01  
 1st Qu.:1.00   Đây là một văn bản mẫu :1            1st Qu.:2003-12-26  
 Median :1.50   Dette er en prøve tekst:1            Median :2007-07-08  
 Mean   :1.75   échantillon de texte   :1            Mean   :2007-01-21  
 3rd Qu.:2.25   ejemplo de un texto    :1            3rd Qu.:2010-09-10  
 Max.   :4.00   This is a sample text  :1            Max.   :2011-09-11  
                (Other)                :6                                
          D2             D3                              S3          A2    
 01/01/2001:1   1912/09/10:1   สำหรับไฟล์                 :1    05/01/2011:12  
 01/13/2011:1   2001/01/01:1   Đây là một văn bản mẫu :1                   
 02/02/2002:1   2002/02/02:1   ÐETTA ER A SÝNISHORN   :1                   
 03/03/2003:1   2003/03/03:1   DETTE ER EN PRØVE TEKST:1                   
 04/04/2004:1   2004/04/04:1   ÉCHANTILLON DE TEXTE   :1                   
 05/05/2005:1   2005/05/05:1   EJEMPLO DE UN TEXTO    :1                   
 (Other)   :6   (Other)   :6   (Other)                :6                   
 st    
 0:12  
       


David
--       
       

> 
> Jamie
> 
> On 2011-06-14, at 10:30 AM, epidata-list at lists.umanitoba.ca wrote:
> 
> > It seems from the sample.epx file that any data type can have labels. The snippet I included was from the sample file and was for numeric values. So, it seems you could have a field for systolic blood pressure and a label for 160 that says "a bit too high".  You can't do this in R (except by using attr() but that has limitations). 
> 
> _______________________________________________
> EpiData-list mailing list
> EpiData-list at lists.umanitoba.ca
> http://lists.umanitoba.ca/mailman/listinfo/epidata-list


More information about the EpiData-list mailing list