On Tue, Jun 14, 2011 at 07:49:36PM +0200, epidata-list@lists.umanitoba.ca wrote:
When you create value labels in EpiData, you are also restricting the values that a field can hold. So value labels never apply to a continuous variable like systolic blood pressure. Value labels are exclusively for coded values.
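To mirror that restriction on the R side, one option is to check a field's values against the codes in its label set. The sketch below assumes the codes are already available as a plain vector. The helper name check_labelled_values is hypothetical and is not part of read.epidata.xml:

```r
# Hypothetical helper: flag values that fall outside a field's label set,
# mirroring the EpiData rule that a labelled field only accepts labelled codes.
# `values` is the field's data; `codes` is the vector of allowed codes.
check_labelled_values <- function(values, codes) {
  ok <- is.na(values) | values %in% codes
  if (!all(ok)) {
    warning(sprintf("%d value(s) not in the label set: %s",
                    sum(!ok), paste(unique(values[!ok]), collapse = ", ")))
  }
  ok  # TRUE where the value is allowed (NA counts as allowed)
}

# Codes taken from the "Integet Set" in sample.epx (10, 20, 30, 99).
# 55 is not among them, so this call warns and returns FALSE for it.
ok <- check_labelled_values(c(10, 20, 55, NA, 99), c(10, 20, 30, 99))
```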
OK, now I get it. I've just added some code to help deal with labels. It isn't complete yet, but now when an .epx file is read in, the value labels are also added to the returned object. At the moment the object contains the data, a summary of the table structure, and a summary of the labels, e.g.:
x <- read.epidata.xml("sample.epx", dec.sep = ".")
names(x)
[1] "datafile_id_0" "field.info" "labels"
x$labels
$`Integet Set`
$`Integet Set`$type
[1] "1"

$`Integet Set`$labels
  value order     label missing
1    10     1   Value A   FALSE
2    20     2   Value B   FALSE
3    30     3   Value C   FALSE
4    99     4 A missing    TRUE
$`Float Set`
$`Float Set`$type
[1] "3"

$`Float Set`$labels
  value order               label missing
1  1,11     1       Float value A   FALSE
2  2,22     2       Float value B   FALSE
3  3,33     3       Float value C   FALSE
4  8,88     4 Second last missing    TRUE
5  9,99     5       Float missing    TRUE
$`String Set`
$`String Set`$type
[1] "12"

$`String Set`$labels
  value order                label missing
1   AAA     1   First string label   FALSE
2   BBB     2  Second string label   FALSE
3   CCC     3   Third string label   FALSE
4   ZZZ     4 Missing string label    TRUE
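For anyone wanting to use these label sets right away, turning a coded field into an R factor might look roughly like the following. This is a minimal sketch that assumes each element of x$labels holds a labels data frame with the value, order, label and missing columns shown above, and that you already know which set belongs to which field (field.info, as printed below, does not say). The helper label_field is hypothetical, not part of the code described here. Codes flagged missing = TRUE become NA, and the decimal commas in the Float Set codes are turned into points before matching.

```r
# Hypothetical helper: turn a coded field into a factor using one label set
# from x$labels, e.g. x$labels[["Integet Set"]]$labels. Assumes the data
# frame has the columns value, order, label and missing shown above.
label_field <- function(values, lab) {
  codes   <- as.character(lab$value)
  is_miss <- as.logical(as.character(lab$missing))
  if (is.numeric(values)) {
    # The Float Set prints its codes with a decimal comma ("1,11"),
    # so switch to a decimal point before converting to numbers.
    codes <- as.numeric(sub(",", ".", codes, fixed = TRUE))
  } else {
    values <- as.character(values)
  }
  # Codes marked as missing (e.g. 99 = "A missing") become NA.
  values[values %in% codes[is_miss]] <- NA
  # Remaining codes become factor levels, in the EpiData label order.
  keep <- !is_miss
  ord  <- order(as.numeric(as.character(lab$order[keep])))
  factor(values,
         levels = codes[keep][ord],
         labels = as.character(lab$label[keep])[ord])
}

# Example built to match the "Integet Set" printed above.
lab <- data.frame(value   = c("10", "20", "30", "99"),
                  order   = 1:4,
                  label   = c("Value A", "Value B", "Value C", "A missing"),
                  missing = c(FALSE, FALSE, FALSE, TRUE),
                  stringsAsFactors = FALSE)
f <- label_field(c(10, 30, 99, 20), lab)
print(f)  # Value A, Value C, NA, Value B
```

Matching float codes exactly only works if the data and the labels were parsed from the same decimal strings. If that is not guaranteed, a tolerance-based match would be safer.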
x$field.info
    id  name type length decimals question
1  f19 VLAST    1      2        0 Last field...
2  f22    M3    1      5        0 The No-Enter field
3  f23    M2    1      4        0 Confirm-entry field
4  f24    M1    1      2        0 Just a field
5  f14    A1    2      4        0 Auto Incrementing (4 digits)
6   f9    A2    7     10        0 Date Field Auto (dmy)
7  f10    A3    8     10        0 Date Field Auto (mdy)
8  f11    A4    9     10        0 Date Field Auto (ymd)
9  f13    A5   11      8        0 Time Field Auto
10  f0   VL1    1      2        0 Integer Field (2 digits)
11  f2   VL2    3      4        2 Float Field (1.2 digits)
12 f16   VL3   12      3        0 String Field (3 characters)
13  f8    S3   13     40        0 Sample text
14  f4    S2   12     40        0 Sample text
15 f15    S1   12     20        0 Language
16  f1    I1    1      8        0 Integer Field (8 digits)
17  f3    F1    3      9        4 Float Field (4.4 digits)
18 f17    I2    1      3        0 Integer field (3 digits)
19  f5    D1    4     10        0 Date Field (DD/MM/YYYY)
20  f6    D2    5     10        0 Date Field (MM/DD/YYYY)
21 f12    T1   10      8        0 Time Field (HH:MM:SS)
22  f7    D3    6     10        0 Date Field (YYYY/MM/DD)
23 f18    J1    1      2        0 Jump field
24 f20    J2    1      2        0 With valuelabel
25 f21    J3    3      4        2 Float with second max
26 f26    O2    1      3        0 Default value
27 f25    O1    1      3        0 Repeat Value
28 f27    O3    1      3        0 Compare field
29 f32    O4    1      2        0 Notes field
30 f28    C1    1      2        0 Day
31 f29    C2    1      2        0 Month
32 f30    C3    1      4        0 Year
33 f31    C4    6     10        0 Result Date Field (YMD)
summary(x$datafile_id_0)
      VL1              I1                A3              A4
 Min.   :10.00   Min.   :      42   01/05/2011:12   2011/01/05:12
 1st Qu.:10.00   1st Qu.:  928037
 Median :25.00   Median :29012340
 Mean   :44.67   Mean   :36010019
 3rd Qu.:99.00   3rd Qu.:62317068
 Max.   :99.00   Max.   :90123456

     T1           A5            A1             S1       VL3
 01.00.00:1   14.22.55:1   Min.   :100.0   ภาษาไทย :1   AAA:3
 02.00.00:1   14.25.36:1   1st Qu.:102.8   Dansk   :1   BBB:1
 03.00.00:1   14.26.38:1   Median :105.5   English :1   CCC:4
 04.00.00:1   14.27.40:1   Mean   :105.5   Español :1   ZZZ:4
 05.00.00:1   14.28.23:1   3rd Qu.:108.2   Français:1
 06.00.00:1   14.30.54:1   Max.   :111.0   Íslenska:1
 (Other) :6   (Other) :6                   (Other) :6

       I2              J1             VLAST             VL2
 Min.   : 50.00   Min.   :0.000   Min.   : 1.000   Min.   :1.110
 1st Qu.: 52.75   1st Qu.:0.000   1st Qu.: 3.500   1st Qu.:1.110
 Median : 55.50   Median :1.500   Median : 7.000   Median :2.775
 Mean   : 76.25   Mean   :1.583   Mean   : 6.636   Mean   :4.625
 3rd Qu.: 58.25   3rd Qu.:3.000   3rd Qu.: 9.500   3rd Qu.:9.990
 Max.   :250.00   Max.   :4.000   Max.   :12.000   Max.   :9.990
                                  NA's   : 1.000

      J2             J3             M3              M2               M1
 Min.   :10.0   Min.   :1.110   Min.   : NA   Min.   :   2.0   Min.   : 1.000
 1st Qu.:10.0   1st Qu.:6.938   1st Qu.: NA   1st Qu.:   6.0   1st Qu.: 3.000
 Median :54.5   Median :8.880   Median : NA   Median :   8.0   Median : 7.000
 Mean   :54.5   Mean   :7.354   Mean   :NaN   Mean   : 143.6   Mean   : 6.556
 3rd Qu.:99.0   3rd Qu.:9.990   3rd Qu.: NA   3rd Qu.:  11.0   3rd Qu.: 9.000
 Max.   :99.0   Max.   :9.990   Max.   : NA   Max.   :1234.0   Max.   :12.000
 NA's   : 6.0   NA's   :4.000   NA's   : 12   NA's   :   3.0   NA's   : 3.000

      O1               O2               O3              C1
 Min.   :2.000   Min.   :  1.00   Min.   :  2.00   Min.   : 1.00
 1st Qu.:3.000   1st Qu.:  4.25   1st Qu.:  5.25   1st Qu.: 3.75
 Median :4.500   Median :  5.50   Median :  6.50   Median : 6.50
 Mean   :4.833   Mean   : 13.25   Mean   : 15.08   Mean   : 8.75
 3rd Qu.:6.250   3rd Qu.:  8.25   3rd Qu.:  9.25   3rd Qu.:10.00
 Max.   :9.000   Max.   :100.00   Max.   :111.00   Max.   :25.00

       C2               F1                C3              C4
 Min.   : 1.000   Min.   :  25.02   Min.   :   3.00   1920/02/02:1
 1st Qu.: 3.000   1st Qu.: 113.71   1st Qu.:   4.75   2003/03/03:1
 Median : 4.500   Median :2901.23   Median :   7.50   2003/03/22:1
 Mean   : 5.333   Mean   :3613.54   Mean   : 174.92   2004/04/04:1
 3rd Qu.: 7.250   3rd Qu.:6231.71   3rd Qu.:  11.00   2005/05/05:1
 Max.   :12.000   Max.   :9012.35   Max.   :2012.00   2006/06/06:1
                                                      (Other)   :6

      O4                  S2                      D1
 Min.   :0.00   นี้คือข้อความตัวอย่าง        :1   Min.   :2001-01-01
 1st Qu.:1.00   Đây là một văn bản mẫu :1   1st Qu.:2003-12-26
 Median :1.50   Dette er en prøve tekst:1   Median :2007-07-08
 Mean   :1.75   échantillon de texte   :1   Mean   :2007-01-21
 3rd Qu.:2.25   ejemplo de un texto    :1   3rd Qu.:2010-09-10
 Max.   :4.00   This is a sample text  :1   Max.   :2011-09-11
                (Other)                :6

      D2             D3                   S3                   A2
 01/01/2001:1   1912/09/10:1   สำหรับไฟล์               :1   05/01/2011:12
 01/13/2011:1   2001/01/01:1   Đây là một văn bản mẫu :1
 02/02/2002:1   2002/02/02:1   ÐETTA ER A SÝNISHORN   :1
 03/03/2003:1   2003/03/03:1   DETTE ER EN PRØVE TEKST:1
 04/04/2004:1   2004/04/04:1   ÉCHANTILLON DE TEXTE   :1
 05/05/2005:1   2005/05/05:1   EJEMPLO DE UN TEXTO    :1
 (Other)   :6   (Other)   :6   (Other)                :6

   st
 0:12
David --
Jamie
On 2011-06-14, at 10:30 AM, epidata-list@lists.umanitoba.ca wrote:
It seems from the sample.epx file that any data type can have labels. The snippet I included came from the sample file and was for numeric values. So it seems you could have a field for systolic blood pressure and a label for 160 that says "a bit too high". You can't do this in R (except by using attr(), but that has limitations).
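Here is a small sketch of the attr() route and two of its limitations. The attribute name "value.labels" is an arbitrary choice for this example, not an R or EpiData convention:

```r
# A systolic blood pressure vector with one labelled value, stored as an
# attribute. "value.labels" is an arbitrary attribute name for this sketch.
sbp <- c(120, 135, 160, 142)
attr(sbp, "value.labels") <- c("a bit too high" = 160)

# Look up the label (if any) for each observation.
labs <- attr(sbp, "value.labels")
names(labs)[match(sbp, labs)]        # NA NA "a bit too high" NA

# Limitation 1: subsetting with [ drops the attribute.
attr(sbp[1:2], "value.labels")       # NULL

# Limitation 2: arithmetic keeps it, although 160 no longer occurs.
attr(sbp + 1, "value.labels")        # still c("a bit too high" = 160)
```

In other words, the label can silently disappear after subsetting, or silently go stale after a transformation.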
EpiData-list mailing list EpiData-list@lists.umanitoba.ca http://lists.umanitoba.ca/mailman/listinfo/epidata-list