[EpiData-list] Epidata XML to R: how to handle labels? - and missing values

epidata-list at lists.umanitoba.ca
Tue Jun 14 09:30:20 CDT 2011


On Tue, Jun 14, 2011 at 03:58:10PM +0200, epidata-list at lists.umanitoba.ca wrote:
> Great work on this so far, David. This will be useful, because you can get more of the metadata into R directly.
> I'm not sure what you mean by "other data types". As I see it, if an EpiData field uses value labels, then that field (variable) should be a factor in R. 

It seems from the sample.epx file that any data type can have labels. The snippet I included was from the sample file and was for numeric values. So you could have a field for systolic blood pressure and a label for 160 that says "a bit too high". You can't do this directly in R: a factor would replace the numeric values with integer codes, and attaching the labels with attr() works but has limitations.
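
To illustrate the attr() route and one of its limitations, here is a minimal sketch in base R. The readings and the attribute name "value.labels" are assumptions made for this example; they are not taken from sample.epx or from any import function.

```r
# Hypothetical systolic blood pressure readings (illustrative values only).
sbp <- c(120, 160, 135, 160)

# Attach a value label to the numeric vector. "value.labels" is simply
# the attribute name chosen for this sketch.
attr(sbp, "value.labels") <- c("a bit too high" = 160)

# The data stay numeric, so arithmetic still works.
mean_sbp <- mean(sbp)

# Limitation: subsetting with `[` drops the attribute, so the label is lost.
high <- sbp[sbp >= 160]
labels_survive <- !is.null(attr(high, "value.labels"))
```

Here labels_survive ends up FALSE, which is why helper functions for carrying the labels around would probably be needed.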

Once I have the approach I'm playing with working I'll upload it and seek feedback.

> Boolean fields are not well documented, because you cannot create them in EpiData Manager. They are a holdover from the current version of EpiData, and so far they persist in the new version if imported from old .rec files. Field type is 0 (zero) and valid values are Y, N or missing.
> I don't think there was an answer to someone's question about the field ST. This is record status (0=active, 1=deleted). In the future, another code will be used for "verified". An import option would be to ignore deleted records (default for EpiData Analysis).

Yes, I asked about this. Thanks, this is clear.
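
As a rough sketch of how an import might treat these two points, assuming the records are already in a data frame: the function name drop_deleted, its ignore.deleted argument, and the columns id and smoker are all hypothetical. Only the ST codes (0 = active, 1 = deleted) and the Y/N/missing values come from the description above.

```r
# Hypothetical records as they might look after parsing the XML.
recs <- data.frame(
  id = 1:4,
  smoker = c("Y", "N", NA, "Y"),  # boolean field: Y, N or missing
  ST = c(0, 1, 0, 0),             # record status: 0 = active, 1 = deleted
  stringsAsFactors = FALSE
)

# Drop records marked as deleted (ST == 1). Any other status code, such as
# a future "verified" code, is kept.
drop_deleted <- function(df, ignore.deleted = TRUE) {
  if (ignore.deleted) {
    df <- df[!(df$ST %in% 1), , drop = FALSE]
  }
  df
}

active <- drop_deleted(recs)

# Map Y/N to R logicals; a missing value stays NA.
active$smoker <- active$smoker == "Y"
```

Defaulting ignore.deleted to TRUE would mirror the EpiData Analysis behaviour mentioned above, but that choice is only a suggestion.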

> Since date, time and decimal separators are metadata, the R function should read these from the XML and convert to whatever R requires. This should not require user choice.

Yes, I plan to use the metadata.
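
A minimal sketch of what that conversion could look like, assuming the separators have already been read from the XML into a list. The list element names, the helper names parse_decimal and parse_dmy, the sample values, and the day-month-year order are all assumptions for illustration.

```r
# Hypothetical separators, as if already read from the file's metadata.
settings <- list(dateSeparator = "/", decimalSeparator = ",")

# Replace the file's decimal separator with "." so as.numeric() can parse it.
parse_decimal <- function(x, dec) {
  as.numeric(sub(dec, ".", x, fixed = TRUE))
}

# Build a day-month-year format string from the file's date separator.
parse_dmy <- function(x, sep) {
  as.Date(x, format = paste("%d", "%m", "%Y", sep = sep))
}

weight <- parse_decimal(c("72,5", "80,0"), settings$decimalSeparator)
visit  <- parse_dmy("14/06/2011", settings$dateSeparator)
```

Because the separators come from the file rather than from the user, the same call works whatever locale the data were entered in.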

> I'll be trying this out, even though I don't use R much anymore. We taught EpiData to a group last week, among which were several R users. They liked the simplicity of EpiData Analysis, but will be very happy with the direct import to R.

Please note that it is still a bit experimental, and some of the code will need cleaning up. 

Thanks for your comments.


> Jamie Hockin
> Ottawa
> > Quick reply: yes, strings are easy, converting to factors; it's the other data types that need some consideration. Regarding using the Stata export, I did at one point wonder if what I was doing was completely unnecessary and that using the Stata export would be sufficient, but I think that having something that is native to R provides more possibilities.
> > 
> > I'm proceeding with the idea of reporting the labels and providing functions to ease working with them. I'm nearly there ...
> > 
> > David
David Whiting, PhD | Senior Epidemiology & Public Health Specialist
tel +32-2-6437945 | mob +32-496-266436 | David.Whiting at idf.org

International Diabetes Federation
166 Chaussée de La Hulpe, B-1170 Brussels, Belgium
tel +32-2-5385511 | fax +32-2-5385114
info at idf.org | www.idf.org | VAT BE 0433.674.528

IDF | Promoting diabetes care, prevention and a cure worldwide
