Jamie,
On Tue, Jun 14, 2011 at 03:58:10PM +0200, epidata-list@lists.umanitoba.ca wrote:
Great work on this so far, David. This will be useful, because you can get more of the metadata into R directly. I'm not sure what you mean by "other data types". As I see it, if an EpiData field uses value labels, then that field (variable) should be a factor in R.
It seems from the sample.epx file that any data type can have labels. The snippet I included was from the sample file and was for numeric values. So, it seems you could have a field for systolic blood pressure and a label for 160 that says "a bit too high". You can't do this in R (except by using attr() but that has limitations).
Once the approach I'm playing with is working, I'll upload it and seek feedback.
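To make the systolic blood pressure case concrete, here is a minimal sketch of keeping value labels next to a numeric field instead of converting the field to a factor. It is written in Python purely for illustration, not R, and the names (systolic, value_labels, label_for, labelled) are hypothetical rather than part of any EpiData or R interface.

```python
# Hypothetical data: a numeric field with a label on one value only.
systolic = [120, 160, None, 135, 160]
value_labels = {160: "a bit too high"}


def label_for(value, labels):
    """Return the label defined for value, or None if there is none."""
    return labels.get(value)


def labelled(values, labels):
    """Pair each raw value with its label, leaving the values numeric."""
    return [(v, label_for(v, labels)) for v in values]


pairs = labelled(systolic, value_labels)

# The raw values stay usable as numbers, e.g. for a mean of non-missing values.
present = [v for v in systolic if v is not None]
mean_systolic = sum(present) / len(present)
```

The point of the design is that the labels are reported alongside the data, so numeric summaries still work on the unchanged values.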
Boolean fields are not well documented, because you cannot create them in EpiData Manager. These are a holdover from the current version of EpiData and so far they persist in the new version if imported from old .rec files. Field type is 0 (zero) and valid values are Y, N or missing.
I don't think there was an answer to someone's question about the field ST. This is record status (0=active, 1=deleted). In the future, another code will be used for "verified". An import option would be to ignore deleted records (default for EpiData Analysis).
Yes, I asked about this. Thanks, this is clear.
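The import option described above could look roughly like the following sketch. It is in Python for illustration only, and it assumes each record has already been read into a dictionary with the status stored under the key "ST"; that representation and the names drop_deleted and keep_deleted are assumptions, not an existing API.

```python
# Record status codes as described in the thread.
ACTIVE = 0
DELETED = 1


def drop_deleted(records, keep_deleted=False):
    """Return records, skipping those marked deleted unless asked to keep them.

    A record with no "ST" entry is treated as active (an assumption).
    """
    if keep_deleted:
        return list(records)
    return [r for r in records if r.get("ST", ACTIVE) != DELETED]


# Hypothetical records for illustration.
records = [
    {"id": 1, "ST": 0},
    {"id": 2, "ST": 1},
    {"id": 3},
]
active_only = drop_deleted(records)
everything = drop_deleted(records, keep_deleted=True)
```

Comparing against DELETED rather than ACTIVE means a future "verified" code would be kept by default.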
Since date, time and decimal separators are metadata, the R function should read these from the XML and convert to whatever R requires. This should not require user choice.
Yes, I plan to use the metadata.
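As a rough sketch of what using the metadata could mean in practice, the example below converts raw field text using separators taken from a settings dictionary, so the user never has to specify them. It is in Python for illustration only. The dictionary keys, the day/month/year order and the function names are all assumptions; the real values would come from the XML.

```python
from datetime import date, time

# Hypothetical settings, standing in for values read from the file's metadata.
settings = {
    "date_separator": "/",
    "time_separator": ".",
    "decimal_separator": ",",
}


def parse_decimal(text, settings):
    """Convert text such as '3,5' to a float using the file's decimal separator."""
    return float(text.replace(settings["decimal_separator"], "."))


def parse_date(text, settings):
    """Convert day/month/year text to a date (the field order is assumed)."""
    day, month, year = (int(p) for p in text.split(settings["date_separator"]))
    return date(year, month, day)


def parse_time(text, settings):
    """Convert hour/minute/second text to a time using the file's separator."""
    hour, minute, second = (int(p) for p in text.split(settings["time_separator"]))
    return time(hour, minute, second)
```

For example, parse_date("14/06/2011", settings) gives the date 14 June 2011 under these assumed settings.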
I'll be trying this out, even though I don't use R much anymore. We taught EpiData to a group last week, among whom were several R users. They liked the simplicity of EpiData Analysis, but will be very happy with the direct import to R.
Please note that it is still a bit experimental, and some of the code will need cleaning up.
Thanks for your comments.
David --
Jamie Hockin Ottawa
Quick reply: yes, strings are easy to convert to factors; it's the other data types that need some consideration. Regarding the Stata export, I did at one point wonder whether what I was doing was completely unnecessary and the Stata export would be sufficient, but I think that having something native to R provides more possibilities.
I'm proceeding with the idea of reporting the labels and providing functions to ease working with them. I'm nearly there ...
David
David Whiting, PhD | Senior Epidemiology & Public Health Specialist tel +32-2-6437945 | mob +32-496-266436 | David.Whiting@idf.org
International Diabetes Federation 166 Chaussée de La Hulpe, B-1170 Brussels, Belgium tel +32-2-5385511 | fax +32-2-5385114 info@idf.org | www.idf.org | VAT BE 0433.674.528
IDF | Promoting diabetes care, prevention and a cure worldwide