[EpiData-list] Reading the new XML files into R

epidata-list at lists.umanitoba.ca
Sat Jun 11 08:44:38 CDT 2011


On Sat, Jun 11, 2011 at 10:53:08AM +0200, epidata-list at lists.umanitoba.ca wrote:
 
> > I have started working on an R package that imports an epidata XML file
> > directly into R using the R XML package. So far it creates a dataframe in R
> > and uses the field information to convert the data to appropriate R data
> > types. I haven't started working with the value labels yet.

[...]

> This sounds excellent, and I would be happy to give it a trial once you
> get to the stage of sharing, if that would be useful.

OK, here you go. I've put the code on GitHub:
https://github.com/daudi/Epidata-XML-to-R

It's not a proper package yet, just some functions in a single file for now, and it probably isn't particularly great code (I'm still learning how to use XML), but it works. There is some logging and debugging code in there (i.e. save(), status.log()) that can come out later. You can git clone the repository or just download the R file.
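For anyone who wants to see the general idea before opening the repository, here is a rough sketch, in Python rather than R and using only the standard library, of reading field definitions from an XML file and using each field's declared type to convert the values. The XML layout, the element and attribute names (`Field`, `id`, `type`, `Record`), the semicolon separator, the type names and the `read_records()` function are all hypothetical simplifications. This is not the actual EpiData XML format, nor is it taken from the code on GitHub.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL, simplified layout for illustration only; the real EpiData
# XML format is different.
SAMPLE = """<DataFile>
  <Fields>
    <Field id="id" type="integer"/>
    <Field id="weight" type="float"/>
    <Field id="name" type="string"/>
  </Fields>
  <Records>
    <Record>1;72.5;Anna</Record>
    <Record>2;;Ben</Record>
  </Records>
</DataFile>"""

# Assumed mapping from the (hypothetical) field type names to converters.
CONVERTERS = {"integer": int, "float": float, "string": str}

def read_records(xml_text, sep=";"):
    """Return one dict per record, with values converted using each field's type."""
    root = ET.fromstring(xml_text)
    # Field definitions, in the order their values appear in each record.
    fields = [(f.get("id"), CONVERTERS[f.get("type")]) for f in root.iter("Field")]
    rows = []
    for rec in root.iter("Record"):
        raw = (rec.text or "").split(sep)
        # Start with every field missing, then fill in the values present.
        row = {name: None for name, _ in fields}
        for (name, convert), value in zip(fields, raw):
            if value != "":
                row[name] = convert(value)
        rows.append(row)
    return rows

for row in read_records(SAMPLE):
    print(row)
# {'id': 1, 'weight': 72.5, 'name': 'Anna'}
# {'id': 2, 'weight': None, 'name': 'Ben'}
```

Keying each value by field name and starting from "all missing" means a record that is shorter than the field list ends up with missing values in its last fields, rather than shifted or repeated ones.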

Some TODOs:

i) handle value labels;

ii) map the remaining data types;

iii) tighten up the code, e.g. replace for loops;

iv) deal with records of different lengths. 


The last one needs some thinking about. If all records have the same number of fields, it works fine. But if you create a screen, enter some data, then add a new field and enter data in it only for the subsequent records, the earlier records will have fewer fields than the later ones. R then recycles values to fill out the shorter records and issues a warning. Detecting this and dealing with it will mean changing the function that reads the records, and I need to ponder this one.
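One possible approach, sketched here in Python, is to find the longest record and pad the shorter ones with an explicit missing value instead of letting their values be recycled. It assumes that fields added later always appear at the end of a record. The `recycle()` helper only illustrates the problem: it mimics what R's rbind() does with vectors of unequal length, repeating the shorter vector's values. Both function names and the sample records are hypothetical.

```python
from itertools import cycle, islice

def recycle(record, width):
    """Illustration only: repeat a short record's values until it is `width` long,
    roughly as R's rbind() does (with a warning) for vectors of unequal length."""
    return list(islice(cycle(record), width))

def pad_records(records, missing=None):
    """Pad every record to the length of the longest one with `missing`.

    Assumes fields added later always appear at the END of a record, so the
    values of a short record line up with the first fields.
    """
    width = max((len(r) for r in records), default=0)
    return [list(r) + [missing] * (width - len(r)) for r in records]

records = [
    ["1", "Anna"],          # entered before the new field was added
    ["2", "Ben", "72.5"],   # entered after the new field was added
]

print(recycle(records[0], 3))  # ['1', 'Anna', '1']: a wrong value in the new field
print(pad_records(records))    # [['1', 'Anna', None], ['2', 'Ben', '72.5']]
```

In R the equivalent would be to fill the missing positions with NA before building the data frame. If new fields can be inserted in the middle of a form rather than at the end, padding by position is not enough, and values would have to be matched to fields by name.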

David
--