[EpiData-list] Reading the new XML files into R

epidata-list at lists.umanitoba.ca epidata-list at lists.umanitoba.ca
Sat Jun 11 03:53:08 CDT 2011


> AFAIK Epidata is using the XML file as the backend. So to do what you
> suggest we would need a way of exporting the XML to SQLite. Then it could be
> accessed from within R. This route would not use the R XML package.
> Doing this would certainly be possible (but I wouldn't know how to).
> However, this still means doing an export.

No, sorry, I wasn't suggesting exporting to SQLite. I was just saying that
with an SQLite backend you can query the database directly (i.e. without
loading it all into R), and asking whether there is a way to query the XML
database in the same way. If you go back to my original post that started
this thread, that was what I was asking, but it then developed into a
question about running R code from EpiData Analysis.
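
For illustration, querying an SQLite backend directly from R might look
roughly like the sketch below. It assumes the DBI and RSQLite packages are
installed; the table name "patients" and its columns are invented for the
example and have nothing to do with EpiData itself.

```r
# Sketch only: assumes the DBI and RSQLite packages are installed.
# The table "patients" and its columns are hypothetical.
library(DBI)

# An in-memory database keeps the example self-contained;
# a real project would pass a file path instead of ":memory:".
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "patients",
             data.frame(id = 1:5, age = c(23, 41, 35, 67, 52)))

# The filtering happens inside SQLite, so only the matching rows
# are transferred into R as a data frame.
older <- dbGetQuery(con, "SELECT id, age FROM patients WHERE age > 40")

dbDisconnect(con)
```

The point of the design is that the WHERE clause is evaluated by the
database engine, so R never has to hold the full table.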

> I have started working on an R package that imports an epidata XML file
> directly into R using the R XML package. So far it creates a dataframe in R
> and uses the field information to convert the data to appropriate R data
> types. I haven't started working with the value labels yet.

This would be great and I look forward to it being available.
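
The type-conversion step described in the quoted message could be sketched
in base R as follows. This is not the package's actual code: the field-type
labels ("integer", "float", "date") and the date format are assumptions made
for the example, and the real EpiData field metadata may look different.

```r
# Values as they might arrive from an XML file: everything is text.
raw <- data.frame(age    = c("34", "51"),
                  height = c("1.72", "1.60"),
                  dob    = c("11/06/1975", "02/01/1960"),
                  stringsAsFactors = FALSE)

# Hypothetical field metadata: one type label per column.
field_types <- c(age = "integer", height = "float", dob = "date")

# One converter per type label (the day/month/year format is an assumption).
converters <- list(
  integer = as.integer,
  float   = as.numeric,
  date    = function(v) as.Date(v, format = "%d/%m/%Y")
)

# Apply the matching converter to each column.
typed <- raw
for (nm in names(field_types)) {
  typed[[nm]] <- converters[[field_types[[nm]]]](raw[[nm]])
}

str(typed)
```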

> One potential disadvantage of this route is that the R XML approach reads
> the whole XML file into memory (I think) and this could create problems with
> large files on machines with limited resources. However, as a test I have
> created an XML file with 6 columns and 16,000 records and a netbook with 1Gb
> of memory reads it into R in 7 seconds---so, not exactly instantaneous, but
> then any export is also going to take a few seconds. I haven't tried to
> optimize it yet, so it might be possible to improve this.

I suspect that unless there is some way of querying the XML file directly,
this is something we have to live with. There is an R package for running
SQL queries on text files, though I am not sure whether the query is applied
before or after the data are loaded into R, and I can't find it now.
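
One partial option for querying XML from R is XPath, via the XML package.
The sketch below assumes that package is installed, and the element and
attribute names (records, record, id, age) are made up; they are not the
EpiData schema. Note that xmlParse() still reads the whole document into
memory, so this only limits what gets turned into R objects, not the size
of the parse itself.

```r
# Sketch only: assumes the XML package is installed.
# The document structure below is hypothetical, not the EpiData format.
library(XML)

xml_text <- '<records>
  <record id="1"><age>23</age></record>
  <record id="2"><age>41</age></record>
  <record id="3"><age>35</age></record>
  <record id="4"><age>67</age></record>
</records>'

# The full document is parsed into an internal (C-level) tree.
doc <- xmlParse(xml_text, asText = TRUE)

# The XPath predicate selects only records whose <age> exceeds 40;
# xmlGetAttr() then pulls the id attribute from each matching node.
ids <- xpathSApply(doc, "//record[age > 40]", xmlGetAttr, "id")

ids
```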

Personally, I feel the advantages of loading the XML file directly, without
needing to convert it, outweigh the speed issue, certainly at the sample
sizes I would be looking at, but it would be more of an issue for large

> It actually creates an R list object with a list of dataframes (because an
> epidata file may potentially contain multiple tables).

> The names of the dataframes come from the epidata XML file.
> In R it is used like this:
> library(epidata)
> x <- read.epidata.xml("myepidatafile.epx")
> names(x)
> [1] "datafile_id_0"
> names(x$datafile_id_0)
> [1] "Name"      "age"       "height"    "dob"       "uppercasetest" "st"
> If you know that there is only one table in the XML file you could do this
> so that the dataframe is stored in the object x:
> library(epidata)
> x <- read.epidata.xml("myepidatafile.epx")[[1]]
> names(x)
> [1] "Name"      "age"       "height"    "dob"       "uppercasetest" "st"

This sounds excellent, and I would be happy to give it a trial once you
get to the stage of sharing, if that would be useful.
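
The list-of-dataframes structure described in the quoted message can be
handled with ordinary list tools. The sketch below does not call
read.epidata.xml(), which is not yet available; instead it builds a stand-in
object that mimics the quoted output, with invented values.

```r
# Stand-in for the object read.epidata.xml() is described as returning:
# a named list with one data frame per table. The values are invented.
x <- list(
  datafile_id_0 = data.frame(Name = c("A", "B"),
                             age  = c(30, 45),
                             stringsAsFactors = FALSE)
)

table_names <- names(x)     # one name per table in the file
dims <- lapply(x, dim)      # rows and columns of every table

# If the file holds exactly one table, unwrap it; otherwise keep the list.
single <- if (length(x) == 1) x[[1]] else NULL

names(single)
```

Checking length(x) before taking [[1]] avoids silently dropping tables when
a file turns out to contain more than one.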

Having said that, at the moment I can't even get the XML package to install.
There is some issue with the xml2-config file not being available, but that
is only on my main computer running Ubuntu; it seems fine on my MacBook.
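
A quick way to check for this from within R is sketched below. xml2-config
is a helper script that ships with the libxml2 development files; on
Debian and Ubuntu these normally come from the libxml2-dev package, though
the exact cause on any given machine is an assumption here.

```r
# Look for xml2-config on the PATH; Sys.which() returns "" if it is absent.
cfg <- Sys.which("xml2-config")

if (nzchar(cfg)) {
  message("xml2-config found at ", cfg)
} else {
  # Assumption: a Debian/Ubuntu system, where libxml2-dev provides the script.
  message("xml2-config not found; on Debian/Ubuntu try installing ",
          "libxml2-dev (sudo apt-get install libxml2-dev), ",
          "then rerun install.packages(\"XML\")")
}
```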


