Re: [EpiData-list] Reading the new XML files into R

11 Jun 2011


      David
...
AFAIK Epidata is using the XML file as the backend. So to do what you
suggest we would need a way of exporting the XML to SQLite. Then it could be
accessed from within R. This route would not use the R XML package.
Doing this would certainly be possible (but I wouldn't know how to).
However, this still means doing an export.
No, sorry, I wasn't suggesting exporting to SQLite. I was just saying that
with an SQLite database backend you can query the database directly (i.e.
without loading it all into R), and asking whether there was a way to query
the XML database in the same way. If you go back to my OP that started this
thread, that was what I was asking, but it then developed into a question
about running R code from EpiData Analysis.
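For comparison, this is roughly what querying an SQLite database directly from R looks like. It is a minimal sketch that assumes the DBI and RSQLite packages are installed; the table `people` and its columns are made up for illustration. Only the rows matching the WHERE clause are brought into R; the rest of the table stays in SQLite.

```r
library(DBI)  # generic database interface; RSQLite supplies the SQLite driver

# Throwaway in-memory database so the example is self-contained.
# In practice you would connect to an existing .sqlite file on disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table, for illustration only.
dbWriteTable(con, "people",
             data.frame(name = c("Ann", "Bob", "Cy"),
                        age  = c(34L, 51L, 27L)))

# The filtering happens inside SQLite; only matching rows reach R.
over30 <- dbGetQuery(con, "SELECT name, age FROM people WHERE age > 30")
print(over30)

dbDisconnect(con)
```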
...
I have started working on an R package that imports an epidata XML file
directly into R using the R XML package. So far it creates a dataframe in R
and uses the field information to convert the data to appropriate R data
types. I haven't started working with the value labels yet.
This would be great and I look forward to it being available.
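To illustrate the general idea of reading XML with the R XML package and using field definitions to set column types, here is a minimal sketch. The element and attribute names (`field`, `rec`, `id`, `type`) are hypothetical and do NOT follow the real EpiData .epx schema, and this is not the package's actual code.

```r
library(XML)

# Hypothetical, simplified layout: NOT the real EpiData .epx schema.
txt <- '<data>
  <fields>
    <field id="name" type="string"/>
    <field id="age" type="integer"/>
    <field id="height" type="float"/>
  </fields>
  <records>
    <rec name="Ann" age="34" height="1.62"/>
    <rec name="Bob" age="51" height="1.80"/>
  </records>
</data>'

doc <- xmlParse(txt, asText = TRUE)

# Field names and their declared types, taken from the field definitions.
ids   <- xpathSApply(doc, "//field", xmlGetAttr, "id")
types <- xpathSApply(doc, "//field", xmlGetAttr, "type")

# Convert a character vector according to the declared type;
# unknown types are left as character.
convert <- function(v, type) switch(type,
  integer = as.integer(v),
  float   = as.numeric(v),
  v)

# Pull each field out of every record, convert it, and build a data frame.
cols <- lapply(seq_along(ids), function(i)
  convert(xpathSApply(doc, "//rec", xmlGetAttr, ids[i]), types[i]))
df <- as.data.frame(setNames(cols, ids), stringsAsFactors = FALSE)
str(df)
```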
...
One potential disadvantage of this route is that the R XML approach reads
the whole XML file into memory (I think) and this could create problems with
large files on machines with limited resources. However, as a test I have
created an XML file with 6 columns and 16,000 records, and a netbook with 1 GB
of memory reads it into R in 7 seconds. So it is not exactly instantaneous, but
then any export is also going to take a few seconds. I haven't tried to
optimize it yet, so it might be possible to improve this.
I suspect that unless there is some way of querying the XML file directly,
this is something we have to live with. There is an R package that lets you
use SQL queries on text files, though I'm not sure whether the query runs
before or after the file is loaded into R, and I can't find it now.
Personally, I feel the advantages of loading the XML file directly, without
needing to convert it, outweigh the speed issue, certainly at the sample
sizes I would be looking at. It would be more of an issue for large
studies, though.
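One way around holding the whole document tree in memory would be event-driven (SAX) parsing with `XML::xmlEventParse()`, which calls handler functions as each element is read instead of building the full tree. This is a different technique from the tree-based approach described above, shown only as a sketch; the `rec` element and `age` attribute are hypothetical.

```r
library(XML)

# Hypothetical record layout (not the real .epx schema).
txt <- '<records><rec age="34"/><rec age="51"/><rec age="27"/></records>'

# Running totals updated by the handler; no document tree is kept.
n <- 0
total <- 0

handlers <- list(startElement = function(name, attrs, ...) {
  if (name == "rec") {
    n <<- n + 1
    total <<- total + as.numeric(attrs[["age"]])
  }
})

# With a real file, pass the file name and drop asText = TRUE.
invisible(xmlEventParse(txt, handlers = handlers, asText = TRUE))

mean_age <- total / n
print(mean_age)
```

The trade-off is that you only see one element at a time, so anything like type conversion or value labels has to be handled incrementally inside the handlers.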
...
It actually creates an R list of data frames (because an EpiData file may
contain multiple tables).
The names of the dataframes come from the epidata XML file.
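Because the result is a named list with one data frame per table, ordinary list tools work on it. A minimal sketch, using a hand-built list as a hypothetical stand-in for the object returned by `read.epidata.xml()` (the second table name is invented):

```r
# Hypothetical stand-in for read.epidata.xml() output:
# a named list holding one data frame per table.
x <- list(
  datafile_id_0 = data.frame(age = c(34, 51), height = c(1.62, 1.80)),
  datafile_id_1 = data.frame(visit = 1:3)
)

# Table names, then the number of records in each table,
# without knowing in advance how many tables there are.
print(names(x))
sizes <- vapply(x, nrow, integer(1))
print(sizes)
```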
...
In R it is used like this:
library(epidata)
x <- read.epidata.xml("myepidatafile.epx")
names(x)
[1] "datafile_id_0"
names(x$datafile_id_0)
[1] "Name"      "age"       "height"    "dob"       "uppercasetest" "st"
If you know that there is only one table in the XML file, you can do this
so that the data frame itself is stored in the object x:
library(epidata)
x <- read.epidata.xml("myepidatafile.epx")[[1]]
names(x)
[1] "Name"      "age"       "height"    "dob"       "uppercasetest" "st"
This sounds excellent, and I would be happy to give it a trial once you
get to the stage of sharing, if that would be useful.
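Taking `[[1]]` silently ignores any extra tables. A small helper could make the single-table assumption explicit; `only_table()` is hypothetical and not part of the package, and the list below is a stand-in for the imported object.

```r
# Hypothetical helper, not part of the epidata package: return the single
# table in a read.epidata.xml()-style list, or stop if there isn't exactly one.
only_table <- function(tables) {
  if (length(tables) != 1L) {
    stop("expected exactly one table, found ", length(tables))
  }
  tables[[1]]
}

# Stand-in for the imported object, for illustration only.
x <- list(datafile_id_0 = data.frame(age = c(34, 51)))
df <- only_table(x)
print(names(df))
```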
Having said that, at the moment I can't even get the XML package to install.
There is some issue with the xml2-config file not being available, but that
is only on my main computer running Ubuntu; it seems fine on my MacBook.
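The xml2-config script is part of the libxml2 development files (on Ubuntu it comes with the libxml2-dev system package), and the XML package uses it to locate libxml2 when building from source. A quick check from R of whether it is on the PATH:

```r
# Sys.which() returns "" when the program is not found on the PATH.
cfg <- Sys.which("xml2-config")

if (nzchar(cfg)) {
  message("xml2-config found at: ", cfg)
} else {
  message("xml2-config not found; install the libxml2 development ",
          "package (libxml2-dev on Ubuntu) before install.packages(\"XML\")")
}
```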
Thanks,
Graham

epidata-list@lists.umanitoba.ca