On Sat, Jun 11, 2011 at 08:22:48AM +0200, epidata-list@lists.umanitoba.ca wrote:
Maybe it would be nice to add that implementation to Manager as well. That way, users of operating systems other than Windows (GNU/Linux and Mac) could also benefit.
I am assuming that Epidata Analysis will eventually become available to Linux and Mac users, but it would still be useful to allow direct access to the Epidata files from R. If Epidata used SQLite as a backend, you would be able to access the data directly from R using SQL queries, without needing to export or import anything.
What I was hoping was that someone would know whether this could be done with the XML database.
Graham
AFAIK, Epidata uses the XML file as its backend. So, to do what you suggest, we would need a way of exporting the XML to SQLite, after which it could be accessed from within R. This route would not use the R XML package. It would certainly be possible (though I wouldn't know how to do it), but it still means doing an export.
I have started working on an R package that imports an epidata XML file directly into R using the R XML package. So far it creates a dataframe in R and uses the field information to convert the data to appropriate R data types. I haven't started working with the value labels yet.
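To illustrate the type-conversion step, here is a minimal base-R sketch. It assumes the values have already been pulled out of the XML as character strings. The function name convert_fields(), the field type names ("integer", "float", "date", "string") and the dd/mm/yyyy date format are all hypothetical; they are not taken from the EpiData XML format or from the package itself.

```r
# Hypothetical sketch: convert character columns (as read from XML text)
# into R types according to a named vector of field types.
# ASSUMPTION: the type names and the date format below are illustrative,
# not taken from the EpiData XML specification.
convert_fields <- function(df, field_types, date_format = "%d/%m/%Y") {
  for (col in names(field_types)) {
    values <- df[[col]]
    # Treat empty strings as missing values before converting
    values[which(values == "")] <- NA
    df[[col]] <- switch(field_types[[col]],
      integer = as.integer(values),
      float   = as.numeric(values),
      date    = as.Date(values, format = date_format),
      string  = as.character(values),
      stop("unknown field type: ", field_types[[col]])
    )
  }
  df
}

# Example: every value starts out as text, as it would after reading the XML
raw <- data.frame(
  Name   = c("Ann", "Bob"),
  age    = c("34", ""),
  height = c("1.62", "1.80"),
  dob    = c("01/02/1977", "15/11/1980"),
  stringsAsFactors = FALSE
)
types <- c(Name = "string", age = "integer", height = "float", dob = "date")
converted <- convert_fields(raw, types)
str(converted)
```

Keeping the field metadata as a separate named vector means the same conversion function can be applied to each table in a file.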
One potential disadvantage of this route is that the R XML approach reads the whole XML file into memory (I think), which could cause problems with large files on machines with limited resources. However, as a test I created an XML file with 6 columns and 16,000 records, and a netbook with 1 GB of memory reads it into R in 7 seconds. So it is not exactly instantaneous, but any export is also going to take a few seconds. I haven't tried to optimize it yet, so it may be possible to improve on this.
It actually returns an R list of dataframes, because an epidata file may contain multiple tables. The names of the dataframes come from the epidata XML file. In R it is used like this:
> library(epidata)
> x <- read.epidata.xml("myepidatafile.epx")
> names(x)
[1] "datafile_id_0"
> names(x$datafile_id_0)
[1] "Name" "age" "height" "dob" "uppercasetest" "st"
If you know that there is only one table in the XML file, you can take the first element of the list so that the dataframe itself is stored in x:
> library(epidata)
> x <- read.epidata.xml("myepidatafile.epx")[[1]]
> names(x)
[1] "Name" "age" "height" "dob" "uppercasetest" "st"
David --