David,
Many thanks for this. I have imported my tiny sample file, but don't seem able to do anything with it. Is this related to you saying you haven't started working with value labels?
Here is some output from R:

> x <- read.epidata.xml("test.epx", dec.sep = ".")
> x$datafile_id_0
        Date   Species Count st
1 2009-11-23 Blackbird    34  0
2 2010-11-23    Thrush    57  0
3 2011-12-24 Blackbird   130  0
4 2006-11-23 Blackbird   134  0
5 2011-06-23    Thrush    34  0
6 2005-05-23   Sparrow    24  0
> summary(x)
              Length Class      Mode
datafile_id_0 4      data.frame list
> boxplot(x$Count)
Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) :
  need finite 'ylim' values
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
3: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
4: In min(x) : no non-missing arguments to min; returning Inf
5: In max(x) : no non-missing arguments to max; returning -Inf
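One reading of the output above, offered as a sketch rather than a statement about how the package is meant to be used: summary(x) reports a list of length 1 whose single element, datafile_id_0, is the data frame. If so, x$Count is NULL, which would explain the warnings about type 'NULL', and the column has to be reached through the data frame. The object below is rebuilt by hand from the printed rows as a stand-in for the real import, so the column types are assumptions.

```r
# Stand-in for the object returned by read.epidata.xml(): a list holding
# one data frame. Values are copied from the printed output; the column
# types (Date, character, numeric) are assumptions.
x <- list(
  datafile_id_0 = data.frame(
    Date    = as.Date(c("2009-11-23", "2010-11-23", "2011-12-24",
                        "2006-11-23", "2011-06-23", "2005-05-23")),
    Species = c("Blackbird", "Thrush", "Blackbird",
                "Blackbird", "Thrush", "Sparrow"),
    Count   = c(34, 57, 130, 134, 34, 24),
    st      = 0,
    stringsAsFactors = FALSE
  )
)

# The list itself has no element called Count, so this is NULL.
is.null(x$Count)

# Pull the data frame out of the list first (x[[1]] would also work).
d <- x$datafile_id_0

summary(d$Count)
boxplot(d$Count)                      # one box for all counts
boxplot(Count ~ Species, data = d)    # one box per species
```

Assigning the data frame to its own variable once keeps the later calls short.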
I'm also not sure what the st column is.
You will gather I am right on the edge of my understanding of R here.
Graham
On 11 June 2011 14:44, epidata-list@lists.umanitoba.ca wrote:
On Sat, Jun 11, 2011 at 10:53:08AM +0200, epidata-list@lists.umanitoba.ca wrote:
I have started working on an R package that imports an epidata XML file directly into R using the R XML package. So far it creates a dataframe in R and uses the field information to convert the data to appropriate R data types. I haven't started working with the value labels yet.
[...]
This sounds excellent, and I would be happy to give it a trial once you get to the stage of sharing, if that would be useful.
OK, here you go. I've put the code on github: https://github.com/daudi/Epidata-XML-to-R
It's not a proper package yet, just some functions in a single file for now. And it probably isn't particularly great code (I'm still learning how to use XML), but it works. I've got some code in there for logging and debugging that can come out later (i.e. save(), status.log()). git clone it or just download the R file.
Some TODOs:
i) handle value labels;
ii) map the remaining data types;
iii) tighten up the code, e.g. replace for loops;
iv) deal with records of different lengths.
The last one is an issue that needs some thinking about. If all rows/records have the same number of columns it works okay. But if you create a screen, enter some data, then add a new field and only enter data for it in the subsequent records, the first set of records will have fewer fields than the second set. R will recycle values and issue a warning. Detecting this and then dealing with it will mean changing the function that gets the records, and I need to ponder this one.
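One possible way to handle the ragged records, sketched in base R and not taken from the code on github: collect the union of field names, pad each short record with NA, and only then bind the rows. The records list, the pad_record() helper, and the assumption that each record arrives as a named character vector are all hypothetical.

```r
# Hypothetical records as named character vectors. The first two were
# entered before the Count field was added, so they are one field short.
records <- list(
  c(Date = "2009-11-23", Species = "Blackbird"),
  c(Date = "2010-11-23", Species = "Thrush"),
  c(Date = "2011-12-24", Species = "Blackbird", Count = "130")
)

# Naive binding: the short records are recycled to fill the third column
# and R warns about it. The warning is suppressed here only for the demo.
naive <- suppressWarnings(do.call(rbind, records))

# Union of all field names, in order of first appearance.
fields <- unique(unlist(lapply(records, names)))

# Pad one record to the full field set, using NA for missing fields.
pad_record <- function(rec, fields) {
  out <- setNames(rep(NA_character_, length(fields)), fields)
  out[names(rec)] <- rec
  out
}

padded <- lapply(records, pad_record, fields = fields)
d <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)

# Type conversion can then happen per column, as it does now.
d$Count <- as.integer(d$Count)
```

Detecting the problem is then just a matter of checking whether the records differ in length, for example with length(unique(lengths(records))) > 1 in a version of R that has lengths().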
David
EpiData-list mailing list EpiData-list@lists.umanitoba.ca http://lists.umanitoba.ca/mailman/listinfo/epidata-list