Jens,
Thanks for the guidance.
On Sun, Jun 12, 2011 at 04:00:20PM +0200, epidata-list@lists.umanitoba.ca wrote:
Interesting to see how the "read into R" work has progressed in recent days. Just a few comments:
Basically, any external system reading the XML files should read at least 1-3 of:
- The data file structure (type and number of fields).
I've added an implementation of this. The list object now has a field info table.
- The contained data
Yep.
- The metadata - that is, defined value labels, defined missing values (now contained as part of the value labels), and the questions (or variable labels).
Still need to do this.
But possibly also, depending on purpose: 4. System data, such as defined delimiters for decimals and dates (written in the header section of the XML).
Agreed. Will be easy.
- Project information
Agreed. Will be easy.
Mark Myatt, who was active in early EpiData development, has written an introduction to R, including data examples. You can find it at: http://www.brixtonhealth.com/Rex.zip
Regarding the specific discussions lately here: a. David writes:
iv) deal with records of different lengths.
The reason we decided to write only variables containing data in the XML structure is that this makes data files much smaller. However, I think we could consider making this user-definable, so that all fields are written in each record regardless of whether they contain data.
This makes sense. Anyway, I've changed the code to work around this issue by adding NA wherever a field is not present in a record's attributes. One side effect is that the field order may change, because I need to sort the fields to ensure that every record has them in the same order.
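For anyone following along, the idea behind the workaround can be sketched roughly as below. This is only an illustration, written in Python rather than R, and not the actual script: the element name `rec`, the attribute names, and the function `read_records` are all made up for the example and are not taken from the EpiData XML format. `None` stands in for R's NA.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element and attribute names are invented for
# illustration and do NOT reflect the real EpiData XML schema.
SAMPLE = """<records>
  <rec id="1" age="34" sex="M"/>
  <rec id="2" sex="F"/>
  <rec id="3" age="51"/>
</records>"""


def read_records(xml_text):
    """Return (fields, rows) where every row has one value per field.

    Fields missing from a record are filled with None (the stand-in for NA).
    """
    root = ET.fromstring(xml_text)
    # One dict of attributes per record; records may have different keys.
    recs = [dict(rec.attrib) for rec in root.iter("rec")]
    # Union of all field names, sorted so every row uses the same order.
    # Sorting is what can change the original field order.
    fields = sorted({name for rec in recs for name in rec})
    # dict.get returns None for fields a record does not carry.
    rows = [[rec.get(name) for name in fields] for rec in recs]
    return fields, rows


fields, rows = read_records(SAMPLE)
```

Keeping the original field order instead of sorting would need the order recorded separately, for example from the field info table, which this sketch does not attempt.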
b. Once you are done with the script/principles, please write them up in a document that we can either link to or save in the wiki under examples.
Will do. It is all very experimental at the moment. Once I feel it has settled down I'll do this.
David --