It is interesting to see how the "read into R" work has progressed in recent days. Just a few comments:
Basically, any external system reading the xml files should read at least items 1-3 below:

1. The data file structure (type and number of fields).
2. The contained data.
3. The metadata, that is: defined value labels, defined missing values (now contained as part of the value labels), and the questions (or variable labels).

But possibly also, depending on purpose:

4. System data, such as the defined delimiters for decimals and dates (written in the header section of the xml).
5. Project information.
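To make items 1, 3 and 4 concrete, here is a minimal sketch in Python (standard library only) of pulling the field definitions, value labels with their missing-value flags, and header delimiters out of an xml file. Every element and attribute name in it (datafile, header, decimalsep, field, labelset, label, missing) is an assumption made up for illustration, not the actual EpiData xml layout; the same steps would carry over to an R script.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: all element and attribute names below are assumptions
# for illustration only, not the real EpiData xml schema.
SAMPLE = """\
<datafile>
  <header decimalsep="," datesep="/"/>
  <fields>
    <field name="id" type="integer" question="Record number"/>
    <field name="sex" type="integer" question="Sex of respondent" valuelabel="sexlbl"/>
    <field name="weight" type="float" question="Weight in kg"/>
  </fields>
  <valuelabels>
    <labelset name="sexlbl">
      <label value="1" text="Male"/>
      <label value="2" text="Female"/>
      <label value="9" text="Unknown" missing="true"/>
    </labelset>
  </valuelabels>
</datafile>
"""


def read_structure(xml_text):
    """Return (settings, fields, labels, missing) from the hypothetical layout."""
    root = ET.fromstring(xml_text)

    # Item 4: system data such as decimal and date delimiters from the header.
    header = root.find("header")
    settings = dict(header.attrib) if header is not None else {}

    # Item 1: one dict per field, in file order (name, type, question, ...).
    fields = [dict(f.attrib) for f in root.iter("field")]

    # Item 3: value labels, plus the values flagged as missing in each set.
    labels = {}
    missing = {}
    for labelset in root.iter("labelset"):
        name = labelset.get("name")
        entries = list(labelset.iter("label"))
        labels[name] = {e.get("value"): e.get("text") for e in entries}
        missing[name] = [e.get("value") for e in entries
                         if e.get("missing") == "true"]

    return settings, fields, labels, missing
```

Calling read_structure(SAMPLE) returns the four pieces separately, so a reader script can decide which of them it needs for its purpose.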
Mark Myatt, who was active in the early EpiData development, has written an introduction to R, including data examples. You can find it at: http://www.brixtonhealth.com/Rex.zip
Regarding the specific discussions here lately:

a. David writes:
iv) deal with records of different lengths.
The reason we decided to write only the variables that contain data in the xml structure is that this makes the data files much smaller. However, I think we could consider whether this can be defined by the user, such that all fields are written in each record, regardless of whether they contain data or not.
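Until such an option exists, a reader can deal with records of different lengths by using the field list as the template and filling in whatever a record leaves out. A minimal sketch in Python follows; the element names (datafile, fields, field, records, record) and the sample values are assumptions for illustration only, not the real EpiData xml layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: element names and values are assumptions for
# illustration only. Records contain only the fields that hold data.
SAMPLE = """\
<datafile>
  <fields>
    <field name="id"/>
    <field name="sex"/>
    <field name="weight"/>
  </fields>
  <records>
    <record><id>1</id><sex>2</sex><weight>61,5</weight></record>
    <record><id>2</id><weight>80,0</weight></record>
    <record><id>3</id></record>
  </records>
</datafile>
"""


def read_rectangular(xml_text):
    """Return (names, rows) where every row has one cell per declared field."""
    root = ET.fromstring(xml_text)

    # The declared fields define the columns, in file order.
    names = [f.get("name") for f in root.iter("field")]

    rows = []
    for record in root.iter("record"):
        # Collect the fields actually written for this record.
        present = {child.tag: child.text for child in record}
        # Fields absent from the record become None (NA in R terms).
        rows.append([present.get(name) for name in names])

    return names, rows
```

The values are kept as text here; converting them (for example using the decimal delimiter from the header) would be a separate step.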
b. Once you are done with the script/principles, please write them up in a document that we can either link to or save in the wiki under examples.
Regards,
Jens Lauritsen
EpiData Association