[EpiData-list] Epidata XML to R: how to handle labels? - and missing values

epidata-list at lists.umanitoba.ca
Tue Jun 14 03:49:01 CDT 2011


Quick reply: yes, strings are easy because they convert to factors. It's the other data types that need some consideration. Regarding the Stata export, I did at one point wonder whether what I was doing was completely unnecessary and whether the Stata export would be sufficient, but I think that having something native to R provides more possibilities.

I'm proceeding with the idea of reporting the labels and providing functions to ease working with them. I'm nearly there ...
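A minimal sketch of such a helper follows. It assumes the value labels have already been parsed out of the EpiData XML into a named character vector (codes as names, labels as values). The function name `apply_value_labels` is hypothetical, and this is not necessarily how the finished functions work:

```r
# Hypothetical helper (not an existing function): attach value labels to a
# coded vector. Assumes `value_labels` is a named character vector whose
# names are the codes as stored in the data and whose values are the labels,
# e.g. as parsed from an EpiData XML value-label set.
apply_value_labels <- function(x, value_labels) {
  codes <- names(value_labels)
  # Warn about codes that occur in the data but have no label
  unlabelled <- setdiff(unique(as.character(x[!is.na(x)])), codes)
  if (length(unlabelled) > 0) {
    warning("values without a label: ", paste(unlabelled, collapse = ", "))
  }
  # Match on the character form of the codes; factor() turns any value
  # that is not in `levels` into NA
  factor(as.character(x), levels = codes, labels = unname(value_labels))
}

# Usage: variable v1 coded 1, 2 or 3 (1=red, 2=blue, 3=green)
v1_labels <- c("1" = "red", "2" = "blue", "3" = "green")
v1 <- apply_value_labels(c(1, 3, 2, 3), v1_labels)
print(v1)
```

Matching on the character form of the codes keeps the sketch simple for integer codes; floating-point codes would probably need rounding first, because `as.character()` decides how they are spelled out.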

David
--


On Mon, Jun 13, 2011 at 02:11:36PM +0200, epidata-list at lists.umanitoba.ca wrote:
> I think the answer lies in this:
> http://www.statmethods.net/input/valuelabels.html
> 
> It is, by the way, a very good source of solutions in R.
> 
> Partial cut from that is:
> 
> ---
> To understand value labels in *R*, you need to understand the factor 
> data structure <http://www.statmethods.net/input/datatypes.html>.
> 
> You can use the factor function to create your own value labels.
> 
> # variable v1 is coded 1, 2 or 3
> # we want to attach value labels 1=red, 2=blue, 3=green
>
> mydata$v1 <- factor(mydata$v1,
>                     levels = c(1, 2, 3),
>                     labels = c("red", "blue", "green"))
> 
> So I think the answer is to use factors in R.
> 
> What might have to be handled differently are floats, which would 
> seldom need value labels (but could).
> 
> ---
> 
> One alternative to using the XML directly would be to export to Stata 
> format with Manager and then read the file into R:
>
> # input Stata file
> library(foreign)
> # by default (convert.factors = TRUE), Stata value labels become factor levels
> mydata <- read.dta("c:/mydata.dta")
> 
> Regarding missing values:
> What I would do is compare the direct (simplest) route via the XML 
> with the "import" from Stata, and then test how missing values behave. 
> But do be aware that Manager currently has some problems with missing 
> values when importing from and exporting to Stata format.
> 
> Stata has many missing-value codes (.a to .z), and most likely the 
> import into R from a Stata file has solved that.
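Whichever route is used, one common pattern in R is to turn declared missing-value codes into `NA` before building the factor, so they do not show up as levels. The sketch below assumes the missing codes for a variable are known (9 and 99 here, purely hypothetical) and keeps the original code of each such `NA` in an attribute, loosely mimicking Stata's distinction between missing types. The helper `recode_missing` and the attribute name `missing_code` are made up for illustration:

```r
# Hypothetical helper: replace declared missing-value codes with NA and keep
# the original code in an attribute, so that different kinds of missing
# (e.g. "not asked" vs "refused") can still be told apart.
recode_missing <- function(x, missing_codes) {
  is_missing <- x %in% missing_codes        # NA in x counts as FALSE here
  original <- ifelse(is_missing, x, NA)     # code where missing, NA elsewhere
  x[is_missing] <- NA
  attr(x, "missing_code") <- original
  x
}

# Usage: v1 coded 1-3, with 9 and 99 assumed to be missing-value codes
raw <- c(1, 9, 2, 99, 3, NA)
v1 <- recode_missing(raw, missing_codes = c(9, 99))
table(attr(v1, "missing_code"))             # how often each missing code occurs
v1_factor <- factor(v1, levels = c(1, 2, 3), labels = c("red", "blue", "green"))
summary(v1_factor)
```

Note that most subsetting operations in R drop custom attributes, so a real implementation would need more care than this. For the Stata route, `read.dta()` in foreign also has a `missing.type` argument that, as far as I know, keeps Stata's missing-value types in an attribute; its help page is the place to confirm the details.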
> 
> regards
> Jens Lauritsen
> EpiData Association
> 
David Whiting, PhD | Senior Epidemiology  & Public Health Specialist
tel +32-2-6437945 | mob +32-496-266436 | David.Whiting at idf.org

International Diabetes Federation
166 Chaussée de La Hulpe, B-1170 Brussels, Belgium
tel +32-2-5385511 | fax +32-2-5385114
info at idf.org | www.idf.org | VAT BE 0433.674.528

IDF | Promoting diabetes care, prevention and a cure worldwide

_______________________________________________
> EpiData-list mailing list
> EpiData-list at lists.umanitoba.ca
> http://lists.umanitoba.ca/mailman/listinfo/epidata-list

