
Hello EpiData (Dungeons & Dragons :-D) Masters,
I did read this post, but after the second reply the messages get a little confusing. So please forgive me if I'm a little repetitive.
I see at least three types of labels:
1. Data label
2. Variable label
3. Value label
I'm not aware of any data label function in R.
In R, the variable label is the variable name, that is, the character vector at the top of the columns. So, if you have a dataset in the object *data* and type
labels(data)   # returns the row names and the column names
colnames(data) # returns the same column names as labels()
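A small illustration of the two calls, as a sketch only. The object name data follows the text above; the columns v21 and v22 and their values are made up for the example.

```r
# Hypothetical data frame; v21 and v22 are made-up variable names
data <- data.frame(v21 = c(1, 2, 2), v22 = c(34, 51, 27))

lab  <- labels(data)    # a list: row names first, then column names
cols <- colnames(data)  # just the column names

print(lab)
print(cols)

# The second element of labels() should match colnames()
same_names <- identical(lab[[2]], cols)
print(same_names)
```

So neither call gives a descriptive label; both only return the names already sitting on the data frame.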
Value labels do not make sense in R, because they are descriptions of numeric values that represent categories. Thus, these numbers are not really numbers but categories. Most importing functions transform them into factors, which is what most analysis functions expect.
What are value labels for? If you run a table analysis, the labels appear in place of 1 or 2 or 99 or whatever values they represent. R does this through factors and their levels.
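As a rough sketch of how factors play the role of value labels, assuming a made-up coding (1 = Male, 2 = Female, 99 = Unknown) that is not from any real dataset:

```r
# Hypothetical coded variable: 1 = Male, 2 = Female, 99 = Unknown
sex_code <- c(1, 2, 2, 1, 99, 2)

# The factor levels carry what other programs call value labels
sex <- factor(sex_code,
              levels = c(1, 2, 99),
              labels = c("Male", "Female", "Unknown"))

tab <- table(sex)
print(tab)  # the table shows the level names, not 1, 2, 99
```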
What are variable labels for? If you run a table analysis, the labels appear in the margins in place of the variable names (in this case as dimnames()), which could be something meaningless such as v22. However, table() and other functions in R do not recognize labels, wherever they are stored. Thus importing variable labels would be of little use.
I'm aware of two packages whose importing functions bring the variable labels along with the data, transform the value labels into factors, and use these variable labels automatically in their analysis functions: epicalc and rms. epicalc does this through the label.var() function, and I'm not really sure where the label is stored. The rms package (or perhaps Hmisc... not sure) uses the label() function - not the labels() function - and stores the labels as attributes of each variable. If you type, for example,
attributes(data$v22) # the label should appear as one of the variable's attributes
such as 'Age' or 'Sex'.
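To show what "stored as an attribute" means, here is a sketch in base R only, without loading rms or Hmisc. The attribute name "label" is an assumption about how those packages name it, and v22 with its values is made up.

```r
# Hypothetical data frame with one made-up variable
data <- data.frame(v22 = c(34, 51, 27))

# Attach a label by hand; the attribute name "label" is an assumption
attr(data$v22, "label") <- "Age"

all_attrs <- attributes(data$v22)     # list of every attribute on the column
one_label <- attr(data$v22, "label")  # just the label

print(all_attrs)
print(one_label)
```

Note that attr() needs the attribute name as its second argument, while attributes() lists everything attached to the variable.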
Both packages use these labels throughout their analysis functions and have functions that print a map of the labels, such as describe() and des(). describe() is richer and also returns some summaries of the variables. rms also has a datadist() function, which stores the distributions of the variables' values in R's options(). This may be useful for setting values in an analysis that your data do not cover (such as a bar plot that should range from 10 to 80 when your lowest and highest values are 12 and 68).
But note that these variable labels only work in the analysis functions of epicalc if I use the importing function from epicalc - use() - and in rms if I use the rms importing functions - stata.get(), spss.get(), sas.get(). If I import with stata.get(), for example, and run the analysis with any package other than rms, the labels are ignored. :-(
At the moment, I can't see how this can be improved unless an EpiData package comes along with analysis functions that automatically use the variable labels in tables and graphs.
If variable labels are to be imported, then storing them as variable attributes seems reasonable, rather than storing them in a second column of the same data frame. But the way I see it, the next step (the analysis functions to be used) is what should guide how labels are handled.
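Just to make the idea concrete, here is a minimal sketch of an analysis function that picks up labels stored as attributes. The helpers label_or_name() and labelled_table() are hypothetical and not part of any package; the attribute name "label" and the example data are assumptions too.

```r
# Hypothetical helper: return the "label" attribute, or the name if none is set
label_or_name <- function(df, var) {
  lab <- attr(df[[var]], "label")
  if (is.null(lab)) var else lab
}

# Hypothetical wrapper around table() that uses labels as dimnames names
labelled_table <- function(df, row_var, col_var) {
  table(df[[row_var]], df[[col_var]],
        dnn = c(label_or_name(df, row_var), label_or_name(df, col_var)))
}

# Made-up data: value labels as factor levels, variable labels as attributes
data <- data.frame(
  v21 = factor(c(1, 2, 2, 1), levels = c(1, 2), labels = c("Male", "Female")),
  v22 = factor(c(1, 1, 2, 2), levels = c(1, 2), labels = c("Young", "Old"))
)
attr(data$v21, "label") <- "Sex"
attr(data$v22, "label") <- "Age group"

tab <- labelled_table(data, "v21", "v22")
print(tab)  # margins are headed by the labels instead of v21 and v22
```

A variable without a label simply falls back to its name, so such a function would still work on data imported by other packages.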
Kind regards,
A big hug, and may the force be with you,
Dr. Pedro Emmanuel A. A. do Brasil
Instituto de Pesquisa Clínica Evandro Chagas
Fundação Oswaldo Cruz
Rio de Janeiro - Brasil
Av. Brasil 4365
Tel 55 21 3865-9648
email: pedro.brasil@ipec.fiocruz.br
email: emmanuel.brasil@gmail.com
---
Supporting free software:
www.zotero.org - reference management.
www.broffice.org or www.openoffice.org - documents, spreadsheets or presentations.
www.epidata.dk - data entry.
www.r-project.org - data analysis.
www.ubuntu.com - operating system.