[EpiData-list] Value labels and variable labels in R

13 Jun 2011


Hello EpiData (Dungeons & Dragons :-D) Masters,
I did read this post, but after the second reply the messages get a little
confusing. So please forgive me if I'm a little repetitive.
I see at least three types of labels:
1. Data label
2. Variable label
3. Value label
I'm not aware of any data label function in R.
In R, the variable label is the variable name, which is the character vector
at the top of the columns. So, if you have a dataset in the object *data* and
type
...
labels(data) # returns the row and column names
colnames(data) # returns the same column names that labels() gives
Value labels do not make much sense in R, because labels are descriptions of
numeric values that represent categories. Thus, these numbers are not
really numbers but categories. Most importing functions transform them
into factors, which is what most analysis functions expect.
What are value labels for? If you run a table analysis, the labels appear in
place of 1 or 2 or 99 or whatever values they represent. R does
this through factors and their levels.
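To make the factor idea concrete, here is a minimal sketch in base R. The
variable name sex_code and the coding (1 = Male, 2 = Female, 99 = Unknown)
are made up for illustration; they are not from any real EpiData file.

```r
# Hypothetical coded variable: 1 = Male, 2 = Female, 99 = Unknown
sex_code <- c(1, 2, 2, 1, 99, 2)

# factor() maps each numeric code (levels) to its description (labels)
sex <- factor(sex_code,
              levels = c(1, 2, 99),
              labels = c("Male", "Female", "Unknown"))

levels(sex)        # the value labels now live in the factor levels
tab <- table(sex)  # counts are reported by label, not by numeric code
print(tab)
```

After this conversion the original codes are only positions in the levels,
which is why most analysis functions never need a separate value-label table.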
What are variable labels for? If you run a table analysis, the labels
appear in the margins in place of the variable names (in this case the names
of dimnames()), which could be something meaningless such as v22. However,
table() and other functions in R do not recognize labels, wherever they are
stored. Thus importing variable labels would be of little use.
I'm aware of two packages whose importing functions bring in the variable
labels together with the data, transform the value labels into factors, and
use the variable labels automatically in their analysis functions: epicalc
and rms. epicalc does this through the label.var() function, and I'm not
really sure where the label is stored. The rms package (or rather Hmisc, on
which rms depends) uses the label() function - not the labels() function -
and stores the labels as attributes of each variable. If you type, for example,
...
attributes(data$v22) # the label, such as 'Age' or 'Sex', should appear
                     # among the variable's attributes
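The attribute mechanism itself can be tried in base R without either
package. This is only a sketch: the data frame and its columns v22 and v23
are invented, and the attribute name "label" is chosen here to mimic what
label() is described as doing, not taken from an imported file.

```r
# Hypothetical data frame with uninformative column names
data <- data.frame(v22 = c(34, 51, 27), v23 = c(1, 2, 2))

# Store a variable label as a plain attribute on each column
attr(data$v22, "label") <- "Age"
attr(data$v23, "label") <- "Sex"

attributes(data$v22)     # lists every attribute of the column
attr(data$v22, "label")  # retrieves one attribute by name

# A simple map from variable names to variable labels
var_labels <- vapply(data, function(x) attr(x, "label"), character(1))
print(var_labels)
```

Note that attr() needs the attribute name as its second argument, while
attributes() lists everything attached to the object.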
Both packages use these labels throughout their analysis functions and
have functions to print a map of the labels, such as describe() and des().
describe() is richer and also returns some summaries of the variables. rms
also has a datadist() function, which can store, through options(), the
distributions of the variables' values. This may be useful for setting
plausible values in an analysis that your data do not cover (such as a bar
plot that should range from 10 to 80 when your lowest and highest values are
12 and 68).
But note that these variable labels only work in the analysis functions of
epicalc if I use the importing function from epicalc - use() - and in those
of rms if I use the rms importing functions - stata.get(), spss.get(),
sas.get(). If I import with stata.get(), for example, and run the analysis
with any package other than rms, the labels are ignored. :-(
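The same effect can be reproduced with base R alone. In this sketch the
column v22 and its label "Sex" are invented, and the label is attached by
hand rather than by an importing function; table() still reports the
variable name.

```r
# Hypothetical column carrying a hand-made "label" attribute
data <- data.frame(v22 = factor(c("Male", "Female", "Female", "Male")))
attr(data$v22, "label") <- "Sex"

# table() names the dimension after the argument, not after the label
tab <- with(data, table(v22))
print(tab)
names(dimnames(tab))     # the variable name, "v22"
attr(data$v22, "label")  # the label is still there, just unused
```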
At the moment, I can't see how this can be improved unless an EpiData
package comes along with analysis functions that automatically use the
variable labels in tables and graphs.
If variable labels are to be imported, then storing them as variable
attributes seems reasonable, rather than storing them in a second
column of the same data frame. But the way I see it, the next step (the
analysis functions to be used) is what should guide how labels are
handled.
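As a rough illustration of what such an analysis function could look like,
here is a sketch of a wrapper around table() that uses a "label" attribute
when one is present. The function name labelled_table, the example columns,
and the attribute name are all hypothetical; this is not an existing
EpiData, epicalc or rms function.

```r
# Hypothetical helper: cross-tabulate two columns and use each column's
# "label" attribute as the dimension name, falling back to the column name.
labelled_table <- function(df, row_var, col_var) {
  pick_name <- function(v) {
    lab <- attr(df[[v]], "label")
    if (is.null(lab)) v else lab
  }
  tab <- table(df[[row_var]], df[[col_var]])
  names(dimnames(tab)) <- c(pick_name(row_var), pick_name(col_var))
  tab
}

# Invented example data: only v22 carries a label
data <- data.frame(
  v22 = factor(c("Male", "Female", "Female", "Male")),
  v23 = factor(c("Yes", "No", "Yes", "Yes"))
)
attr(data$v22, "label") <- "Sex"

tab <- labelled_table(data, "v22", "v23")
print(tab)
```

The fallback to the column name keeps the wrapper usable on data imported
without labels, so the way the labels are stored only matters to functions
written to look for them.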
Kind regards,
A big hug, and may the Force be with you,
Dr. Pedro Emmanuel A. A. do Brasil
Instituto de Pesquisa Clínica Evandro Chagas
Fundação Oswaldo Cruz
Rio de Janeiro - Brasil
Av. Brasil 4365
Tel 55 21 3865-9648
email: pedro.brasil@ipec.fiocruz.br
email: emmanuel.brasil@gmail.com
---Support for free software
www.zotero.org - bibliographic reference management.
www.broffice.org or www.openoffice.org - text documents, spreadsheets or presentations.
www.epidata.dk - data entry.
www.r-project.org - data analysis.
www.ubuntu.com - operating system
