[EpiData-list] Value labels and variable labels in R

epidata-list at lists.umanitoba.ca epidata-list at lists.umanitoba.ca
Tue Jun 14 03:54:39 CDT 2011


Quick reply: yes, I agree it would be possible to hack something together using attr(), but as you say we would then need a suite of functions to work with these attributes, because the normal R functions would not use them. I've tried doing this sort of thing in the past, but never found it satisfactory.

David
--


On Mon, Jun 13, 2011 at 11:18:52PM +0200, epidata-list at lists.umanitoba.ca wrote:
> Hello EpiData (Dungeons & Dragons :-D) Masters,
> 
> I did read this thread, but after the second reply the messages get a little
> confused, so please forgive me if I'm a little repetitive.
> 
> I see at least three types of labels:
> 
> 1. Data label
> 2. Variable label
> 3. Value label
> 
> I'm not aware of any data label function in R.
> 
> In R, the variable label is the variable name, which is the character vector
> at the top of the columns. So, if you have a dataset in the object *data* and
> type
> 
> > labels(data) # returns the row and column names
> > colnames(data) # returns the same column names as labels()
> 
> Value labels do not make sense in R, because labels are a description of
> numeric values which represent categories. Thus, these numbers are not
> really numbers, but categories. Most importing functions transform these
> into factors, which make sense in most analysis functions.
> 
> What are value labels for? If you run a table analysis, the labels appear in
> place of 1 or 2 or 99 or whatever values the labels represent. R does
> this through factors and their levels.
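
To make the factor mechanism concrete, here is a minimal base R sketch. The codes and labels (1 = Male, 2 = Female, 99 = Unknown) and the names sex_code and sex are invented for illustration and do not come from any EpiData file.

```r
# Hypothetical coded variable: 1 = Male, 2 = Female, 99 = Unknown
sex_code <- c(1, 2, 2, 99, 1)

# factor() maps each numeric code (levels) to its value label (labels)
sex <- factor(sex_code,
              levels = c(1, 2, 99),
              labels = c("Male", "Female", "Unknown"))

# table() now tabulates by label instead of by numeric code
sex_counts <- table(sex)
print(sex_counts)
```

The original codes are not kept in the factor itself, so if they matter later they need to be stored separately.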
> 
> What are variable labels for? If you run a table analysis, the labels
> appear in the margins in place of the variable names (in this case as
> dimnames()), which could be something meaningless such as v22. However,
> table() and other functions in R do not recognize labels, wherever they are
> stored. Thus importing variable labels would be of little use.
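
As an illustration of that limitation: base R's table() does not look up any stored label, but a label can be passed by hand through its dnn argument. This sketch assumes a hypothetical variable v22 whose intended label is "Sex"; both are made up for the example.

```r
# Hypothetical variable with an uninformative name
v22 <- factor(c(1, 2, 2, 1), levels = c(1, 2), labels = c("Male", "Female"))

# By default the dimension is named after the variable itself
default_tab <- table(v22)
default_name <- names(dimnames(default_tab))

# dnn supplies the dimension name manually, standing in for a variable label
tab <- table(v22, dnn = "Sex")
label_name <- names(dimnames(tab))

print(tab)
```

This has to be repeated in every call, which is the inconvenience that label-aware analysis functions are meant to remove.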
> 
> I'm aware of two packages, epicalc and rms, whose importing functions bring
> in the variable labels together with the data, transform the value labels
> into factors, and use the variable labels automatically in the analysis
> functions. epicalc does this through the label.var() function, and I'm not
> really sure where the label is stored. The rms (or perhaps Hmisc... not sure)
> package uses the label() function - not the labels() function - and stores
> the labels as attributes of each variable. If you type, for example,
> 
> > attributes(data$v22) # lists all attributes of the variable
> 
> the label, such as 'Age' or 'Sex', should appear as one of the variable's
> attributes.
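
A small base R sketch of a label stored as an attribute, and of why this needs supporting functions. It assumes the attribute is named "label" (I believe this is the name Hmisc's label() uses, but treat that as an assumption); the data frame and the 'Age' label are invented.

```r
# Hypothetical data frame with an uninformative column name
data <- data.frame(v22 = c(34, 51, 28))

# Attach a variable label as an attribute (the name "label" is an assumption)
attr(data$v22, "label") <- "Age"

# Retrieve one attribute by name, or list all attributes of the variable
age_label <- attr(data$v22, "label")
all_attrs <- attributes(data$v22)
print(age_label)

# Plain subsetting of the vector drops the attribute, so ordinary R code
# loses the label unless helper functions carry it along
subset_label <- attr(data$v22[1:2], "label")
print(subset_label)
```

Note that attr() needs the attribute name as its second argument, while attributes() takes only the object and returns everything attached to it.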
> 
> Both packages use these functions throughout all their analysis functions and
> have functions to print a map of labels, such as describe() and des().
> describe() is richer and also returns some summaries of the variables. rms
> also has a datadist() function, which can store in the R options() the
> distributions of the variables' values. This may be useful for setting
> plausible values in analyses that your data do not cover (such as a bar plot
> that should range from 10 to 80 when your lowest and highest values are 12
> and 68).
> 
> But note that these variable labels will only work in the analysis functions
> of epicalc if I use the importing function from epicalc - use() - and for rms
> if I use the rms importing functions - stata.get(), spss.get(), sas.get().
> If I import with stata.get(), for example, and run the analysis with any
> package other than rms, the labels are ignored. :-(
> 
> At the moment, I can't see how this can be improved unless an EpiData package
> comes along with analysis functions that automatically use the variable
> labels in tables and graphs.
> 
> If variable labels are to be imported, then storing them as variable
> attributes seems reasonable, rather than storing them in a second
> column of the same data frame. But the way I see it, the next step (the
> analysis functions to be used) is what should guide how labels are
> handled.
> 
> Kind regards,
> 
> A big hug, and may the Force be with you,
> 
> Dr. Pedro Emmanuel A. A. do Brasil
> Instituto de Pesquisa Clínica Evandro Chagas
> Fundação Oswaldo Cruz
> Rio de Janeiro - Brasil
> Av. Brasil 4365
> Tel 55 21 3865-9648
> email: pedro.brasil at ipec.fiocruz.br
> email: emmanuel.brasil at gmail.com
> 
> ---Supporting free software
> www.zotero.org - bibliographic reference management.
> www.broffice.org or www.openoffice.org - documents, spreadsheets, or presentations.
> www.epidata.dk - data entry.
> www.r-project.org - data analysis.
> www.ubuntu.com - operating system
> 
David Whiting, PhD | Senior Epidemiology  & Public Health Specialist
tel +32-2-6437945 | mob +32-496-266436 | David.Whiting at idf.org

International Diabetes Federation
166 Chaussée de La Hulpe, B-1170 Brussels, Belgium
tel +32-2-5385511 | fax +32-2-5385114
info at idf.org | www.idf.org | VAT BE 0433.674.528

IDF | Promoting diabetes care, prevention and a cure worldwide
