Hi, I have imported a .qes file into EpiData Manager and re-saved it in
the new format. The document is a questionnaire and I am trying to make
updates to the fields. I have two questions...
1) Is it possible to insert a new text field in between two already
established fields?
2) I've noticed that the label names no longer show up on the form; will
this affect coding capabilities?
Thanks for the assistance.
--
Jonathan E.
Outbreak Management Division
Centre for Food-borne, Environmental and Zoonotic Infectious Diseases
Infectious Disease Prevention and Control Branch
Public Health Agency of Canada
Hi David
It is correct that neither Manager nor EntryClient checks the datafile for consistency upon opening.
We are thinking that this type of check will be made available as a tool within Manager and/or Analysis, since these are the programs used by the project maintainers.
Please feel free to discuss whether this is sufficient or another solution is better.
Kind regards,
Torsten Bonde Christiansen.
Epidata Association.
----- Reply message -----
From: epidata-list(a)lists.umanitoba.ca
Date: Fri, Jun 17, 2011 19:45
Subject: [EpiData-list] It is possible to have labels that are inconsistent with previously entered data
To: <epidata-list(a)lists.umanitoba.ca>
Hi,
Using the new data manager and client it is possible to create a data entry system, enter data, and then add labels that don't match the data. E.g. enter "dogs", "cats", "cows" and then add labels for that field that cover only "dogs" and "cats". This results in the labels defined in the XML file not matching the previously entered data. One solution is "don't do that!", but at the moment it is too easy to do, possibly accidentally; I've certainly seen situations where a data entry system has evolved over time. Removing a set of labels removes the link between the labels and the field (and warns before doing so), and I wonder if EpiData should also warn when creating this kind of inconsistency. It could potentially mean a lot of checking if the data file is large and the set of labels is used for many fields. I wouldn't want the system to overwrite any data, but perhaps there could be a way of reporting which fields and values are inconsistent with the label set.
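The reporting idea in the last sentence needs very little logic; the following is a minimal sketch in Python, purely for illustration (a hypothetical helper, not part of EpiData): given the distinct values entered for a field and the values covered by its label set, list the entries that have no label, without touching the data.

```python
def unlabelled_values(entered_values, labelled_values):
    """Return the distinct entered values that have no label, sorted."""
    return sorted(set(entered_values) - set(labelled_values))

# The "dogs"/"cats"/"cows" case from above: only "cows" lacks a label.
print(unlabelled_values(["dogs", "cats", "cows", "cats"], ["dogs", "cats"]))  # -> ['cows']
```

Run per field that uses the label set, this reports inconsistencies without overwriting anything.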
David
--
_______________________________________________
EpiData-list mailing list
EpiData-list(a)lists.umanitoba.ca
http://lists.umanitoba.ca/mailman/listinfo/epidata-list
I would like to hear opinions about using epidata labels when importing into R. Here's a fragment of XML showing a value label set from sample.epx that comes with the data manager.
<ValueLabelSet id="valuelabelset_id_1">
  <Type>3</Type>
  <Name>Float Set</Name>
  <Internal>
    <ValueLabel order="1" value="1,11">
      <Label lang="en">Float value A</Label>
    </ValueLabel>
    <ValueLabel order="2" value="2,22">
      <Label lang="en">Float value B</Label>
    </ValueLabel>
    <ValueLabel order="3" value="3,33">
      <Label lang="en">Float value C</Label>
    </ValueLabel>
    <ValueLabel order="4" value="8,88" missing="true">
      <Label lang="en">Second last missing</Label>
    </ValueLabel>
    <ValueLabel order="5" value="9,99" missing="true">
      <Label lang="en">Float missing</Label>
    </ValueLabel>
  </Internal>
</ValueLabelSet>
This says that a numeric (float) variable that uses this label set should give the numeric value 1.11 the label "Float value A", etc., and it defines two missing values: 8.88 and 9.99. R does not have a concept of labelling values in this way (unless things have changed recently; I have not kept up with the R list for a couple of years or so). R also does not have the concept of multiple missing values: you either have a value to work with, or you don't. It is possible to attach a comment to an R object, but comments are not transferred when an object is copied, so they can easily become lost.
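For readers who want to work with this fragment programmatically, here is a minimal parsing sketch, in Python for illustration only; it assumes nothing beyond the element and attribute names visible in the fragment above, and converts the comma decimal separator used in the sample file.

```python
import xml.etree.ElementTree as ET

# The <ValueLabelSet> fragment from the post, abbreviated to one ordinary
# value and the two missing values.
EPX_FRAGMENT = """
<ValueLabelSet id="valuelabelset_id_1">
  <Type>3</Type>
  <Name>Float Set</Name>
  <Internal>
    <ValueLabel order="1" value="1,11">
      <Label lang="en">Float value A</Label>
    </ValueLabel>
    <ValueLabel order="4" value="8,88" missing="true">
      <Label lang="en">Second last missing</Label>
    </ValueLabel>
    <ValueLabel order="5" value="9,99" missing="true">
      <Label lang="en">Float missing</Label>
    </ValueLabel>
  </Internal>
</ValueLabelSet>
"""

def parse_value_label_set(xml_text):
    """Return (set name, {value: label}, set of missing values)."""
    root = ET.fromstring(xml_text)
    labels, missing = {}, set()
    for vl in root.iter("ValueLabel"):
        # The sample file uses a comma as the decimal separator: "1,11" is 1.11.
        value = float(vl.get("value").replace(",", "."))
        labels[value] = vl.find("Label").text
        if vl.get("missing") == "true":
            missing.add(value)
    return root.findtext("Name"), labels, missing

name, labels, missing = parse_value_label_set(EPX_FRAGMENT)
```

The same separation of ordinary labels from missing-value definitions is what any importer would need, whichever of the options below is chosen.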
I think we have the following options:
1) use the actual stored value (e.g. 1.11), ignore the labels except for the missing value definitions and for these just make them NA. This loses information that might be useful, but means that numeric values (and dates, etc) keep their data type and can be analysed appropriately, i.e. it is possible to calculate the mean of 1.11, but not the mean of "Float value A";
2) use the labels in all cases. The result of this is the opposite of option 1: we keep the extra information that the labels provide, but coerce all data to characters/factors and can no longer do many statistical analyses;
3) create a new column for each variable that uses a label set, so that we have both the original data and next to it a column with the labels. This could result in a much larger data set in R, and possible confusion, especially if analyses change the value of one column and not the other.
4) Code in options 1-3, allowing the user to specify which approach to take;
5) Code in options 1-3, allowing the user to specify for each column which approach to take.
6) Do nothing at all to the data; just return the label information as part of the result of the read function (in the same way that the table structure is returned and the study metadata will be returned) so that the user can easily read it in R and use it to recode variables/values appropriately. This would mean manually recoding missing values. I could write functions that work with this information so that users can fairly easily query the label set representation. In R this could work like this, assuming that x is an object that contains label information:
epidata.label.value(1.11, x, "Float Set")
> "Float value A"
epidata.label.value(8.88, x, "Float Set")
> "Second last missing"
## If the value is not in the value set, return NA
epidata.label.value(1.66, x, "Float Set")
> NA
is.epidata.label.na(1.11, x, "Float Set")
> FALSE
is.epidata.label.na(8.88, x, "Float Set")
> TRUE
is.epidata.label.na(1.66, x, "Float Set")
> NA
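The intended semantics of these proposed functions can be pinned down with a small sketch; the Python below is only an illustration (the names mirror, but are not, the hypothetical R functions above): a value outside the set yields NA (None here) from both lookups.

```python
# Hypothetical label set data, mirroring the "Float Set" above.
label_sets = {
    "Float Set": {
        "labels": {1.11: "Float value A", 2.22: "Float value B",
                   3.33: "Float value C", 8.88: "Second last missing",
                   9.99: "Float missing"},
        "missing": {8.88, 9.99},
    }
}

def label_value(value, sets, set_name):
    """Return the label for a value, or None (playing the role of NA)."""
    return sets[set_name]["labels"].get(value)

def is_label_na(value, sets, set_name):
    """True/False for values in the set; None (NA) for values outside it."""
    if value not in sets[set_name]["labels"]:
        return None
    return value in sets[set_name]["missing"]
```

The three-way result of is_label_na (True / False / NA) is the part worth settling early, since it determines how unrecognised values propagate into later recoding.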
As I work through this I am tending towards option 6; I would probably have to write functions like this anyway, and I think that making them visible to the user provides the greatest flexibility and probably limits the risk of being surprised. Recoding a variable in R using the labels would then work like this, assuming that x is an object created by importing an epidata XML file into R, and that the dataframe contains a field called "height":
## Extract the data frame and labels into new objects to make the code easier to read
y <- x[[1]]
z <- x[["value.labels"]]
## Apply the labels
y$height <- epidata.label.value(y$height, z, "Float Set")
## Set the missing values
is.epidata.label.na(y$height, z, "Float Set") <- NA
David
--
David Whiting, PhD | Senior Epidemiology & Public Health Specialist
tel +32-2-6437945 | mob +32-496-266436 | David.Whiting(a)idf.org
International Diabetes Federation
166 Chaussée de La Hulpe, B-1170 Brussels, Belgium
tel +32-2-5385511 | fax +32-2-5385114
info(a)idf.org | www.idf.org | VAT BE 0433.674.528
IDF | Promoting diabetes care, prevention and a cure worldwide
Hello
My question is: are there any possibilities to check for double entry in EpiData?
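For reference, the core of a double entry check is a record-by-record comparison of two independently entered files; the sketch below (in Python, purely illustrative and not EpiData's own mechanism) shows the idea, assuming each record carries an ID field to pair on.

```python
def compare_entries(first, second, key):
    """Compare two lists of records (dicts) paired on an ID field.

    Returns (id, field, first_value, second_value) for every discrepancy.
    """
    by_id = {rec[key]: rec for rec in first}
    diffs = []
    for rec in second:
        other = by_id.get(rec[key])
        if other is None:
            continue  # record only present in the second entry
        for field, value in rec.items():
            if other.get(field) != value:
                diffs.append((rec[key], field, other.get(field), value))
    return diffs
```

Any discrepancy then has to be resolved against the paper source, which is the point of entering the data twice.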
Susann R
Susann Regber
Pediatric nurse, MPH,
PHD-student
Nordic School of Public Health
Tel.: +46 (0)31 69 39 01
Mobile: +46(0) 738 19 04 19
E-mail: susann.regber(a)nhv.se
Homepage: www.nhv.se
P.O. Box 12133, SE-402 42 Gothenburg
Hello EpiData (Dungeons & Dragons :-D) Masters,
I did read this post, but after the second reply the messages get a little
confused. So please forgive me if I'm a little repetitive.
I see at least three types of labels:
1. Data label
2. Variable label
3. Value label
I'm not aware of any data label function in R.
In R, the variable label is the variable name, which is the character vector
at the top of the columns. So, if you have a dataset in object *data* and
type
> labels(data) # returns the column and row names
> colnames(data) # returns the same column names as labels()
Value labels do not make sense in R, because labels are a description of
numeric values which represent categories. Thus, these numbers are not
really numbers, but rather categories. Most importing functions transform these
into factors, which will make sense in most analysis functions.
What are value labels for? If you run a table analysis, the labels come in
place of 1 or 2 or 99 or whatever values the labels represent. R does
this through factors and their levels.
What are variable labels for? If you run a table analysis, the labels
appear in the margins in place of the variable names (in this case as
dimnames()), which could be something meaningless such as v22. However, the
table and other functions in R do not recognize labels wherever they are.
Thus importing variable labels would be of little use.
I'm aware of two packages whose importing functions bring the variable
labels in together with the data, transform the value labels into factors,
and use these variable labels in their analysis functions automatically:
epicalc and rms. epicalc does this through the label.var() function, and
I'm not really sure where the label is stored. The rms (or perhaps Hmisc,
not sure) package uses the label() function - not the labels() function -
and stores the labels as attributes of each variable. If you type, for example,
> attr(data$v22, "label") # the label should appear as one of the variable's attributes,
such as 'Age' or 'Sex'.
Both packages use these functions throughout all their analysis functions and
have functions to print a map of labels, such as describe() and desc().
describe() is richer and also returns some summaries of the variables. rms
also has a datadist() function which is able to store in R's options() the
distributions of variable values, which may be useful for setting potential
values in an analysis that your data do not represent (such as a bar plot
that should range from 10 to 80 when your lowest and highest values are 12
and 68).
But note: these variable labels will only work in the analysis functions of
epicalc if I use the importing function from epicalc - use() - and for rms
only if I use the rms importing functions - stata.get(), spss.get(), sas.get().
If I import with stata.get(), for example, and run the analysis with any
package other than rms, the labels are ignored. :-(
At the moment, I can't see how this can be improved unless an EpiData package
comes along with analysis functions that automatically use the variable
labels in tables and graphs.
If variable labels are to be imported, then storing them as variable
attributes seems reasonable, instead of storing them in a second
column of the same dataframe. But the way I see it, the next step (the
analysis functions to be used) is the one that should guide how labels are
handled.
Kind regards,
A big hug, and may the force be with you,
Dr. Pedro Emmanuel A. A. do Brasil
Instituto de Pesquisa Clínica Evandro Chagas
Fundação Oswaldo Cruz
Rio de Janeiro - Brasil
Av. Brasil 4365
Tel 55 21 3865-9648
email: pedro.brasil(a)ipec.fiocruz.br
email: emmanuel.brasil(a)gmail.com
---Supporting free software
www.zotero.org - bibliographic reference management.
www.broffice.org or www.openoffice.org - documents, spreadsheets and presentations.
www.epidata.dk - data entry.
www.r-project.org - data analysis.
www.ubuntu.com - operating system
To save me maybe re-inventing the wheel, has anyone looked at reading the
new native XML files directly into R, ideally with SQL queries.
There is an XML package for R (http://www.omegahat.org/RSXML/), but it would
be really useful if someone more capable than me had already put together a
method.
Many thanks,
Graham
Dear All
A "feature complete" version of EpiData Manager has been uploaded today
- the expectation (read as "hope") is that this version, after bug
removal, can be released as version 1.0.
* NEW: Create reports in HTML format using "Tools"
* Default settings for a project
* Change font and colour in program settings
The reports on project structure (question lists, value label lists,
combined lists) are produced when you click the Tools menu. All epx/epz
files in that folder are included. (Note that they must be in the most
recent format.)
For Entry Client a similar version will be released within the next few days.
The following aspects will be fixed/clarified before release:
a. A proper About box will be added.
b. For reports a grid of included files will be shown to the user - just
like "add structure" to allow for in- and exclusion of single files.
c. known and reported bugs.
d. any bugs reported from now on by users.
e. updating of documentation on the wiki.
Another interesting development has taken place in the last week. We
have found an open-source, cross-platform testing system for
interactive simulation of user behaviour. We will spend some energy and
time developing validation tests to ensure that the visual interface
behaves properly after each new compilation. More news on this later.
Therefore, please:
Now is the time to challenge the software and report problems and
viewpoints on the products - before release as v1. Documentation and
instruction files could obviously be more elaborate - this will come
later. In practice it has turned out that unless features work almost
without instruction, users will not use them.
Kind regards
Torsten Christiansen
Jens Lauritsen
EpiData Association
Denmark
Yesterday a new version of the command line template parser tool was
compiled, based on the most recent XML structure for the epx files.
Users can read more about this tool at:
http://www.epidata.org/dokuwiki/doku.php/documentation:templateformat
(the documentation will be finalised, but is generally correct)
And get the updated version from:
http://www.epidata.dk/testing.php
regards
Jens Lauritsen
EpiData Association
Dear members of the group, I will be teaching a course on
biostatistics to a group of psychiatry residents. The idea is to
teach statistical methodology using scientific papers related to
psychiatry. I would appreciate it if any of you could provide me with
some papers together with their databases.
Best wishes
Jose Farfan
Mexico