Hi, I have imported a .qes file into EpiData Manager and re-saved it in
the new format. The document is a questionnaire and I am trying to make
updates to the fields. I have two questions...
1) Is it possible to insert a new text field in between two already
established fields?
2) I've noticed that the label names no longer show up on the form; will
this affect coding capabilities?
Thanks for the assistance.
--
Jonathan E.
Outbreak Management Division
Centre for Food-borne, Environmental and Zoonotic Infectious Diseases
Infectious Disease Prevention and Control Branch
Public Health Agency of Canada
Hi David
It is correct that neither Manager nor EntryClient checks the datafile for consistency upon opening.
We are thinking that this type of check will be made available as a tool within Manager and/or Analysis, since these are the programs used by the project maintainers.
Please feel free to discuss whether this is sufficient or another solution is better.
Kind regards,
Torsten Bonde Christiansen.
Epidata Association.
----- Reply message -----
From: epidata-list(a)lists.umanitoba.ca
Date: Fri, Jun 17, 2011 19:45
Subject: [EpiData-list] It is possible to have labels that are inconsistent with previously entered data
To: <epidata-list(a)lists.umanitoba.ca>
Hi,
Using the new data manager and client it is possible to create a data entry system, enter data, and then add labels that don't match the data. E.g. enter "dogs", "cats", "cows" and then add labels for that field that cover only "dogs" and "cats". This results in the labels defined in the XML file not matching the previously entered data. One solution is "don't do that!", but at the moment it is too easy to do, possibly accidentally; I've certainly seen situations where a data entry system has evolved over time. Removing a set of labels removes the link between the labels and the field (and warns before doing so), and I wonder if EpiData should also warn when creating this kind of inconsistency. It could potentially mean a lot of checking if the data file is large and the set of labels is used for many fields. I wouldn't want the system to overwrite any data, but perhaps there could be a way of reporting which fields and values are inconsistent with the label set.
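The reporting idea in the last sentence needs very little logic; the following is a minimal sketch in Python, purely for illustration (a hypothetical helper, not part of EpiData): given the distinct values entered for a field and the values covered by its label set, list the entries that have no label, without touching the data.

```python
def unlabelled_values(entered_values, labelled_values):
    """Return the distinct entered values that have no label, sorted."""
    return sorted(set(entered_values) - set(labelled_values))

# The "dogs"/"cats"/"cows" case from above: only "cows" lacks a label.
print(unlabelled_values(["dogs", "cats", "cows", "cats"], ["dogs", "cats"]))  # -> ['cows']
```

Run per field that uses the label set, this reports inconsistencies without overwriting anything.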
David
--
_______________________________________________
EpiData-list mailing list
EpiData-list(a)lists.umanitoba.ca
http://lists.umanitoba.ca/mailman/listinfo/epidata-list
I would like to hear opinions about using epidata labels when importing into R. Here's a fragment of XML showing a value label set from sample.epx that comes with the data manager.
<ValueLabelSet id="valuelabelset_id_1">
  <Type>3</Type>
  <Name>Float Set</Name>
  <Internal>
    <ValueLabel order="1" value="1,11">
      <Label lang="en">Float value A</Label>
    </ValueLabel>
    <ValueLabel order="2" value="2,22">
      <Label lang="en">Float value B</Label>
    </ValueLabel>
    <ValueLabel order="3" value="3,33">
      <Label lang="en">Float value C</Label>
    </ValueLabel>
    <ValueLabel order="4" value="8,88" missing="true">
      <Label lang="en">Second last missing</Label>
    </ValueLabel>
    <ValueLabel order="5" value="9,99" missing="true">
      <Label lang="en">Float missing</Label>
    </ValueLabel>
  </Internal>
</ValueLabelSet>
This says that a numeric (float) variable that uses this label set should give the numeric value 1.11 the label "Float value A", etc., and it defines two missing values: 8.88 and 9.99. R does not have a concept of labelling values in this way (unless things have changed recently; I have not kept up with the R list for a couple of years or so). R also does not have the concept of multiple missing values: you either have a value to work with, or you don't. It is possible to attach a comment to an R object, but comments are not transferred when an object is copied, so they can easily become lost.
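For readers who want to work with this fragment programmatically, here is a minimal parsing sketch, in Python for illustration only; it assumes nothing beyond the element and attribute names visible in the fragment above, and converts the comma decimal separator used in the sample file.

```python
import xml.etree.ElementTree as ET

# The <ValueLabelSet> fragment from the post, abbreviated to one ordinary
# value and the two missing values.
EPX_FRAGMENT = """
<ValueLabelSet id="valuelabelset_id_1">
  <Type>3</Type>
  <Name>Float Set</Name>
  <Internal>
    <ValueLabel order="1" value="1,11">
      <Label lang="en">Float value A</Label>
    </ValueLabel>
    <ValueLabel order="4" value="8,88" missing="true">
      <Label lang="en">Second last missing</Label>
    </ValueLabel>
    <ValueLabel order="5" value="9,99" missing="true">
      <Label lang="en">Float missing</Label>
    </ValueLabel>
  </Internal>
</ValueLabelSet>
"""

def parse_value_label_set(xml_text):
    """Return (set name, {value: label}, set of missing values)."""
    root = ET.fromstring(xml_text)
    labels, missing = {}, set()
    for vl in root.iter("ValueLabel"):
        # The sample file uses a comma as the decimal separator: "1,11" is 1.11.
        value = float(vl.get("value").replace(",", "."))
        labels[value] = vl.find("Label").text
        if vl.get("missing") == "true":
            missing.add(value)
    return root.findtext("Name"), labels, missing

name, labels, missing = parse_value_label_set(EPX_FRAGMENT)
```

The same separation of ordinary labels from missing-value definitions is what any importer would need, whichever of the options below is chosen.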
I think we have the following options:
1) use the actual stored value (e.g. 1.11), ignore the labels except for the missing value definitions and for these just make them NA. This loses information that might be useful, but means that numeric values (and dates, etc) keep their data type and can be analysed appropriately, i.e. it is possible to calculate the mean of 1.11, but not the mean of "Float value A";
2) use the labels in all cases. The result of this is the opposite of option 1: we keep the extra information that the labels provide, but coerce all data to characters/factors and can no longer do many statistical analyses;
3) create a new column for each variable that uses a label set, so that we have both the original data and next to it a column with the labels. This could result in a much larger data set in R, and possible confusion, especially if analyses change the value of one column and not the other.
4) Code in options 1-3, allowing the user to specify which approach to take;
5) Code in options 1-3, allowing the user to specify for each column which approach to take.
6) Do nothing at all to the data; just return the label information as part of the result of the read function (in the same way that the table structure is returned and the study metadata will be returned) so that the user can easily read it in R and use it to recode variables/values appropriately. This would mean manually recoding missing values. I could write functions that work with this information so that users can fairly easily query the label set representation. In R this could work like this, assuming that x is an object that contains label information:
epidata.label.value(1.11, x, "Float Set")
> "Float value A"
epidata.label.value(8.88, x, "Float Set")
> "Second last missing"
## If the value is not in the value set, return NA
epidata.label.value(1.66, x, "Float Set")
> NA
is.epidata.label.na(1.11, x, "Float Set")
> FALSE
is.epidata.label.na(8.88, x, "Float Set")
> TRUE
is.epidata.label.na(1.66, x, "Float Set")
> NA
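The intended semantics of these proposed functions can be pinned down with a small sketch; the Python below is only an illustration (the names mirror, but are not, the hypothetical R functions above): a value outside the set yields NA (None here) from both lookups.

```python
# Hypothetical label set data, mirroring the "Float Set" above.
label_sets = {
    "Float Set": {
        "labels": {1.11: "Float value A", 2.22: "Float value B",
                   3.33: "Float value C", 8.88: "Second last missing",
                   9.99: "Float missing"},
        "missing": {8.88, 9.99},
    }
}

def label_value(value, sets, set_name):
    """Return the label for a value, or None (playing the role of NA)."""
    return sets[set_name]["labels"].get(value)

def is_label_na(value, sets, set_name):
    """True/False for values in the set; None (NA) for values outside it."""
    if value not in sets[set_name]["labels"]:
        return None
    return value in sets[set_name]["missing"]
```

The three-way result of is_label_na (True / False / NA) is the part worth settling early, since it determines how unrecognised values propagate into later recoding.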
As I work through this I am tending towards option 6; I would probably have to write functions like this anyway, and I think that making them visible to the user provides the greatest flexibility and probably limits the risk of being surprised. Recoding a variable in R using the labels would then work like this, assuming that x is an object created by importing an epidata XML file into R, and that the dataframe contains a field called "height":
## Extract the data frame and labels into new objects to make the code easier to read
y <- x[[1]]
z <- x[["value.labels"]]
## Apply the labels
y$height <- epidata.label.value(y$height, z, "Float Set")
## Set the missing values
is.epidata.label.na(y$height, z, "Float Set") <- NA
David
--
David Whiting, PhD | Senior Epidemiology & Public Health Specialist
tel +32-2-6437945 | mob +32-496-266436 | David.Whiting(a)idf.org
International Diabetes Federation
166 Chaussée de La Hulpe, B-1170 Brussels, Belgium
tel +32-2-5385511 | fax +32-2-5385114
info(a)idf.org | www.idf.org | VAT BE 0433.674.528
IDF | Promoting diabetes care, prevention and a cure worldwide
Hello
My question is: are there any possibilities to check for double entry in EpiData?
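For reference, the core of a double entry check is a record-by-record comparison of two independently entered files; the sketch below (in Python, purely illustrative and not EpiData's own mechanism) shows the idea, assuming each record carries an ID field to pair on.

```python
def compare_entries(first, second, key):
    """Compare two lists of records (dicts) paired on an ID field.

    Returns (id, field, first_value, second_value) for every discrepancy.
    """
    by_id = {rec[key]: rec for rec in first}
    diffs = []
    for rec in second:
        other = by_id.get(rec[key])
        if other is None:
            continue  # record only present in the second entry
        for field, value in rec.items():
            if other.get(field) != value:
                diffs.append((rec[key], field, other.get(field), value))
    return diffs
```

Any discrepancy then has to be resolved against the paper source, which is the point of entering the data twice.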
Susann R
Susann Regber
Pediatric nurse, MPH,
PHD-student
Nordic School of Public Health
Tel.: +46 (0)31 69 39 01
Mobile: +46(0) 738 19 04 19
E-mail: susann.regber(a)nhv.se
Homepage: www.nhv.se
P.O. Box 12133, SE-402 42 Gothenburg
Hello EpiData (Dungeons & Dragons :-D) Masters,
I did read this post, but after the second reply the messages get a little
confused. So please forgive me if I'm a little repetitive.
I see at least three types of labels:
1. Data label
2. Variable label
3. Value label
I'm not aware of any data label function in R.
In R, the variable label is the variable name, which is the character vector
at the top of the columns. So, if you have a dataset in object *data* and
type
> labels(data) # returns the column and row names
> colnames(data) # returns the same column names as labels()
Value labels do not make sense in R, because labels are a description of
numeric values which represent categories. Thus, these numbers are not
really numbers, but rather categories. Most importing functions transform these
into factors, which will make sense in most analysis functions.
What are value labels for? If you run a table analysis, the labels come in
place of 1 or 2 or 99 or whatever values the labels represent. R does
this through factors and their levels.
What are variable labels for? If you run a table analysis, the labels
appear in the margins in place of the variable names (in this case as
dimnames()), which could be something meaningless such as v22. However, the
table and other functions in R do not recognize labels wherever they are.
Thus importing variable labels would be of little use.
I'm aware of two packages whose importing functions bring the variable
labels in together with the data, transform the value labels into factors,
and use these variable labels in their analysis functions automatically:
epicalc and rms. epicalc does this through the label.var() function, and
I'm not really sure where the label is stored. The rms (or perhaps Hmisc,
not sure) package uses the label() function - not the labels() function -
and stores the labels as attributes of each variable. If you type, for example,
> attr(data$v22, "label") # the label should appear as one of the variable's attributes,
such as 'Age' or 'Sex'.
Both packages use these functions throughout all their analysis functions and
have functions to print a map of labels, such as describe() and desc().
describe() is richer and also returns some summaries of the variables. rms
also has a datadist() function which is able to store in R's options() the
distributions of variable values, which may be useful for setting potential
values in an analysis that your data do not represent (such as a bar plot
that should range from 10 to 80 when your lowest and highest values are 12
and 68).
But note: these variable labels will only work in the analysis functions of
epicalc if I use the importing function from epicalc - use() - and for rms
only if I use the rms importing functions - stata.get(), spss.get(), sas.get().
If I import with stata.get(), for example, and run the analysis with any
package other than rms, the labels are ignored. :-(
At the moment, I can't see how this can be improved unless an EpiData package
comes along with analysis functions that automatically use the variable
labels in tables and graphs.
If variable labels are to be imported, then storing them as variable
attributes seems reasonable, instead of storing them in a second
column of the same dataframe. But the way I see it, the next step (the
analysis functions to be used) is the one that should guide how labels are
handled.
Kind regards,
A big hug, and may the force be with you,
Dr. Pedro Emmanuel A. A. do Brasil
Instituto de Pesquisa Clínica Evandro Chagas
Fundação Oswaldo Cruz
Rio de Janeiro - Brasil
Av. Brasil 4365
Tel 55 21 3865-9648
email: pedro.brasil(a)ipec.fiocruz.br
email: emmanuel.brasil(a)gmail.com
---Supporting free software
www.zotero.org - bibliographic reference management.
www.broffice.org or www.openoffice.org - documents, spreadsheets and presentations.
www.epidata.dk - data entry.
www.r-project.org - data analysis.
www.ubuntu.com - operating system
To save me maybe re-inventing the wheel, has anyone looked at reading the
new native XML files directly into R, ideally with SQL queries.
There is an XML package for R (http://www.omegahat.org/RSXML/), but it would
be really useful if someone more capable than me had already put together a
method.
Many thanks,
Graham
Dear All
A "feature complete" version of EpiData Manager has been uploaded today
- the expectation (read as "hope") is that this version, after bug
removal, can be released as version 1.0.
* NEW: Create reports in HTML format using "Tools"
* Default settings for a project
* Change font and colour in program settings
The reports on project structure (question lists, value label lists,
combined lists) are produced when you click the Tools menu. All epx/epz
files in that folder are included. (Note that they must be in the most
recent format.)
For Entry Client a similar version will be released within the next few days.
The following aspects will be fixed/clarified before release:
a. A proper About box will be added.
b. For reports a grid of included files will be shown to the user - just
like "add structure" to allow for in- and exclusion of single files.
c. known and reported bugs.
d. any bugs reported from now on by users.
e. updating of documentation on the wiki.
Another interesting development has taken place in the last week. We
have found an open-source, cross-platform testing system for
interactive simulation of user behaviour. We will spend some energy and
time developing validation tests to ensure that the visual interface
behaves properly after each new compilation. More news on this later.
Therefore, please:
Now is the time to challenge the software and report problems and
viewpoints on the products - before release as v1. Documentation and
instruction files could obviously be more elaborate - this will come
later. In practice it has turned out that unless features work almost
without instruction, users will not use them.
Kind regards
Torsten Christiansen
Jens Lauritsen
EpiData Association
Denmark
Yesterday a new version of the command line template parser tool was
compiled, based on the most recent XML structure for the epx files.
Users can read more about this tool at:
http://www.epidata.org/dokuwiki/doku.php/documentation:templateformat
(the documentation will be finalised, but is generally correct)
And get the updated version from:
http://www.epidata.dk/testing.php
regards
Jens Lauritsen
EpiData Association
Dear members of the group, I will be teaching a course on
biostatistics to a group of psychiatry residents. The idea is to
teach statistical methodology using scientific papers related to
psychiatry. I would appreciate it if any of you could provide me with
some papers together with their databases.
Best wishes
Jose Farfan
Mexico