Re: [EpiData-list] Importing data to Epidata (v.2.0.5.51)
Thank you very much Jamie, your help was indeed very helpful. I believe it also makes sense to convert yes/no/NA cells to have a numerical value e.g. 1,2,3 ?
Is there a way to change the field type afterwards other than touching the XML code and if so, is it wise/dangerous?
Pekka
Jamie wrote:
I have found that import from the clipboard works well if the Excel data are prepared for the import. Here is what I do.
In Excel, format the columns they way you want them to appear in EpiData (text columns, numbers with set decimal places, dates as YYYY/MM/DD is best). I usually then format the columns to fit the data No spaces in your column names - these should be valid EpiData field names for consistency Make sure that the first and last rows of data are complete; I often put dummy data in these and then delete the records in EpiData. I believe the main thing here is to have valid data (not empty cells) in the last column of data for the first and/or last rows. Ideally, have no blank cells - put missing values in. This is probably not necessary, as EpiData handles empty cells well.
You may find that fields are not the length that you want. You can safely edit these fields to change the size. Text, number and data fields should all come through with the correct type.
I don?t know of any simple way to convert data for an Excel column to coded values with ?value labels?. Unless your data set is massive, there is no real penalty to keeping the values in full.
When I had to automate repeated conversion of messy Excel data to EpiData, I made a new sheet within the Excel file and used formulas to put copies of the original data into the new sheet, with all of the cleaning. For example, we had some inconsistencies in coding province, so I used a lookup table to convert the original values (Ontario, Ont, ON) to be consistent (ON). This is one way to do your data coding, by the way, but you still cannot automate getting the value labels into EpiData. In my case, the data set was small and existing data might change, so I just kept importing the whole thing. You can also just convert the new data, import to a new EpiData project and then append this to your complete project.
You will have problems with import from the clipboard if you have commas in your text fields. In my testing, the import fails without an error message.
EpiData is very fussy about importing data, as it should be, since it does such a good job of maintaining the integrity of data when you use it for data entry. Import of csv data does work, but you have to be very sure to have clean data and to not have stray delimiters in your data. In any case, your data should be in good shape since it originates in a database (what database?? Maybe there is another more direct way to get into EpiData) --? Jamie
Pekka wrote: What I want to do is to import a spreadsheet with field labels and data to Epidata. The spreasheet contains field names on the top row and records on the following rows. Once I have imported the initial data, I would like to ammend the records with additional fields, i.e. have the initial spreadsheet data as the seed data in records. I think this is a very normal practice to do, but I have run into several difficulties:
- I have tried to import the spreadsheet with "File/Import file" but I
don't seem to get it working. I have turned the spreasheet from Excel to CSV, but the dialogue box that opens doesn't recognize data. I made a workaround by copying a part of the spreadsheet and used "File/Import from clipboard" and that did work. The downside is that you have to do it part by part, and the fundamental problem is why it doesn't work in the first place. Is Epidata Manager very picky as to what field separators are used in CSV? I have tried changing field separators but with no luck. Clipboard seems to be the only workaround, but even still I have to do it in parts. (I am using Ubuntu and the latest Linux 64-bit version)
- With the copy-paste method I noticed that Epidata was smartly able to
interpret some fields with the right type (e.g. date, but not all date-fields, but I believe this is the issue of my spreadsheet), but not all. I have not discovered a way to change the field type afterwards, is it at all possible? The only hack I figured out was to manually change the field type in XML code directly, but I would be very hesitant to do it. If I touch XML code directly, will that mess up everything, or is it the simple fix to the problem?
- Value-labels: My spreadsheet contains no value labels, obviously. It has
for example field "Country of origin" and the value is string "Nepal". Is there a way for me to easily convert strings of text into value labels as a part of the import process? String values are consistent (the spreadsheet is an output file from a database), so I should imagine it would be possible do so.
- Provided I get questions 1,2 and 3 adequately resolved, I should be fine
with the initial data import. But what do I do if I want to later import additional records from the same spreadsheet structure to my Epidata project? Is this something that Append is supposed to do? There I have the same issue with the spreadsheet compatibility: The CSV file does not open in a recognisable way, hence I haven't been able to play around and test it either.
As far as I know you cannot change field type within Manager
Jamie
On Jun 24, 2015 4:29 AM, EpiData development and support epidata-list@lists.umanitoba.ca wrote:
Thank you very much Jamie, your help was indeed very helpful. I believe it also makes sense to convert yes/no/NA cells to have a numerical value e.g. 1,2,3 ?
Is there a way to change the field type afterwards other than touching the XML code and if so, is it wise/dangerous?
Pekka
Jamie wrote:
I have found that import from the clipboard works well if the Excel data are prepared for the import. Here is what I do.
In Excel, format the columns they way you want them to appear in EpiData (text columns, numbers with set decimal places, dates as YYYY/MM/DD is best). I usually then format the columns to fit the data No spaces in your column names - these should be valid EpiData field names for consistency Make sure that the first and last rows of data are complete; I often put dummy data in these and then delete the records in EpiData. I believe the main thing here is to have valid data (not empty cells) in the last column of data for the first and/or last rows. Ideally, have no blank cells - put missing values in. This is probably not necessary, as EpiData handles empty cells well.
You may find that fields are not the length that you want. You can safely edit these fields to change the size. Text, number and data fields should all come through with the correct type.
I don?t know of any simple way to convert data for an Excel column to coded values with ?value labels?. Unless your data set is massive, there is no real penalty to keeping the values in full.
When I had to automate repeated conversion of messy Excel data to EpiData, I made a new sheet within the Excel file and used formulas to put copies of the original data into the new sheet, with all of the cleaning. For example, we had some inconsistencies in coding province, so I used a lookup table to convert the original values (Ontario, Ont, ON) to be consistent (ON). This is one way to do your data coding, by the way, but you still cannot automate getting the value labels into EpiData. In my case, the data set was small and existing data might change, so I just kept importing the whole thing. You can also just convert the new data, import to a new EpiData project and then append this to your complete project.
You will have problems with import from the clipboard if you have commas in your text fields. In my testing, the import fails without an error message.
EpiData is very fussy about importing data, as it should be, since it does such a good job of maintaining the integrity of data when you use it for data entry. Import of csv data does work, but you have to be very sure to have clean data and to not have stray delimiters in your data. In any case, your data should be in good shape since it originates in a database (what database?? Maybe there is another more direct way to get into EpiData) --? Jamie
Pekka wrote: What I want to do is to import a spreadsheet with field labels and data to Epidata. The spreasheet contains field names on the top row and records on the following rows. Once I have imported the initial data, I would like to ammend the records with additional fields, i.e. have the initial spreadsheet data as the seed data in records. I think this is a very normal practice to do, but I have run into several difficulties:
- I have tried to import the spreadsheet with "File/Import file" but I
don't seem to get it working. I have turned the spreasheet from Excel to CSV, but the dialogue box that opens doesn't recognize data. I made a workaround by copying a part of the spreadsheet and used "File/Import from clipboard" and that did work. The downside is that you have to do it part by part, and the fundamental problem is why it doesn't work in the first place. Is Epidata Manager very picky as to what field separators are used in CSV? I have tried changing field separators but with no luck. Clipboard seems to be the only workaround, but even still I have to do it in parts. (I am using Ubuntu and the latest Linux 64-bit version)
- With the copy-paste method I noticed that Epidata was smartly able to
interpret some fields with the right type (e.g. date, but not all date-fields, but I believe this is the issue of my spreadsheet), but not all. I have not discovered a way to change the field type afterwards, is it at all possible? The only hack I figured out was to manually change the field type in XML code directly, but I would be very hesitant to do it. If I touch XML code directly, will that mess up everything, or is it the simple fix to the problem?
- Value-labels: My spreadsheet contains no value labels, obviously. It has
for example field "Country of origin" and the value is string "Nepal". Is there a way for me to easily convert strings of text into value labels as a part of the import process? String values are consistent (the spreadsheet is an output file from a database), so I should imagine it would be possible do so.
- Provided I get questions 1,2 and 3 adequately resolved, I should be fine
with the initial data import. But what do I do if I want to later import additional records from the same spreadsheet structure to my Epidata project? Is this something that Append is supposed to do? There I have the same issue with the spreadsheet compatibility: The CSV file does not open in a recognisable way, hence I haven't been able to play around and test it either.
EpiData-list mailing list EpiData-list@lists.umanitoba.ca http://lists.umanitoba.ca/mailman/listinfo/epidata-list
I thought so too, thanks Jamie.
Can anyone from the development team comment on the issue of changing the field type and whether it is safe/wise to do so directly in XML? I tested it and it worked out fine, at least it appeared so, but there may be issues which are not immediately visible for a user.
Thanks!
Pekka
2015-06-24 14:57 GMT+01:00 EpiData development and support < epidata-list@lists.umanitoba.ca>:
As far as I know you cannot change field type within Manager
Jamie
On Jun 24, 2015 4:29 AM, EpiData development and support < epidata-list@lists.umanitoba.ca> wrote:
Thank you very much Jamie, your help was indeed very helpful. I believe
it
also makes sense to convert yes/no/NA cells to have a numerical value
e.g.
1,2,3 ?
Is there a way to change the field type afterwards other than touching
the
XML code and if so, is it wise/dangerous?
Pekka
Jamie wrote:
I have found that import from the clipboard works well if the Excel
data
are prepared for the import. Here is what I do.
In Excel, format the columns they way you want them to appear in
EpiData
(text columns, numbers with set decimal places, dates as YYYY/MM/DD is best). I usually then format the columns to fit the data No spaces in your column names - these should be valid EpiData field
names
for consistency Make sure that the first and last rows of data are complete; I often
put
dummy data in these and then delete the records in EpiData. I believe
the
main thing here is to have valid data (not empty cells) in the last
column
of data for the first and/or last rows. Ideally, have no blank cells - put missing values in. This is probably
not
necessary, as EpiData handles empty cells well.
You may find that fields are not the length that you want. You can
safely
edit these fields to change the size. Text, number and data fields
should
all come through with the correct type.
I don?t know of any simple way to convert data for an Excel column to coded values with ?value labels?. Unless your data set is massive,
there is
no real penalty to keeping the values in full.
When I had to automate repeated conversion of messy Excel data to
EpiData,
I made a new sheet within the Excel file and used formulas to put
copies of
the original data into the new sheet, with all of the cleaning. For example, we had some inconsistencies in coding province, so I used a
lookup
table to convert the original values (Ontario, Ont, ON) to be
consistent
(ON). This is one way to do your data coding, by the way, but you still cannot automate getting the value labels into EpiData. In my case, the
data
set was small and existing data might change, so I just kept importing
the
whole thing. You can also just convert the new data, import to a new EpiData project and then append this to your complete project.
You will have problems with import from the clipboard if you have
commas
in your text fields. In my testing, the import fails without an error message.
EpiData is very fussy about importing data, as it should be, since it
does
such a good job of maintaining the integrity of data when you use it
for
data entry. Import of csv data does work, but you have to be very sure
to
have clean data and to not have stray delimiters in your data. In any
case,
your data should be in good shape since it originates in a database
(what
database?? Maybe there is another more direct way to get into EpiData) --? Jamie
Pekka wrote: What I want to do is to import a spreadsheet with field labels and
data to
Epidata. The spreasheet contains field names on the top row and
records on
the following rows. Once I have imported the initial data, I would
like to
ammend the records with additional fields, i.e. have the initial spreadsheet data as the seed data in records. I think this is a very
normal
practice to do, but I have run into several difficulties:
- I have tried to import the spreadsheet with "File/Import file" but I
don't seem to get it working. I have turned the spreasheet from Excel
to
CSV, but the dialogue box that opens doesn't recognize data. I made a workaround by copying a part of the spreadsheet and used "File/Import
from
clipboard" and that did work. The downside is that you have to do it
part
by part, and the fundamental problem is why it doesn't work in the
first
place. Is Epidata Manager very picky as to what field separators are
used
in CSV? I have tried changing field separators but with no luck.
Clipboard
seems to be the only workaround, but even still I have to do it in
parts.
(I am using Ubuntu and the latest Linux 64-bit version)
- With the copy-paste method I noticed that Epidata was smartly able
to
interpret some fields with the right type (e.g. date, but not all date-fields, but I believe this is the issue of my spreadsheet), but
not
all. I have not discovered a way to change the field type afterwards,
is it
at all possible? The only hack I figured out was to manually change the field type in XML code directly, but I would be very hesitant to do
it. If
I touch XML code directly, will that mess up everything, or is it the simple fix to the problem?
- Value-labels: My spreadsheet contains no value labels, obviously.
It has
for example field "Country of origin" and the value is string "Nepal".
Is
there a way for me to easily convert strings of text into value labels
as a
part of the import process? String values are consistent (the
spreadsheet
is an output file from a database), so I should imagine it would be possible do so.
- Provided I get questions 1,2 and 3 adequately resolved, I should be
fine
with the initial data import. But what do I do if I want to later
import
additional records from the same spreadsheet structure to my Epidata project? Is this something that Append is supposed to do? There I have
the
same issue with the spreadsheet compatibility: The CSV file does not
open
in a recognisable way, hence I haven't been able to play around and
test it
either.
EpiData-list mailing list EpiData-list@lists.umanitoba.ca http://lists.umanitoba.ca/mailman/listinfo/epidata-list
EpiData-list mailing list EpiData-list@lists.umanitoba.ca http://lists.umanitoba.ca/mailman/listinfo/epidata-list
The short answer is: There is no safe way of changing from one variable (field) type to another . The reason for this is that in many cases we would have to make decisions which could alter data in a different way than the user intended.
The best good practice way of importing data from a spreadsheet was summarised by Jamie Hockin, but the aspect of deciding on field type was not mentioned. This is step 7 below.
The best practice is repeated below (hopefully in a structured and understandable way):
The step list is then: 1. open and work in your spreadsheet file 2. check each column, one by one and make sure that all cells in that column have the same format. E.g. all cells in column M should be date if M contains dates 3. add one right most column, call it "dummy" and insert a 1 in all cells in that column in all rows which contains data. 4. Make sure all names of columns (in row 1) are valid. E.g. "age_of_this_patient" must be changed to "age" or "Thisverylongname" should be made shorter "thisname" 5. I change the decimal separator to . and I omit all thousand separators from formats. 6. I make sure that any "boolean"y/n" data are changed to 0 1 or 0 1 9, where 9 indicates blank. Boolean should be avoided since many software, e.g. those from M$ record no value as a "no" on export or conversion instead of as a "no value present". 7. I insert a row after the name row where I specifically change the content, such variables are converted correctly in Manager. E.g. in Denmark we have civil registration numbers which are 10 digits long, and we should read this as string to use it appropriately for a key. e.g. insert xxx in that place. 8. Then I mark the whole dataset as a block and copy to clipboard
After that import in Manager should be straight forward. As first part of the definition I would then in Manager:
a. look at data in browse. In particular I look at the first, the last and a few records in the middle to make sure that the import is working correctly. b. save the file in Manager c. open the file in EntryClient d. mark the added record (point 7 above) for deletion e. save the file f. pack the file in Manager, where I delete the record marked for deletion. g. Then I finalise the documentation of the data, e.g. add project descriptions and value labels or further controls such as range, calculations etc. h. I save the project i. I now run the "Data Content Validation" to find values not confirming to the content definition form point g. j. I run the "count by ID" documentation feature to make sure all id's are as expected. k. Any errors found in data or documentation are then fixed appropriately.
The same routine is applied whenever I am asked to look at data, which was entered or defined by other persons.
regards Jens Lauritsen EpiData Association
Den 25. jun. 2015, EpiData development and support epidata-list@lists.umanitoba.ca skrev:
I thought so too, thanks Jamie.
Can anyone from the development team comment on the issue of changing the field type and whether it is safe/wise to do so directly in XML? I tested it and it worked out fine, at least it appeared so, but there may be issues which are not immediately visible for a user.
Thanks!
Pekka
Dear Jens,
Thank you very much. I followed your instructions and in the end was able to do a successful import. I discovered that indeed empty cells was the source of trouble. Once I had filled every empty cell with 9999, the import worked out beautifully. In my experiment, empty cells prevented paste even in fields other than boolean.
By the way, I only now discovered this manual: http://www.tbrieder.org/epidata/course_a.pdf It is really good, but I hadn't found it earlier for some reason, and I don't think it is linked on the epidata website.
Now I have come across the next hurdle: editing value labels. It seems I cannot alter value labels any more in Epidata manager. I guess the reason is to ensure not to accidentally change them, but I think it is quite common that at some point of data entry you realize you need to add another label, or just correct a typo in an existing label. How do you do that? I discovered that even when I exported the project to make a new project empty of data, the value labels remained uneditable. The same was also true when importing value labels through project/value labels/external from a previous project; I can't edit them.
Thank you again!
Pekka
2015-06-25 10:59 GMT+01:00 EpiData development and support < epidata-list@lists.umanitoba.ca>:
The short answer is: There is no safe way of changing from one variable (field) type to another . The reason for this is that in many cases we would have to make decisions which could alter data in a different way than the user intended.
The best good practice way of importing data from a spreadsheet was summarised by Jamie Hockin, but the aspect of deciding on field type was not mentioned. This is step 7 below.
The best practice is repeated below (hopefully in a structured and understandable way):
The step list is then:
- open and work in your spreadsheet file
- check each column, one by one and make sure that all cells in that
column have the same format. E.g. all cells in column M should be date if M contains dates 3. add one right most column, call it "dummy" and insert a 1 in all cells in that column in all rows which contains data. 4. Make sure all names of columns (in row 1) are valid. E.g. "age_of_this_patient" must be changed to "age" or "Thisverylongname" should be made shorter "thisname" 5. I change the decimal separator to . and I omit all thousand separators from formats. 6. I make sure that any "boolean"y/n" data are changed to 0 1 or 0 1 9, where 9 indicates blank. Boolean should be avoided since many software, e.g. those from M$ record no value as a "no" on export or conversion instead of as a "no value present". 7. I insert a row after the name row where I specifically change the content, such variables are converted correctly in Manager. E.g. in Denmark we have civil registration numbers which are 10 digits long, and we should read this as string to use it appropriately for a key. e.g. insert xxx in that place. 8. Then I mark the whole dataset as a block and copy to clipboard
After that import in Manager should be straight forward. As first part of the definition I would then in Manager:
a. look at data in browse. In particular I look at the first, the last and a few records in the middle to make sure that the import is working correctly. b. save the file in Manager c. open the file in EntryClient d. mark the added record (point 7 above) for deletion e. save the file f. pack the file in Manager, where I delete the record marked for deletion. g. Then I finalise the documentation of the data, e.g. add project descriptions and value labels or further controls such as range, calculations etc. h. I save the project i. I now run the "Data Content Validation" to find values not confirming to the content definition form point g. j. I run the "count by ID" documentation feature to make sure all id's are as expected. k. Any errors found in data or documentation are then fixed appropriately.
The same routine is applied whenever I am asked to look at data, which was entered or defined by other persons.
regards Jens Lauritsen EpiData Association
Den 25. jun. 2015, EpiData development and support < epidata-list@lists.umanitoba.ca> skrev:
I thought so too, thanks Jamie.
Can anyone from the development team comment on the issue of changing the field type and whether it is safe/wise to do so directly in XML? I tested it and it worked out fine, at least it appeared so, but there may be
issues
which are not immediately visible for a user.
Thanks!
Pekka
EpiData-list mailing list EpiData-list@lists.umanitoba.ca http://lists.umanitoba.ca/mailman/listinfo/epidata-list
participants (1)
-
EpiData development and support