Error in exporting .epx file to .dta
Dear team members,
I entered data in an .epx file created using EpiData Manager v4.2.0.0, with jumps set for certain variables using the last-defined and second-last-defined missing values. A unique identifier was also created for each record by combining two integer fields. On importing the data into Stata for analysis, I noticed the following issues:
1. The unique ID field remains empty in the Stata file.
2. Wherever missing values were indicated in the .epx file, they appear in the Stata file as large, seemingly random numbers rather than as the value that was actually entered. For example, age entered as 999 (missing) in the .epx file shows up as 32740 in the Stata file. This happens only when the "missing" box is ticked while assigning value labels for categorical or continuous variables.
The same issue has been noted with previous versions of EpiData Manager, as well as when exporting to newer versions of Stata and to SPSS.
I have attached the sample .epx file and the .dta file for your reference.
Thanks and regards,
Dr. Divya Nair
Question: Stata uses one or more large integers as missing values and labels these as .a, .b, etc.
Should we export missing values as the original value (e.g. 999), losing the Stata missing-value definition, or use .a etc.?
Jens Lauritsen Epidata.dk
On 23 October 2017 16:42:57 CEST, EpiData development and support epidata-list@lists.umanitoba.ca wrote:
[original message from Dr. Divya Nair; see above]
From https://www.stata.com/help.cgi?dta:
+32740 is the largest non-missing integer (2-byte) value in Stata. The internal value for the first missing value (.) is +32741. Missing values seem to be handled the same way in all versions of the Stata dataset format. I would encourage people to use the most recent Stata version unless they are tied to a legacy version of Stata or require an older version for import elsewhere.
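The figures quoted from the Stata documentation can be tabulated directly. The sketch below lists the internal values of the 27 missing codes for the 2-byte "int" storage type (assuming the values follow consecutively from 32741, as the .dta description gives them). It also makes one point concrete: 32740, the value reported in the exported file, is the largest non-missing value, so Stata would read it as an ordinary number rather than as missing. The names `INT_MISSING` and `describe_int` are hypothetical, chosen for illustration only.

```python
# Sketch: internal values of Stata's 27 missing codes for the 2-byte "int"
# storage type, following https://www.stata.com/help.cgi?dta.
MAX_NONMISSING_INT = 32740  # largest non-missing 2-byte value
CODES = ["."] + ["." + c for c in "abcdefghijklmnopqrstuvwxyz"]

# "." is 32741, ".a" is 32742, ..., ".z" is 32767
INT_MISSING = {code: MAX_NONMISSING_INT + 1 + i for i, code in enumerate(CODES)}

# Reverse lookup: internal value -> missing code
_CODE_BY_VALUE = {value: code for code, value in INT_MISSING.items()}


def describe_int(value):
    """Return the Stata missing code for a stored 2-byte value, or the
    value itself if Stata treats it as an ordinary number."""
    return _CODE_BY_VALUE.get(value, value)
```

For example, `describe_int(32741)` gives `"."`, while `describe_int(32740)` gives back `32740`, a valid (if implausible) age.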
I would think that for Stata use, the actual coded value of missing (e.g. 9, 99 or 999) is not as relevant as it is in EpiData, which provides users with the link to their paper forms and study codebooks. However, where there are multiple missing values, it would be advantageous to maintain the distinction.
I propose the following mapping from EpiData to Stata:
- system missing -> .
- first declared missing value -> .a
- next declared missing value -> .b
- and so on.
Since every Stata numeric data type allows for 27 missing values (. and .a to .z), there should be no issue with this. The actual internal values for ., .a, etc. will depend on the Stata numeric type.
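The mapping proposed above could be sketched as follows. This is not EpiData's exporter: `map_declared_missing` and its input (an ordered list of the missing values declared for one field, e.g. `[999, 998]`) are hypothetical. It covers only Stata's integer storage types, whose system-missing values are 101 (byte), 32741 (int) and 2,147,483,621 (long), with .a to .z following consecutively; float and double use special floating-point values and are left out.

```python
# Internal value of system missing "." for Stata's integer storage types;
# ".a" to ".z" follow consecutively. Float/double are not covered here.
SYSTEM_MISSING = {"byte": 101, "int": 32741, "long": 2_147_483_621}
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def map_declared_missing(declared, stata_type="int"):
    """Map EpiData declared missing values, in declaration order, to Stata
    extended missing codes and their internal values.

    EpiData system missing is not part of `declared`; it always becomes "."
    (internal value SYSTEM_MISSING[stata_type])."""
    if stata_type not in SYSTEM_MISSING:
        raise ValueError("this sketch covers only byte, int and long")
    if len(declared) > len(LETTERS):
        raise ValueError("Stata offers only 26 extended missing values (.a to .z)")
    base = SYSTEM_MISSING[stata_type]
    mapping = {}
    for i, value in enumerate(declared):
        # first declared -> .a (base + 1), next -> .b (base + 2), ...
        mapping[value] = ("." + LETTERS[i], base + 1 + i)
    return mapping
```

With `declared = [999, 998]` and the int type, 999 maps to `(".a", 32742)` and 998 to `(".b", 32743)`, so the distinction between the two missing values survives the export.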
When reading Stata data, I propose that EpiData treat all missing values as system missing (in both Manager and Analysis). There is no clear logic for setting other values as declared missing, especially if the .dta file has no value labels. I could imagine offering this as an option, in which .a, .b, … .z are assigned their internal Stata values and the EpiData user then defines the missing values accordingly.
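The import rule proposed above, including the optional behaviour, might look like this for integer storage types. `import_stata_value` and its `keep_internal` flag are hypothetical names for illustration, not part of EpiData; `None` stands in for EpiData system missing, and the maximum non-missing values (100, 32740, 2,147,483,620) are those of Stata's byte, int and long types.

```python
# Sketch of the proposed import rule for Stata's integer storage types only.
MAX_NONMISSING = {"byte": 100, "int": 32740, "long": 2_147_483_620}


def import_stata_value(value, stata_type="int", keep_internal=False):
    """Convert one stored Stata integer value for use in EpiData.

    Default: every Stata missing code (., .a to .z) becomes None, i.e.
    EpiData system missing. With keep_internal=True, .a to .z are returned
    as their raw internal values so the user can declare them as missing."""
    limit = MAX_NONMISSING[stata_type]
    if value <= limit:
        return value  # ordinary number, kept as-is
    if value == limit + 1:
        return None  # "." is always system missing
    return value if keep_internal else None  # .a to .z
```

Keeping the raw internal value (e.g. 32742 for .a in an int variable) leaves the decision about which values count as missing to the EpiData user, as suggested above.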
I will submit a bug report on the problem with setting missing values on export.
Jamie
On Oct 23, 2017, at 14:58, EpiData development and support epidata-list@lists.umanitoba.ca wrote:
[Jens Lauritsen's question and Dr. Divya Nair's original message; see above]
EpiData-list mailing list EpiData-list@lists.umanitoba.ca http://lists.umanitoba.ca/mailman/listinfo/epidata-list
What version of Stata did you export to? The default setting is to export to Stata 9.
Kind regards, Torsten Bonde Christiansen EpiData.dk
On 2017-10-23 16:42, EpiData development and support wrote:
[original message from Dr. Divya Nair; see above]