[EpiData-list] Error in exporting .epx file to .dta

EpiData development and support epidata-list at lists.umanitoba.ca
Mon Oct 23 15:56:45 CDT 2017

From https://www.stata.com/help.cgi?dta:

+32740 is the largest non-missing integer (2-byte) value in Stata. The internal value for the first missing value (.) is +32741.
Missing values seem to be handled the same way in all versions of the Stata dataset format. I would encourage people to use the most recent Stata version unless they are tied to a legacy version of Stata or need an older format for import elsewhere.
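
To make the 2-byte integer layout quoted above concrete, here is a minimal Python sketch. The two anchor values (32740 and 32741) are taken from the help text; the assumption that .a to .z follow consecutively up to 32767 is based on the .dta format description, and the function name is only illustrative.

```python
import string

# Values for Stata's 2-byte "int" type, from the help text quoted above.
INT_MAX_NONMISSING = 32740
INT_SYSMIS = 32741  # internal value of "."


def int_missing_code(raw):
    """Return '.', '.a' ... '.z' if raw is a missing code for a Stata int.

    Returns None for ordinary (non-missing) values.
    Assumption: .a to .z occupy the 26 values directly after ".".
    """
    if raw <= INT_MAX_NONMISSING:
        return None
    offset = raw - INT_SYSMIS
    if offset == 0:
        return "."
    if 1 <= offset <= 26:
        return "." + string.ascii_lowercase[offset - 1]
    raise ValueError(f"{raw} does not fit in a 2-byte Stata int")


print(int_missing_code(999))    # ordinary value, so None
print(int_missing_code(32740))  # still a valid number, so None
print(int_missing_code(32741))  # system missing
```

Note that 32740, the value reported in the exported file, is a valid number in this layout and not a missing code.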

I would think that for Stata use, the actual coded value of missing (e.g. 9 or 99 or 999) is not as relevant as it is in EpiData, which provides users the link to their paper forms and study codebooks. However, where there are multiple missing values, it would be advantageous to maintain the distinction.

I propose:

EpiData -> Stata
system missing -> .
first declared missing value -> .a
next declared missing value -> .b

Since every Stata numeric data type allows for 27 missing values, there should be no issue with this. The actual internal values for ., .a, etc. will depend on the Stata numeric type.
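
The proposed export mapping could be sketched roughly as follows in Python. This is only an illustration of the idea, not EpiData's actual export code: the function names are hypothetical, system missing is represented as None, and the declared missing values are assumed to arrive in declaration order.

```python
import string


def stata_missing_map(declared_missing):
    """Map declared missing values to .a, .b, ... in declaration order."""
    if len(declared_missing) > 26:
        # Only .a to .z are available besides system missing "."
        raise ValueError("Stata offers only 26 extended missing codes")
    return {
        value: "." + letter
        for value, letter in zip(declared_missing, string.ascii_lowercase)
    }


def export_value(value, mapping):
    """Translate one field value for export to Stata."""
    if value is None:          # system missing (assumed representation)
        return "."
    return mapping.get(value, value)  # ordinary values pass through


mapping = stata_missing_map([9, 99, 999])
print(export_value(999, mapping))   # third declared missing value
print(export_value(None, mapping))  # system missing
print(export_value(42, mapping))    # ordinary value
```

With this scheme the distinction between several declared missing values survives the export, while the original codes (9, 99, 999) would have to be kept elsewhere, for example in value labels.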

When reading Stata data, I propose that EpiData treat all missing values as system missing (in Manager and Analysis). There is no clear logic to setting other values as declared missing, especially if the .dta has no value labels. I could imagine this being an option, in which .a, .b, … .z get assigned to their internal Stata values and the EpiData user will have to define the missing values accordingly.
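
The import side of the proposal, including the optional behaviour, might look like this in Python. Again a sketch under assumptions: the names are hypothetical, system missing is represented as None, Stata missing codes are given as strings such as ".a", and the internal values shown are those of the 2-byte int type (other numeric types use different values).

```python
import string

STATA_MISSING = {"."} | {"." + c for c in string.ascii_lowercase}


def import_value(value, keep_extended=False, sysmis_internal=32741):
    """Translate one Stata value for use in EpiData.

    Default: every Stata missing code becomes system missing (None).
    With keep_extended=True, .a to .z become their internal values, which
    the user would then have to declare as missing in EpiData.
    sysmis_internal is the internal value of "." for the field's type;
    32741 applies to the 2-byte int type only.
    """
    if value not in STATA_MISSING:
        return value                      # ordinary value
    if value == "." or not keep_extended:
        return None                       # system missing
    # .a is one above ".", .b two above, and so on.
    return sysmis_internal + (ord(value[1]) - ord("a") + 1)


print(import_value(".a"))                      # default: system missing
print(import_value(".b", keep_extended=True))  # optional: internal value
print(import_value(12))                        # ordinary value
```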

I will submit a bug report on the problem with setting missing values on export.

> On Oct 23, 2017, at 14:58, EpiData development and support <epidata-list at lists.umanitoba.ca> wrote:
> Question:
> Stata uses one or more large integers as missing values and labels these as .a, .b, etc.
> Should we export missing values as the original value, e.g. 999, and lose the Stata missing definition, or use .a etc.?
> Jens Lauritsen
> Epidata.dk
> On 23 October 2017 at 16:42:57 CEST, EpiData development and support <epidata-list at lists.umanitoba.ca> wrote:
>> Dear team members,
>> I had entered data in an .epx file created using EpiData Manager
>> v4.2.0.0, with jumps set for certain variables using the last defined
>> and second-last defined missing values. A unique identifier was also
>> created for each record by combining two integer fields.
>> On importing the data to Stata for analysis, the following issues have
>> been noted:
>> 1. The unique id field remains empty in the Stata file.
>> 2. Wherever missing values were indicated in the .epx file, they appear
>> as large random numbers in the Stata file and not as the actual value
>> that was entered. For example, age entered as 999 (missing) in the .epx
>> file shows up as "32740" in the Stata file. This happens only when the
>> "missing" box is ticked while assigning value labels for categorical or
>> continuous variables.
>> This issue has been noted in the previous versions of EpiData Manager
>> as well as while exporting to newer versions of Stata and SPSS.
>> I have attached the sample .epx file and the .dta file for your
>> reference.
>> Thanks and Regards
>> Dr. Divya Nair
