[EpiData-list] Missing data values when exporting to Stata
EpiData development and support
epidata-list at lists.umanitoba.ca
Wed Aug 21 07:04:31 CDT 2019
I have some missing data encoded as ‘9’ in the database, and the value label list flags that value as missing. When exporting to CSV, the value is written as ‘9’, as expected.
When I export to Stata, I assumed the value ‘9’ would be recoded to Stata’s definition of a missing value. However, when I read the data in Python with the ‘pd.read_stata’ function, the category value ‘9’ comes back as 100.0 and not as missing data (NaN), which is how the blanks in EpiData come back.
One column has 80 values: 19 are encoded as ‘9’ (missing), 58 are blank, and 3 are coded as ‘1’ (Yes) in EpiData. After exporting to Stata and reading the file in Python I get:
19 values in category ‘100’
3 values in category ‘Yes’
58 values NaN (i.e. missing)
Why isn’t the ‘9’ category correctly coded as missing when exporting to Stata, or am I missing something here?
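For reference, here is a minimal sketch of how the stored values can be inspected from Python without the value labels applied. It assumes pandas is installed; the function name `raw_value_counts`, the file name `export.dta` and the column name `answer` are hypothetical placeholders, not names from my actual export. With `convert_categoricals=False` the numeric codes are kept as they are stored in the file, and with the default `convert_missing=False` anything stored as a Stata missing value is returned as NaN, so a code that still shows up as a number was presumably not written as a Stata missing value.

```python
import pandas as pd


def raw_value_counts(path, column):
    """Count the values of `column` as stored in a Stata .dta file.

    Value labels are not applied, so the numeric codes stay visible.
    Anything stored as a Stata missing value is returned as NaN.
    """
    df = pd.read_stata(
        path,
        convert_categoricals=False,  # keep numeric codes, skip value labels
        convert_missing=False,       # Stata missing values become NaN
    )
    # dropna=False keeps the NaN count in the result
    return df[column].value_counts(dropna=False)


# Hypothetical usage (file and column names are placeholders):
# print(raw_value_counts("export.dta", "answer"))
```

If the 19 values appear in this output under a numeric code rather than in the NaN count, they were stored in the file as an ordinary number.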