[EpiData-list] Formatting and exporting data

epidata-list at lists.umanitoba.ca epidata-list at lists.umanitoba.ca
Sat Dec 3 04:45:21 CST 2005


Susanne Widmar's question about how to format data is a good one.
Allow me to give a lengthy explanation, since this is an issue on which we
should agree on a uniform principle for import, export and use of
data in EpiData Analysis.

This relates to two aspects:
1. Accuracy of data - data types (binary, integer, real, double...)
      - we wish to replicate the precision of the collected data
2. How many decimals are shown on output from commands
      - we wish to see as much as needed for the interpretation
      - often less than aspect 1.
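
As a rough illustration of aspect 1 (a Python sketch, not EpiData Analysis
code; the value and the helper name roundtrip are made up for the example),
storing the same number in a narrower data type changes the stored value even
when the displayed value looks identical:

```python
import struct

def roundtrip(value, code):
    """Store value in the given binary type and read it back."""
    return struct.unpack(code, struct.pack(code, value))[0]

value = 72.348                      # hypothetical measurement
as_double = roundtrip(value, "<d")  # 8-byte double: value is preserved
as_single = roundtrip(value, "<f")  # 4-byte float: value is slightly altered

print(as_double == value)           # stored precision kept
print(as_single == value)           # stored precision lost
print(f"{as_double:.1f}", f"{as_single:.1f}")  # same when shown with one decimal
```

The display with one decimal hides the difference, which is why the storage
type and the output format need to be treated as separate questions.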

In Stata the format controls the number of decimals for some commands,
such as the CI command, whereas for other commands the display format is
part of the command specification.

The same "confusion" now exists in EpiData Analysis, where output formatting
is shown and controlled with only partial consistency.

In my view, output formatting should be decided by the command, not by the
data. In v1.1 build 54+ of EpiData Analysis (now available for testing) I
have implemented new options for frequencies:
freq v1   /D0              // would give percentages with no decimals
freq v1   /D1              // would give percentages with one decimal
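
The idea behind these options can be sketched in Python (an illustration
only, not how EpiData Analysis implements freq; the function, its decimals
parameter and the data are assumptions made for the example). Percentages are
computed at full precision and rounded only when the output is written:

```python
from collections import Counter

def freq(values, decimals=1):
    """Return (category, count, percent) rows; percent is formatted on output only."""
    counts = Counter(values)
    total = len(values)
    rows = []
    for category in sorted(counts):
        percent = 100 * counts[category] / total  # full precision
        rows.append((category, counts[category], f"{percent:.{decimals}f}"))
    return rows

v1 = [1, 1, 2, 3, 3, 3, 3]  # made-up data

for row in freq(v1, decimals=0):  # analogous to /D0
    print(row)
for row in freq(v1, decimals=1):  # analogous to /D1
    print(row)
```

The data in v1 are never changed; only the command decides how many decimals
are shown.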

For cross tables there are the settings for tables:
TABLE PERCENT FORMAT COL    P1()
TABLE PERCENT FORMAT ROW    P1{}
TABLE PERCENT FORMAT TOTAL    P1[]
in which the user can define the number of decimals for the percentages.

For other commands (describe, means..) the display format is either fixed
or depends on the size of the estimates. User specification of output
format will be implemented later.

For import and export to Stata we could, if this is possible, rewrite the
procedures to work on data type rather than on format, as they do now. We
could then set all numerical variables to format %9.0g.
Another strategy could be to allow EpiData Analysis to save data in Stata
format (which Bill Gould, CEO of StataCorp, has given permission to do).

I suggest that users comment on the following uniform strategy:
a. All calculations are based on maximum accuracy with the data at hand.
b. Output formatting should be part of each command.
c. Data storage (type of variable) should be as strict as possible.
    Accomplishing this is more important than speed of import and export.

Regards
Jens Lauritsen
EpiData Association



