[EpiData-list] READ statement

epidata-list at lists.umanitoba.ca
Wed Mar 7 03:45:41 CST 2007


When a dataset is READ in Analysis, the program checks every date in
the dataset for validity.  For example, on reading the dataset
DATA.REC, the following messages are displayed:

. Read
Loading data C:\TEST\DATA.REC, please wait..
invalid year 1000, month 1, day 17
Error field DLOPV record: 1401
invalid year 5001, month 6, day 18
invalid year 200, month 1, day 16
invalid year 199, month 8, day 8
invalid year 200, month 2, day 2
invalid year 2200, month 12, day 3
invalid year 202, month 12, day 10
invalid year 202, month 12, day 10
invalid year 200, month 12, day 4
invalid year 200, month 12, day 4
invalid year 202, month 12, day 19
invalid year 603, month 5, day 24
invalid year 603, month 6, day 24
invalid year 199, month 1, day 15
invalid year 203, month 8, day 23
invalid year 203, month 8, day 23
invalid year 993, month 1, day 1
invalid year 199, month 9, day 19
invalid year 802, month 1, day 17
invalid year 199, month 4, day 18
invalid year 996, month 7, day 17
invalid year 206, month 7, day 31
invalid year 200, month 5, day 11
Errors reading data
File name : C:\TEST\DATA.REC
Fields: 222 Total records: 10460 Valid records: 10450
Excluded records: 10

I have several concerns about this approach.
1.  Most important, records with 'invalid' dates are excluded from
analysis (10 in this example).  That an 'invalid' date is probably
incorrect is no reason to prevent the record from being analysed.
2.  READing a large dataset with many date fields can be very slow.
3.  The output given by Analysis (as shown above) is not very useful.
Apart from the single 'Error field DLOPV record: 1401' line, the
messages identify neither the record number nor the variable name.

I would like to suggest:
a.  That dates are not checked at all during the READ process.  The
analyst can then look for incorrect dates in Analysis, using criteria
relevant to the content of the data.
OR
b.  That the user can specify, when READing a dataset, whether or not
the dates are checked.
OR
c.  That, if neither a. nor b. is an option, more detail about the
'invalid' dates (record number, variable name) is provided so that
they can be edited.
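
To make suggestions b. and c. concrete, here is a minimal sketch in
Python.  It is not how Analysis works internally; the function name
check_dates, the 1900-2100 year window, and the sample records are all
assumptions made up for illustration (only the field name DLOPV and
the date 1000-01-17 come from the output above).  Every record is
kept, checking can be switched off, and each problem is reported with
its record number and field name.

```python
from datetime import date

# Hypothetical plausibility window; the limits Analysis applies are not stated.
MIN_YEAR, MAX_YEAR = 1900, 2100

def check_dates(records, date_fields, check=True):
    """Return (records, problems) without dropping any record.

    records     : list of dicts mapping field name -> value
    date_fields : names of fields holding (year, month, day) tuples
    check       : set to False to skip date checking (suggestion b)
    """
    problems = []
    if not check:
        return records, problems
    for recno, rec in enumerate(records, start=1):
        for field in date_fields:
            value = rec.get(field)
            if value is None:
                continue  # missing dates are not reported
            year, month, day = value
            try:
                date(year, month, day)  # ValueError for impossible dates
                ok = MIN_YEAR <= year <= MAX_YEAR
            except ValueError:
                ok = False
            if not ok:
                # Record number and field name are kept (suggestion c)
                problems.append((recno, field, year, month, day))
    return records, problems

# Made-up sample data for illustration only.
records = [
    {"ID": 1, "DLOPV": (2006, 1, 17)},
    {"ID": 2, "DLOPV": (1000, 1, 17)},
    {"ID": 3, "DLOPV": (2005, 2, 30)},
]
kept, problems = check_dates(records, ["DLOPV"])
for recno, field, y, m, d in problems:
    print(f"record {recno}, field {field}: invalid year {y}, month {m}, day {d}")
```

With a report of this kind the analyst could decide which records to
edit or exclude, rather than having them excluded automatically.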

Annemieke van Middelkoop
Pretoria, South Africa

