[EpiData-list] Validation duplicate files

epidata-list at lists.umanitoba.ca
Fri Feb 2 13:38:12 CST 2007

Hi Charity and Lars:

There are two ways to do duplicate data entry:

1) independent of sequence
2) dependent on sequence

1) If you use a field with a unique identifier, you can enter the data 
twice, independently of sequence, and then compare the two files later 
on that unique identifier.  In this instance, two data entry groups can 
enter part A and part B at the same time and then switch for the second 
entry.
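To illustrate why sequence does not matter once records share a unique identifier, here is a minimal Python sketch of matching two entries on an ID field. It is not EpiData's implementation: records are assumed to be plain dicts, and the field names ("id", "age", "sex") and the function compare_by_id are hypothetical. Like EpiData's "Duplicate keys found" message, it refuses to pair records when an ID occurs twice in one file.

```python
def compare_by_id(first, second, id_field="id"):
    """Match two independent entries on a unique ID, regardless of record order.

    Returns (differences, only_in_first, only_in_second), where each difference
    is a tuple (id, field, value_in_first, value_in_second).
    """
    def index(records):
        by_id = {}
        for rec in records:
            key = rec[id_field]
            if key in by_id:
                # Without a unique ID the records cannot be paired reliably.
                raise ValueError(f"duplicate key: {key!r}")
            by_id[key] = rec
        return by_id

    a, b = index(first), index(second)
    differences = []
    for key in sorted(a.keys() & b.keys()):
        for field in sorted(a[key].keys() | b[key].keys()):
            va, vb = a[key].get(field), b[key].get(field)
            if va != vb:
                differences.append((key, field, va, vb))
    return differences, sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())


# The same records entered by two groups, in a different order.
entry_1 = [{"id": 1, "age": 34, "sex": "M"}, {"id": 2, "age": 51, "sex": "F"}]
entry_2 = [{"id": 2, "age": 15, "sex": "F"}, {"id": 1, "age": 34, "sex": "M"}]
diffs, only_1, only_2 = compare_by_id(entry_1, entry_2)
print(diffs)  # [(2, 'age', 51, 15)]
```

Only the genuine keying error (51 vs 15) is reported; the different order of the two files produces no differences.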

2) You can use "Tools" > "Prepare Double Entry Verification".  If you 
do not tick the option "Match records by field" (and choose the unique 
identifier), then the second entry will by default follow the record 
number, and thus sequence matters.  Even if you do choose that option, 
the second data entry group will still have to wait until the first 
data entry group has finished.
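For contrast, a rough sketch of pairing by record number shows why sequence matters when records are not matched by field. Again this is a hypothetical Python illustration with dict records, not EpiData code: two entries holding identical data in a different order produce nothing but mismatches.

```python
from itertools import zip_longest


def compare_by_record_number(first, second):
    """Pair records by position (record 1 with record 1, and so on) and list differences."""
    differences = []
    # zip_longest pads the shorter file with empty records so extra records are reported.
    for recno, (a, b) in enumerate(zip_longest(first, second, fillvalue={}), start=1):
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                differences.append((recno, field, a.get(field), b.get(field)))
    return differences


# Identical data, but the second entry was typed in a different order.
entry_1 = [{"id": 1, "age": 34}, {"id": 2, "age": 51}]
entry_2 = [{"id": 2, "age": 51}, {"id": 1, "age": 34}]
print(len(compare_by_record_number(entry_1, entry_2)))  # 4
print(compare_by_record_number(entry_1, entry_1))        # []
```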

As Charity says, it is critical in any case to have a truly unique 
identifier, which can be ensured with KEY UNIQUE.  Apparently, Lars, you 
did not have that.  You must therefore identify the records with 
duplicate and/or missing identifiers in both files, mark them for 
deletion (but do not pack the files), and then compare.  Then deal 
separately with the problematic records in both files, most simply 
perhaps by re-entering them properly and independently in both files.
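The clean-up step can be sketched as follows. This is a hypothetical Python illustration, not an EpiData feature: an ID is treated as problematic if it is missing (None or blank, an assumption about how "missing" is represented) or occurs more than once in either file, and records carrying such an ID are set aside from both files so that the remaining records can be compared on the ID.

```python
from collections import Counter


def set_aside_problem_ids(first, second, id_field="id"):
    """Split two entry files into records safe to compare and records to re-enter.

    An ID is problematic if it is missing or occurs more than once in EITHER
    file; records with such IDs are set aside from BOTH files.
    """
    def is_missing(key):
        return key is None or (isinstance(key, str) and key.strip() == "")

    problem_ids = set()
    for records in (first, second):
        counts = Counter(r.get(id_field) for r in records)
        problem_ids |= {k for k, n in counts.items() if n > 1 or is_missing(k)}

    def split(records):
        keep = [r for r in records if r.get(id_field) not in problem_ids]
        aside = [r for r in records if r.get(id_field) in problem_ids]
        return keep, aside

    return split(first), split(second), problem_ids


first = [{"id": 1, "age": 34}, {"id": 1, "age": 40}, {"id": 2, "age": 51}, {"id": None, "age": 22}]
second = [{"id": 2, "age": 51}, {"id": 1, "age": 34}, {"id": "", "age": 22}]
(keep_a, aside_a), (keep_b, aside_b), bad = set_aside_problem_ids(first, second)
print(keep_a)               # [{'id': 2, 'age': 51}]
print(bad == {1, None, ""})  # True
```

ID 1 is set aside from the second file too, even though it appears only once there, because its partner in the first file cannot be identified.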



Hello Lars,

I just did a small test of this to make sure, and the ordering/sequence of
IDs in the two files does not make a difference when Validating Duplicate
Files.

What will cause a problem is a duplicated ID in either of your two files.
Are you sure that the case IDs are unique? You should have "KEY UNIQUE 1"
in your .chk file for the ID variable to ensure this.
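The idea behind KEY UNIQUE (refusing a second record with an ID that is already in the file) can be sketched in a few lines of Python. This is a hypothetical toy, not EpiData code; the class name and the field name "caseid" are made up, and rejecting blank keys is an extra assumption of the sketch.

```python
class UniqueKeyStore:
    """Toy entry store that refuses a record whose key value is already used.

    Hypothetical illustration of the idea behind KEY UNIQUE, not EpiData code.
    """

    def __init__(self, key_field):
        self.key_field = key_field
        self.records = []
        self._seen = set()

    def add(self, record):
        key = record.get(self.key_field)
        # Assumption of this sketch: a blank key is rejected as well.
        if key is None or key == "":
            raise ValueError(f"{self.key_field} must not be blank")
        if key in self._seen:
            raise ValueError(f"duplicate {self.key_field}: {key!r}")
        self._seen.add(key)
        self.records.append(record)


store = UniqueKeyStore("caseid")
store.add({"caseid": 101, "age": 34})
try:
    store.add({"caseid": 101, "age": 40})  # same ID typed twice
    rejected = None
except ValueError as err:
    rejected = str(err)
print(rejected)  # duplicate caseid: 101
```

Enforcing uniqueness at entry time means the duplicate never reaches the file, so the later comparison cannot fail on duplicate keys.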

Kind regards,

On 2/1/07, epidata-list at lists.umanitoba.ca <epidata-list at lists.umanitoba.ca> wrote:

Dear EpiData-list,

I'm a new member of this list and a fairly new user of EpiData.
I am trying to validate duplicate files that two observers did not
enter in the same sequence, and I have difficulty performing the test
under Document/Validate duplicate files. When I attempt the test with
case-id selected as the key field, an error, "Duplicate keys found", is
given and the validation is terminated.
Is it possible to compare the two datasets even though the observations
were not registered in the same sequence?


Hans L Rieder, MD, MPH
Jetzikofenstr. 12
3038 Kirchlindach

Tel: +41 31 829 4577
Mob: +41 79 321 9122
Web: http://www.tbrieder.org
