creating an identifier from various variables + double data entry
Dear Jens,
Thanks for your helpful comment on creating an identifier from various variables and double data entry (see below).
I just wanted to clarify a few things:
1) Does this mean that, when using the 'Double Entry Verification' option (immediate comparison), it will not work if you have added any new variables to the .rec, or changed the order of the variables from the existing .rec?
2) If you use the other method of data verification (entering data into two separate files and then validating duplicate files afterwards), is it possible to compare the old and new .rec files, even if a new variable has been added, or the order of the variables has been changed?
Will using the 'validate duplicate files' option work if the new .rec file has the variables in a different order?
Thanks so much for all your help,
Nicki Bailey
One aspect of unique identifiers concerns users who want to do double entry.
The only double entry option in this situation is to enter the data twice and then compare afterwards, since the "immediate comparison" double entry mode is not available for combined-field unique entries unless the records are entered in EXACTLY the same sequence as the first time. The problem for the user is that the system is NOT giving correct warnings in immediate double entry when the unique index is created automatically rather than entered directly.
I would like the identifier to be created automatically in the .rec file after the 4 variables are entered, as an aggregation of those 4 variables. For instance, my identifier is ID = region number + village number + household number + member-in-household number.
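In EpiData itself this would belong in the .chk file, but the aggregation logic is simple. The following Python sketch shows the idea; the zero-padded field widths (2, 2, 3, 2) are illustrative assumptions, not anything defined by EpiData or the original message:

```python
# Sketch of the requested aggregation: four numeric components are
# zero-padded and concatenated into one identifier string.
# The widths (2, 2, 3, 2) are illustrative assumptions only.
def make_id(region, village, household, member):
    return f"{region:02d}{village:02d}{household:03d}{member:02d}"

print(make_id(5, 12, 87, 3))  # -> 051208703
```

Fixed widths matter here: without zero-padding, region 1 / village 12 and region 11 / village 2 would collide.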
regards Jens Lauritsen EpiData Association
Q1: Does this mean that, when using the 'Double Entry Verification' option (immediate comparison), it will not work if you have added any new variables to the .rec, or changed the order of the variables from the existing .rec?
Q2: With "compare two files", is it possible to compare the old and new .rec files, even if a new variable has been added, or the order of the variables has been changed?
Since I am not sure of the answer, the best way to get rid of such a worry is to make a small test.
1. Create a .rec file of, say, 5 fields and enter 5 records - let us name it A. Add checks and a unique entry field.
2. Then copy the structure (Tools menu) of the file to a new file B.
3. Use the "rec to qes" option on file B to create b.qes.
4. Open b.qes, change the order of some variables (put the ID variable at the top) and add a few more.
5. Open file B for entry; the entry module will ask "update structure". Say yes, and you now have a new field sequence and field count.
6. Enter a few more records in file B and see what happens with "validate duplicate files". You know there is a changed structure and more fields in file B, but also that the contents of the first five records are the same as in file A (for the fields present in both).
7. Do similarly as indicated above, that is, "prepare double entry" with the "immediate" option, to, say, a_dbl.
8. Create from a_dbl the file a_dbl.qes (as in step 3 above).
9. Change the order of the fields in the .qes file (except the ID).
10. Enter data in file a_dbl and see if it works.
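The record-by-record comparison behind "validate duplicate files" (step 6) can also be sketched outside EpiData. Assuming both files can be exported to CSV with a shared key field, this Python sketch compares records on that key and, by design, tolerates reordered fields and extra variables in either file - exactly the situation the test above sets up. The field names are hypothetical:

```python
import csv
import io

def compare_files(text_a, text_b, key="id"):
    """Compare two CSV record sets on a shared key field.
    Only fields present in both files are compared, so a changed
    field order or an extra variable in one file is tolerated.
    Returns a list of (key value, differing field) pairs."""
    rows_a = {r[key]: r for r in csv.DictReader(io.StringIO(text_a))}
    rows_b = {r[key]: r for r in csv.DictReader(io.StringIO(text_b))}
    diffs = []
    for k, ra in rows_a.items():
        rb = rows_b.get(k)
        if rb is None:
            diffs.append((k, "<missing record>"))
            continue
        for field in sorted((set(ra) & set(rb)) - {key}):
            if ra[field] != rb[field]:
                diffs.append((k, field))
    return diffs

# File B has reordered columns, an extra variable, and one typo in "age".
a = "id,age,sex\n1,30,M\n2,41,F\n"
b = "sex,id,age,extra\nM,1,30,x\nF,2,40,y\n"
print(compare_files(a, b))  # -> [('2', 'age')]
```

Whether EpiData's own validation is this tolerant is precisely what the test in steps 1-6 is meant to find out.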
Finally: Let us know on the list how your test went.
Moral: never trust any software until you have tried and challenged it yourself. (I do that for all types of software, regardless of status: commercial, shareware, freeware ...)
regards Jens Lauritsen EpiData Association
EpiData-list mailing list EpiData-list@lists.umanitoba.ca http://lists.umanitoba.ca/mailman/listinfo/epidata-list
Technical message / developer-oriented message.
As announced earlier, the available development time is focused on moving towards a complete rewrite of the common core modules and principles.
For most users this is only relevant in terms of long-term maintenance and stability; it does not change working principles or the immediate design of the visual parts of the software, be it Analysis or Entry.
The new core modules will be implemented over the next year: first in EpiC, then in Analysis and finally in Entry. The latter will be combined with development of a focused data entry module meeting good clinical practice guidelines and FDA Part 11 compliance.
The translation principles will also change, since conversion to the new core requires combining the translated parts of Entry and Analysis. More news on translation will follow within the next month.
The core modules read data from disk, save data to disk, and handle aspects such as conversion of dates, changes from integer to string variables, etc.
Obviously, thorough testing of this is very important. The real batch testing will be implemented as part of the EpiC conversion to the new core, but technically oriented users are encouraged to also test the new module by reading and writing data files. Currently, reading and writing can be done on any file in EpiData format (.rec + .chk) and any Stata version from 4 to 10. dBase, CSV and possibly more formats will follow.
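As a concrete illustration of what such a reading test touches, here is a Python sketch (not EpiData code) that parses just the fixed header of an old-style Stata .dta file. The layout used - release byte, byte-order byte, filetype, one unused byte, then nvar (int16) and nobs (int32) in the file's byte order - follows Stata's published file-format description for releases up to 115 (Stata 10 and earlier):

```python
import struct

def read_dta_header(buf):
    """Parse the fixed header of an old-style Stata .dta file
    (format releases up to 115). Byte 1 gives the byte order:
    1 = big-endian, 2 = little-endian; nvar and nobs follow it."""
    release, byteorder, filetype, _unused = struct.unpack_from("4B", buf, 0)
    endian = ">" if byteorder == 1 else "<"
    (nvar,) = struct.unpack_from(endian + "h", buf, 4)
    (nobs,) = struct.unpack_from(endian + "i", buf, 6)
    return {"release": release, "nvar": nvar, "nobs": nobs}

# A hand-built little-endian header: release 114, 5 variables, 100 records.
hdr = bytes([114, 2, 1, 0]) + struct.pack("<h", 5) + struct.pack("<i", 100)
print(read_dta_header(hdr))  # -> {'release': 114, 'nvar': 5, 'nobs': 100}
```

A round-trip test along these lines (write a file, read the header back, compare field and record counts) is exactly the kind of check individual testers can automate.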
The first release of a test application is now available for Windows and Linux from the testing page: http://www.epidata.dk/testing.php Please note that it is purely a test application which should NOT be used for routine data. Work on copies of data.
Please discuss problems and successes on the EpiData list. Errors should be reported, with example files attached (no sensitive data), to the new bug reporting system which will replace the Mantis system. The current address for the Flyspray system is http://www.epidata.info/flyspray/ but this might change.
regards Jens Lauritsen EpiData Association
I tried the core module on some real data with lots of dates and missing values. Same results under Windows and SUSE linux.
Read/write of EpiData .rec/.chk are fine
Write of Stata (I used v7) is fine: dates and missing values come through OK; missing text values show up as "..". Data as viewed by Stata v7 are perfect.
Read back of Stata has problems. Dates are not recognized as dates, so just the serial numbers come through as numbers; missing text values appear as "..", which would be easy enough to identify. The original text fields were a single character, though.
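The serial numbers Jamie mentions are recoverable: Stata's %td daily dates count days since 1 January 1960, so a reader can convert them back, as in this Python sketch:

```python
from datetime import date, timedelta

STATA_EPOCH = date(1960, 1, 1)  # day 0 of Stata's %td daily-date format

def stata_td_to_date(serial):
    """Convert a Stata %td serial (days since 1960-01-01) to a date."""
    return STATA_EPOCH + timedelta(days=int(serial))

print(stata_td_to_date(0))      # -> 1960-01-01
print(stata_td_to_date(18263))  # -> 2010-01-01
```

This only helps once the reader knows a column is a date at all, which is what the display format information in the .dta file carries and what the test build apparently drops on read-back.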
Write of .dbf was allowed by the version available on July 8, but it only wrote the header and a few fields of the first record. Read back of .dbf produced field names and junk (when read by Excel, the .dbf had only the headers).
It is encouraging to see identical results for Windows and Linux versions!
Jamie
Jens wrote:
The new core modules will be implemented over the next year. First in EpiC, then in Analysis and finally in Entry.
The first release of a test application is now available for windows and linux from the testing page:
Please discuss problems and success on the epidata list. Errors should be reported with attached example files (no sensitive data) to the new bug reporting system which will replace the Mantis system.
Dear Jamie.
Thanks for the great feedback; it's always very valuable to get this kind of information when developing a completely new system.
Is it somehow possible to get hold of the data you have created, both with the original programs and as converted using the core system? It would greatly help the debugging process, especially if you can provide some sort of "result" list, i.e. the names of the fields, types, number of records, special records, etc.
Kind regards, Torsten Bonde Christiansen. Software Developer, EpiData.
I tried a simple external lookup table, all files at:
http://geddes.home.netcom.com/test_2.zip
with check file:
isn
  COMMENT LEGAL surname.rec SHOW
  TYPE COMMENT vsn
END

vsn
  NOENTER
END

ign
  COMMENT LEGAL givenname.rec SHOW
  TYPE COMMENT vgn
END

vgn
  NOENTER
END
I was warned that data might be lost if I continued, which I did with no loss, but I could do nothing more except look at the data. I did see both labels and data, which did not display in regular EpiData (that is, in Document/View Data the names showed up in both the data and the label).
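The intent of those COMMENT LEGAL blocks - validate the entered code against an external table and show its label in a NOENTER companion field - can be sketched like this in Python. The table contents are hypothetical stand-ins, not the actual surname.rec:

```python
# Hypothetical stand-in for surname.rec: code -> label pairs.
surname_table = {"1": "Smith", "2": "Jones", "3": "Garcia"}

def enter_surname_code(code):
    """Reject codes not in the legal list; return the label that the
    display-only (NOENTER) companion field would show."""
    if code not in surname_table:
        raise ValueError(f"{code!r} is not a legal value")
    return surname_table[code]

print(enter_surname_code("2"))  # -> Jones
```

The symptom Pete describes - names appearing in both the data and the label - suggests the test build is storing the label where only the code should go.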
(The names came from "... names from various sources", Don Olivier, Harvard School of Public Health, don@hsph.harvard.edu, found at ftp://ftp.fu-berlin.de//pub/unix/security/dictionaries/names/.)
Pete Geddes
Hi Torsten, I've put an entry into the Documentation database. Jamie