Re: New user trying to implement a check algorithm
Comment 1:
Hi Gustaf, What a coincidence. Today we have been doing something similar with the Danish cpr-nr, which is 10 digits: 'ddmmyyzzzz'. When we treated this as a numeric field with 10 digits, our algorithms broke down when the day of the month was after the 21st. I think the problem is the largest integer available in many PC languages (here Delphi/Pascal): the MaxInt constant gives the largest allowed value for an Integer. The value is normally (2^31)-1 = 2147483647, so a 10-digit number starting with 22 or higher does not fit.
If you use a float you will get other problems. The solution for us was to store the complete personal number as a string variable and then extract the relevant numbers using substr and real.
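The overflow described above can be checked with a small sketch. It is written in Python rather than Delphi, it assumes a signed 32-bit Integer, and the names INT32_MAX, fits_int32 and split_cpr are made up for illustration only. The numbers used are arbitrary digit strings, not real cpr numbers.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the usual MaxInt for a signed 32-bit Integer


def fits_int32(cpr: str) -> bool:
    """True if the 10-digit string, read as a number, fits in a signed 32-bit integer."""
    return int(cpr) <= INT32_MAX


def split_cpr(cpr: str) -> tuple:
    """Keep the number as a string and slice out day, month, year and serial part."""
    return int(cpr[0:2]), int(cpr[2:4]), int(cpr[4:6]), int(cpr[6:10])


print(fits_int32("2101901234"))  # day 21: still below MaxInt
print(fits_int32("2201901234"))  # day 22: too large
print(split_cpr("3112991234"))   # slicing the string works for any day
```

Every number with day 21 or lower stays below 2147483647 and every number with day 22 or higher exceeds it, which matches the behaviour we saw.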
Best wishes Claus
Seniorstatistician Claus Holst Institute of Preventive Medicine Center for Health and Society Copenhagen
The institute: http://www.ipm.regionh.dk EU projects: http://www.nugenob.org http://www.diogenes-eu.org/
Comment 2: In EpiData we recently changed the default max length for integers to 9 digits, with the same default in both analysis and entry. (Although in analysis, if you use the gen command, the length would be 4.)
The reason the integer length limit is smaller than the MaxInt constant mentioned by Claus Holst lies in the origin of the rec file format in Epi Info v6, which handled integers up to a length of 4 as true integers but stored larger integers internally as floats.
We can discuss whether a length of 9 is sufficient. But it is my experience with all software that merging on large integers tends to give precision problems. Therefore my suggested solution for Gustaf's problem would be, as Claus Holst also suggests, to use a string variable.
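To illustrate the kind of precision problem meant here, below is a small Python sketch. It ASSUMES a single-precision (32-bit) float, which is only a guess for the purpose of the example; the thread does not say which float width Epi Info v6 or EpiData actually uses. The helper name through_float32 is hypothetical.

```python
import struct


def through_float32(n: int) -> int:
    """Store an integer as a 32-bit float and read it back (assumed storage width)."""
    return int(struct.unpack("f", struct.pack("f", float(n)))[0])


small = 9999            # fits exactly in a 32-bit float
id_a = 199201010001     # 12-digit identifier
id_b = 199201010002     # a different identifier, one higher

print(through_float32(small) == small)                      # small integers survive
print(through_float32(id_a) == id_a)                        # the 12-digit value does not
print(through_float32(id_a) == through_float32(id_b))       # two distinct keys collapse
```

Under that assumption two different identifiers end up with the same stored value, which is exactly what breaks a merge. A string key does not have this problem.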
As an extension (but not related to the integer/string problem), the Luhn algorithm (http://en.wikipedia.org/wiki/Luhn_algorithm) could be implemented as a user-written CHK command based on the procedures written for EpiData Entry. This is documented on the page www.epidata.dk/documentation.php with examples: the Soundex, Metaphone and Gumm algorithms.
regards
Jens Lauritsen EpiData Association
Hi all,
Many thanks for your help - the algorithm is now working fine. For the moment I haven't tried implementing it as a function, but I might in the future. I'm attaching the working version at the end, as a reference for others who want to do the same thing. And to clarify, the Luhn algorithm is calculated on the 10-digit version of the Swedish personal identification number, without the leading two digits indicating century ("19", "20").
I've got a supplementary question as well: I've now redefined preQ2 to be a string variable, but the RANGE check still works. How is this possible?
Again, thank you very much for your help!
best,
Gustaf
------------Working version of the Luhn algorithm as applied to Swedish PIDs --------------------
preQ2
  RANGE 199201010001 201801010001
  DEFINE pnr1 # global
  DEFINE pnr2 # global
  DEFINE pnr3 # global
  DEFINE pnr4 # global
  DEFINE pnr5 # global
  DEFINE pnr6 # global
  DEFINE pnr7 # global
  DEFINE pnr8 # global
  DEFINE pnr9 # global
  DEFINE pnr10 # global
  DEFINE pnr11 # global
  DEFINE pnr12 # global
  DEFINE checkSum ## global
  pnr1=INTEGER(copy(preQ2,1,1))
  pnr2=INTEGER(copy(preQ2,2,1))
  pnr3=INTEGER(copy(preQ2,3,1))
  pnr4=INTEGER(copy(preQ2,4,1))
  pnr5=INTEGER(copy(preQ2,5,1))
  pnr6=INTEGER(copy(preQ2,6,1))
  pnr7=INTEGER(copy(preQ2,7,1))
  pnr8=INTEGER(copy(preQ2,8,1))
  pnr9=INTEGER(copy(preQ2,9,1))
  pnr10=INTEGER(copy(preQ2,10,1))
  pnr11=INTEGER(copy(preQ2,11,1))
  pnr12=INTEGER(copy(preQ2,12,1))
  IF (pnr3>4) THEN
    pnr3=pnr3*2-9
  ELSE
    pnr3=pnr3*2
  ENDIF
  IF (pnr5>4) THEN
    pnr5=pnr5*2-9
  ELSE
    pnr5=pnr5*2
  ENDIF
  IF (pnr7>4) THEN
    pnr7=pnr7*2-9
  ELSE
    pnr7=pnr7*2
  ENDIF
  IF (pnr9>4) THEN
    pnr9=pnr9*2-9
  ELSE
    pnr9=pnr9*2
  ENDIF
  IF (pnr11>4) THEN
    pnr11=pnr11*2-9
  ELSE
    pnr11=pnr11*2
  ENDIF
  checkSum=pnr3+pnr4+pnr5+pnr6+pnr7+pnr8+pnr9+pnr10+pnr11
  IF (checkSum=10*TRUNC(checkSum/10)) THEN
    checkSum=0
  ELSE
    checkSum=10*(TRUNC(checkSum/10)+1)-checkSum
  ENDIF
  IF (pnr12<>checkSum) THEN
    HELP "felaktigt PersonNummer" TYPE=ERROR
    GOTO preQ2
  ENDIF
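For anyone who wants to cross-check results outside EpiData, here is a minimal Python sketch of the same calculation. It is not EpiData code, the function names luhn_check_digit and is_valid_pid are made up for this example, and it assumes the 12-digit layout used above (two century digits, nine digits that enter the sum, one check digit). The sample numbers are test values only.

```python
def luhn_check_digit(nine_digits: str) -> int:
    """Luhn check digit for the nine digits that follow the century digits."""
    total = 0
    for i, ch in enumerate(nine_digits):
        d = int(ch)
        if i % 2 == 0:          # 1st, 3rd, 5th, 7th and 9th digit are doubled
            d *= 2
            if d > 9:           # same as adding the two digits of the product
                d -= 9
        total += d
    # 0 when the sum is a multiple of 10, otherwise the distance to the next multiple
    return (10 - total % 10) % 10


def is_valid_pid(pid12: str) -> bool:
    """Check a 12-digit number given as a string; century digits are skipped."""
    if len(pid12) != 12 or not (pid12.isascii() and pid12.isdigit()):
        return False
    return luhn_check_digit(pid12[2:11]) == int(pid12[11])


print(is_valid_pid("198112189876"))  # sum 44, check digit 6
print(is_valid_pid("199201010080"))  # sum 20, check digit 0
print(is_valid_pid("198112189875"))  # wrong last digit
```

The second sample is worth keeping in any test set, since a sum that is an exact multiple of 10 must give check digit 0 and not 10.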
Range checks do work on string variables, since EpiData just compares the internal values. However, EpiData will not care whether the strings have non-numeric values within this range. For example, the following will pass the RANGE test:
1993ABCDEFGH
since this is 12 characters long and internally any string starting with "1993" will lie between "1992xxxxxxxx" and "2018xxxxxxxx".
EpiData will detect an error and assign a missing value to the pnrX variables when this happens. The default option is NOT to notify you of errors in check files at data entry, so you should also check that checkSum <> . (that is, checkSum is not missing). If only the last digit (the checksum value) is bad, checkSum <> pnr12 will tell you there is an error.
In my experience, the internal precision of EpiData will let you use 12-digit integers. The RANGE checks will then work fine and entry of non-numeric characters is impossible.
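The same effect is easy to reproduce in Python, as a rough analogy only. The sketch assumes that a range check on strings is a plain character-by-character comparison; how EpiData compares its internal values is not shown in this thread. The function names are hypothetical.

```python
LOW, HIGH = "199201010001", "201801010001"


def in_string_range(value: str) -> bool:
    """Character-by-character comparison, as a string range check would do."""
    return LOW <= value <= HIGH


def in_numeric_range(value: str) -> bool:
    """Require digits only before comparing the values as numbers."""
    if not (value.isascii() and value.isdigit()):
        return False
    return int(LOW) <= int(value) <= int(HIGH)


print(in_string_range("1993ABCDEFGH"))   # passes the string comparison
print(in_numeric_range("1993ABCDEFGH"))  # rejected once digits are required
print(in_numeric_range("199301010001"))  # a purely numeric value is accepted
```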
So another way to guarantee entry in this range is
define preQ2A ____________ global
preQ2
  Range 1991...... etc
  preQ2A = string(preQ2)
and work everything on preQ2A, since it is easy to pick apart strings using the COPY function.
Where you can get into trouble with long integers is when you start to do arithmetic with them.
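A short Python sketch of what I mean. It ASSUMES numbers are held as double-precision floats, which is an assumption for the example and not something I have verified for EpiData's internals. A double holds every integer exactly up to 2^53 (about 9.0e15), so a 12-digit number is safe on its own, but the result of arithmetic can go past that limit.

```python
LIMIT = 2**53          # above this, not every integer has an exact double

pid = 199201010001     # 12 digits, far below the limit

# Storing and reading back the 12-digit value is exact
print(int(float(pid)) == pid)

# Arithmetic can push the result past the limit, where digits are lost
product_float = float(pid) * float(pid)
product_exact = pid * pid          # Python integers are exact
print(product_exact > LIMIT)
print(int(product_float) == product_exact)

# Right at the limit, neighbouring integers already collapse
print(float(LIMIT + 1) == float(LIMIT))
```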
Jamie
Gustaf wrote:
I've now redefined preQ2 to be a string variable, but the RANGE check still works. How is this possible?
preQ2 RANGE 199201010001 201801010001 END