How to: checking for duplicate records
Some of the line breaks in the example did not come through properly. This is how I ran it:

***********
Read "C:\Program Files\EpiData\validate\estimates\bromar.rec"
gen s(25) k = "s:" + trim(string(sex) )
k = trim(k) + "a:" + trim(string(age) )
k = trim(k) + "d:" + trim(string(dectime) )
sort k
gen x=1
aggregate k /sum=x /close /notable
freq n
***********

It works fine. I think you were not executing the 'aggregate' statement. With /close, the 'aggregate' command closes your working file and creates a new one with the following fields:

k, n, nx, sumx

For this example, n, nx and sumx are the same values.

Jamie Hockin
Public Health Agency of Canada
Dear Jamie,
Indeed some of the line breaks threw it a bit. I reformatted below - but still get that strange TRUNC error like:

. list if k = (( k[_n-1]) or (k[_n+1]))
Operator TRUNC is incompatible with String
Operation aborted

--------------
Read "C:\Program Files\EpiData\testdata\bromar.rec"
* Assume you wish to see variables with same value of id sex age and dectime
* I prefer the "gen" to the "define k _____ " but this is not important.
gen s(80) k = "id: " + string(id)
k = trim(k) + "s:" + trim(string(sex) )
* the first "trim(k) is important !!!.
k = trim(k) + "a:" + trim(string(age) )
k = trim(k) + "d:" + trim(string(dectime) )
sort k
* list now any two records where two consecutive records have the same value
* notice [ _n-1] indicates previous record [ _n+1] next record
* notice also the many parenthesis. They could be removed but makes sure
* of what is what.
list if k = (( k[_n-1]) or (k[_n+1]))
* An alternative would be to create a frequency table, but with many records
* (here 4000)
* this is not going to work. So we use instead.
define x #
x = 1
* now count how many times a given value in k was there
* do not show the table (takes too long) therefore /notable.
aggregate k /sum=x /close /notable
list n k if n > 1
--------------
But I would like to get this running, as it is exactly what I wanted to have listed. How is your solution different, and how do I interpret your table?

Tx
Max
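To illustrate how the two approaches in this thread relate, here is a minimal Python sketch, not EpiData code. It assumes that `aggregate k /sum=x` yields one row per distinct `k` with `n` holding the number of records for that key, which is what `list n k if n > 1` in the script suggests. The records are made up and merely stand in for bromar.rec; `make_key` and `has_equal_neighbour` are hypothetical helper names.

```python
from collections import Counter

# Made-up records standing in for bromar.rec: (sex, age, dectime)
records = [
    (1, 34, 12),
    (2, 51, 7),
    (1, 34, 12),
    (2, 40, 3),
    (1, 34, 12),
    (2, 51, 7),
]

def make_key(sex, age, dectime):
    # Same idea as building k from "s:", "a:" and "d:" parts
    return f"s:{sex}a:{age}d:{dectime}"

# Analogue of: sort k
keys = sorted(make_key(*r) for r in records)

# Assumed analogue of: aggregate k /sum=x
# (one entry per distinct key, value = number of records with that key)
n_per_key = Counter(keys)

# Analogue of: freq n
# (how many distinct keys occur once, twice, three times, ...)
freq_of_n = Counter(n_per_key.values())

# Analogue of: list n k if n > 1
duplicates = {k: n for k, n in n_per_key.items() if n > 1}

def has_equal_neighbour(i):
    # Intent of the neighbour test: the key equals the previous
    # record's key OR equals the next record's key
    prev_same = i > 0 and keys[i] == keys[i - 1]
    next_same = i + 1 < len(keys) and keys[i] == keys[i + 1]
    return prev_same or next_same

# Every record that belongs to a group of duplicates
flagged = [k for i, k in enumerate(keys) if has_equal_neighbour(i)]

print(duplicates)
print(dict(freq_of_n))
print(len(flagged))
```

Read this way, the aggregate table has one row per key rather than one row per record, so any row with n > 1 marks a group of duplicates. The neighbour comparison instead flags each individual record in such a group. Note that it is written as two separate equality tests joined by `or`.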
EpiData-list mailing list EpiData-list@lists.umanitoba.ca http://lists.umanitoba.ca/mailman/listinfo/epidata-list