[EpiData-list] follow up: big data sets

epidata-list at lists.umanitoba.ca
Wed Aug 3 10:57:48 CDT 2011

I tried the new Entryclient with two large data sets.

Set 1: 500 integer fields, 500 records, uncompressed size: 1.1 MB as .epx, 45 KB as .epz
- load the file: 150 seconds
- jump from record 1 to last record: 1 or 2 seconds
- * to get empty record: 5 seconds
- save file: 3 seconds
- load in Analysis (running under Wine, so a bit slower than on an equally powered PC): 5 seconds
- freq v500: immediate
- no difference with .epz file

Set 2: 20 integer fields, 10,560 records, uncompressed size: 1.1 MB
- load the file: immediate
- jump to end or new record: immediate
- load in Analysis: 7 seconds
- freq v1: immediate
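
For comparison, a quick back-of-the-envelope check of the figures above. It only multiplies fields by records and divides the two file sizes quoted for Set 1; treating 1 MB as 1024 KB is my assumption. The two sets hold a similar number of individual values, which suggests (but does not prove) that the slow load of Set 1 is tied to the number of fields rather than to the raw volume of data.

```python
# Figures copied from the two test sets described above.
sets = {
    "Set 1": {"fields": 500, "records": 500},
    "Set 2": {"fields": 20, "records": 10_560},
}


def cell_count(fields, records):
    """Number of individual data values in a rectangular data set."""
    return fields * records


counts = {name: cell_count(**shape) for name, shape in sets.items()}
for name, n in counts.items():
    print(f"{name}: {n:,} values")

# Set 1 only: 1.1 MB as .epx versus 45 KB as .epz.
# Assumption: 1 MB = 1024 KB.
ratio = 1.1 * 1024 / 45
print(f"Set 1 compression ratio: about {ratio:.0f}:1")
```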

I expect the typical "large" data set would be somewhere between these two. Most large data sets I have used are like Set 2 and have been extracts of national databases. I might use the Entryclient on these, but it is unlikely. I have seen others using data like Set 1, but never really understood why they used such large data sets, as most data items were unused and of interest only in a clinical review of individual subjects.

(technical note: time to execute a JavaScript loadxml(): < 500 msec)
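
A rough way to repeat this kind of parse-only timing outside the Entryclient is sketched below. It is not the measurement from the note: it uses Python's xml.etree.ElementTree instead of JavaScript, and the XML layout (a "records" root with one "record" element per row and attributes v1, v2, ...) is a hypothetical stand-in, not the real .epx schema. The helper names build_xml and time_parse are my own. It only shows how to time the XML parse separately from everything else an application does when loading a file.

```python
import time
import xml.etree.ElementTree as ET


def build_xml(n_records, n_fields):
    """Build a synthetic XML document (hypothetical layout, NOT the .epx schema)."""
    root = ET.Element("records")
    for r in range(n_records):
        rec = ET.SubElement(root, "record")
        for f in range(1, n_fields + 1):
            # One attribute per field, holding a single-digit integer value.
            rec.set(f"v{f}", str((r + f) % 10))
    return ET.tostring(root)


def time_parse(xml_bytes):
    """Parse the document and return (number of records, seconds spent parsing)."""
    start = time.perf_counter()
    root = ET.fromstring(xml_bytes)
    elapsed = time.perf_counter() - start
    return len(root), elapsed


# Same shape as Set 1: 500 records of 500 fields each.
xml_bytes = build_xml(500, 500)
n_parsed, seconds = time_parse(xml_bytes)
print(f"parsed {n_parsed} records from {len(xml_bytes) / 1024:.0f} KB in {seconds * 1000:.0f} msec")
```

If the parse alone turns out to be fast on a file of this shape, the remaining load time would have to be spent elsewhere, for example in building the entry form, but that is a guess and not something measured here.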

