R, Epi Info and open-source software
Hello EpiData Friends,
I'm writing a few lines at the end of my working day to share some thoughts I have had recently.
Over the last four years I have invested a lot of time and effort in learning R for my PhD research. I now feel comfortable with R, but in the beginning I felt like I was banging my head against the wall to make things work. For the same reason I subscribed to several R mailing lists to exchange questions and solutions.
About three weeks ago, an Epi Info developer posted a message to an R users' list on epidemiological issues, and it seemed that he wanted to build an Epi Info-R interface. I was curious and found out that the latest version of Epi Info available from the CDC is two years old, which I would guess is quite old for software. Clearly it was not an official CDC contact.
Wondering why and how this could be done, I realized that R can most likely do every analysis Epi Info can, and much more. The disadvantages are that R is not friendly for the uninitiated and has no default, complete GUI, although several GUIs are available.
An EPI extension could be developed in R, although there are already four of them that I know of. But there are some strong points in favor of developing things in R:
(1) Open-source software is growing.
(2) R is open source.
(3) R runs on multiple operating systems.
(4) The R language is reasonably stable, so something developed today may last for many years regardless of operating-system or software updates.
(5) R has a core team that develops the software itself, so contributors can concentrate their efforts on extension code rather than on the software core and OS integration.
(6) There are hundreds of maintainers and contributors, so it is a solid development effort that even has a yearly users' conference.
(7) It is possible, and there are many examples, to create a friendly graphical interface for epidemiological users.
(8) There are already many functions aimed at epidemiological users, such as epidemic curves, tables in stacked form, etc.
So I would not be surprised if one day the EpiData friends also started an integration with R, as has already been done for SAS, SPSS, WinBUGS, SQL, MySQL, etc., and more recently, it seems, for Epi Info too.
Best regards to all,
A big hug, and may the force be with you,
Dr. Pedro Emmanuel A. A. do Brasil Instituto de Pesquisa Clínica Evandro Chagas Fundação Oswaldo Cruz Rio de Janeiro - Brasil Av. Brasil 4365 Tel 55 21 3865-9648 email: pedro.brasil@ipec.fiocruz.br email: emmanuel.brasil@gmail.com
--- Supporting free software: www.zotero.org - bibliographic reference management. www.broffice.org or www.openoffice.org - documents, spreadsheets or presentations. www.epidata.dk - data entry. www.r-project.org - data analysis. www.ubuntu.com - operating system
I like these ideas. Thanks, Pedro Emmanuel. I'm also using R in my job, and in a doctoral programme in Social Psychology. I have been working with R since my Master's degree.
It would be a great idea to develop an interface between EpiData and R. But right now I'm just exporting the data from EpiData Entry to an old Epi Info file format. Then I use the foreign package to load the data into R.
I think that data collected in Epi Info can be read directly into R through an ODBC package.
On the other hand, maybe the developer you mentioned is interested in Epi Info 7, the newer version of Epi Info. Epi Info 7 is not yet stable, but it is open source and built on Mono technology.
I don't like Epi Info 7 for political reasons related to Mono, the technology on which Epi Info 7 is based. Mono is a platform intended to run C# code of the kind written with Microsoft's Visual Studio. But the plans for C# within Mono are not clear. For that reason, many of Mono's critics may be right (e.g. http://www.ubuntumini.com/2009/06/get-your-microsoft-out-of-my-linux.html).
My other reasons for not liking Epi Info 7 are technical. For example, Mono is a memory-hungry technology, which makes Epi Info 7 very slow and unpleasant to use on machines with limited hardware resources. A lot of my work is done in very resource-poor settings, and my tests of Epi Info 7 in those settings have not worked out.
In contrast, EpiData works perfectly in the context where I work. It is very optimized and stable software. And I hope the EpiData source code will be released as free and open-source software next year.
A hug from the Dominican Republic,
Dear colleagues
In response to last week's emails about R and combining it with EpiData software, I would like to state the following:
The way I see EpiData software playing a role over the coming 10 years is as follows:
a. Ensure full compliance with proper data documentation procedures and with the GCP guidelines (Good Clinical Practice) for medical studies, which impose certain requirements on data management. These are explained in the paper: GCP-compliant data management in multinational clinical trials. ECRIN-2, Deliverable D10, Version 1, 15 September 2008. Ohmann C (chair) and the Transnational Working Group on Data Management. I participated in the workgroup. The paper can be downloaded from: http://www.ecrin.org/index.php?id=274
b. Fulfill basic data-entry needs at the local level, giving the people who run projects or routine systems at that level full control over the process.
c. Perform analysis of quantitative data at basic and extended levels: univariate and bivariate analysis, and beyond that stratified epidemiological analysis (Mantel-Haenszel techniques), survival analysis (log-rank tests and Kaplan-Meier plots), etc.
d. Bridge to other software to extend this analysis to the currently expected levels of regression or survival analysis, e.g. repeated-measures models or logistic regression.
e. The software should be easy to use for all users: it should serve beginners and also support advanced usage based on more complex principles.
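Point c mentions stratified analysis with Mantel-Haenszel techniques. As an illustration only, below is a minimal Python sketch of the Mantel-Haenszel pooled odds ratio. It is not EpiData's or R's code.

Each stratum i is a 2×2 table with these cells:
- a_i: exposed cases
- b_i: exposed non-cases
- c_i: unexposed cases
- d_i: unexposed non-cases
- n_i: the stratum total

The estimate is OR_MH = Σ(a_i·d_i/n_i) / Σ(b_i·c_i/n_i). The function name and the (a, b, c, d) tuple layout are assumptions made for this example.

```python
from typing import Iterable, Tuple

# Assumed layout for one stratum: (a, b, c, d)
#   a = exposed cases,   b = exposed non-cases
#   c = unexposed cases, d = unexposed non-cases
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_or(strata: Iterable[Stratum]) -> float:
    """Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n)."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing to either sum
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ZeroDivisionError("sum of b*c/n is zero; OR_MH is undefined")
    return numerator / denominator
```

With strata (10, 20, 5, 40) and (15, 10, 10, 30), the function returns 478/112 = 239/56, about 4.27. With a single stratum, it reduces to the crude odds ratio ad/bc. In R, the stats function mantelhaen.test() reports this common odds ratio estimate for a 2 × 2 × K table.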
For point d, I think an optimal strategy would be to use R as the bridging tool (and to export documented data to commercial software with supportive user communities, such as Stata), so that users bypass R's steep learning curve for standard analyses. This also simplifies the programming of EpiData.
In other words, I definitely see a path, as suggested, that combines R and EpiData. If anyone has, or could prepare, a document on the principles of calling R batch procedures from other software, please let us know on this list. We will need this some time next spring, when other aspects of this are in place.
regards Jens Lauritsen EpiData Association
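Jens asks for principles of calling R in batch from other software. One common pattern has four steps:
1. Export the data to a plain file.
2. Write an R script that reads that file.
3. Run the script non-interactively with Rscript.
4. Read the output back.

Below is a minimal Python sketch of that pattern under stated assumptions. The file names, the R template and both helper functions are hypothetical. It assumes R is installed and Rscript is on the PATH.

```python
import csv
import shutil
import subprocess
from pathlib import Path
from typing import Dict, List

# Hypothetical R script template: read the exported CSV and write a text summary.
R_TEMPLATE = """\
d <- read.csv("{data}")
sink("{out}")
print(summary(d))
sink()
"""


def prepare_r_batch(rows: List[Dict[str, object]], workdir: Path) -> List[str]:
    """Export rows to CSV, write an R script beside it, return the Rscript command."""
    if not rows:
        raise ValueError("no rows to export")
    workdir.mkdir(parents=True, exist_ok=True)
    data_path = workdir / "export.csv"
    script_path = workdir / "analysis.R"
    out_path = workdir / "summary.txt"
    # Step 1: export the data as CSV with a header row.
    with data_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    # Step 2: write the R script; forward slashes keep paths valid in R on Windows too.
    script_path.write_text(
        R_TEMPLATE.format(data=data_path.as_posix(), out=out_path.as_posix())
    )
    # Step 3: --vanilla runs R without reading or saving workspaces or profiles.
    return ["Rscript", "--vanilla", str(script_path)]


def run_r_batch(cmd: List[str]) -> int:
    """Run the prepared command and return R's exit status."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd, check=False).returncode
```

A caller would run run_r_batch(prepare_r_batch(rows, Path("work"))) and then read work/summary.txt. Passing data through files keeps the calling program independent of R's internals, at the cost of a round trip through disk.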
I can't speak for Jens and others directly involved in development, but I think it has always been pretty easy to get EpiData data files into R by exporting to Stata. Going from Stata into R is easy.
This is an interesting post by Pedro, as I went through the same thinking about 10 years ago and talked about this with Mark Myatt. We had talked about putting a graphical interface onto R that would be as easy to use as EpiData, giving the power of R underneath. That's fine for analysis, and there are many analyses, especially modelling, that I would do in R by preference. However, I would never try to produce a food-specific attack rate table or even attempt to manage outbreak data in R.
I am also intrigued by the possibilities, as Pedro points out, once the EpiData source is released and a means of plugging in other functions is available. That way, it should be relatively easy to have R do some difficult things in the background while the user takes advantage of the EpiData interface.
Jamie
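Jamie mentions food-specific attack rate tables from outbreak investigations. For each food item, such a table compares two attack rates (proportion ill):
- among people who ate the item
- among people who did not eat it

The table usually also gives their ratio, the risk ratio.

Below is a minimal Python sketch. It assumes each record is a hypothetical (ill, foods_eaten) pair, and the function name is invented for illustration.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# One output row per food: (attack rate if eaten, attack rate if not eaten, risk ratio)
Row = Tuple[Optional[float], Optional[float], Optional[float]]


def _rate(ill: int, total: int) -> Optional[float]:
    """Proportion ill in a group, or None if the group is empty."""
    return ill / total if total else None


def attack_rate_table(
    records: Iterable[Tuple[bool, Set[str]]], foods: List[str]
) -> Dict[str, Row]:
    """Food-specific attack rates and risk ratios from (ill, foods_eaten) records."""
    records = list(records)
    table: Dict[str, Row] = {}
    for food in foods:
        ate = [ill for ill, eaten in records if food in eaten]
        not_ate = [ill for ill, eaten in records if food not in eaten]
        ar_ate = _rate(sum(ate), len(ate))
        ar_not = _rate(sum(not_ate), len(not_ate))
        # Risk ratio is undefined when either group is empty or nobody in the
        # non-eating group fell ill; report None in that case.
        rr = ar_ate / ar_not if ar_ate is not None and ar_not else None
        table[food] = (ar_ate, ar_not, rr)
    return table
```

Returning None rather than raising on an undefined ratio lets the whole table be printed even when a food was eaten by everyone or by nobody. A real outbreak tool would also add confidence intervals and counts to each row.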
On 2010-09-23, Pedro wrote:
Thus I would not be surprised if one day EpiData friends also start an integration with R, as already has been done for SAS, SPSS, WinBugs, SQL, MySQL etc etc and more recently it seems that epiinfo too.
The several R packages available for epidemiology are pretty handy:
epitools http://cran.r-project.org/web/packages/epitools/index.html
epicalc http://cran.r-project.org/doc/contrib/Epicalc_Book.pdf
epi
--Chris Ryan