Title | MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks |
---|---|
Published in | Bioinformatics, March 2016 |
DOI | 10.1093/bioinformatics/btw155 |
Pubmed ID | |
Authors | Chao Pang, David van Enckevort, Mark de Haan, Fleur Kelpin, Jonathan Jetten, Dennis Hendriksen, Tommy de Boer, Bart Charbon, Erwin Winder, K Joeri van der Velde, Dany Doiron, Isabel Fortier, Hans Hillege, Morris A Swertz |
Abstract | While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration. To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates, using ontology-based query expansion to overcome variations in terminology. It then generates algorithms that transform source attributes to a common target DataSchema. These algorithms include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). Compared with human experts, MOLGENIS/connect auto-generated 27% of the algorithms perfectly, and a further 46% needed only minor editing, reducing the human effort and expertise needed to pool data. Source code, binaries and documentation are available as open source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect. Supplementary information: Supplementary data are available at Bioinformatics online. |
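The abstract names three kinds of generated algorithms: unit conversion, categorical value matching and complex conversion patterns such as BMI. The following is a minimal sketch, in plain Python, of what each kind does to one source record. It is not MOLGENIS/connect's own code or its algorithm syntax. The source attribute names (`weight_g`, `length_cm`, `smoker`), the target codes in `SMOKING_MAP` and the conversion tables are all hypothetical, chosen only for illustration.

```python
"""Hypothetical sketch of the three algorithm types named in the abstract.

Not MOLGENIS/connect code; attribute names and codes are illustrative only.
"""
from typing import Mapping, Optional

# Hypothetical factors for converting source units into target units (m, kg).
TO_METRES = {"m": 1.0, "cm": 0.01, "mm": 0.001}
TO_KILOGRAMS = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def convert_unit(value: float, unit: str, factors: Mapping[str, float]) -> float:
    """Unit conversion: rescale a source value into the target unit."""
    return value * factors[unit]


def map_category(value: str, mapping: Mapping[str, str]) -> Optional[str]:
    """Categorical value matching: translate a source answer to a target code.

    Source values without a mapping become None (missing) instead of a guess.
    """
    return mapping.get(value.strip().lower())


def bmi(weight: float, weight_unit: str, height: float, height_unit: str) -> float:
    """Complex conversion: derive BMI (kg/m^2) from two source attributes."""
    kg = convert_unit(weight, weight_unit, TO_KILOGRAMS)
    m = convert_unit(height, height_unit, TO_METRES)
    return kg / (m * m)


# Hypothetical record from one source biobank and a hypothetical value mapping.
source_row = {"weight_g": 80000.0, "length_cm": 180.0, "smoker": "Yes"}
SMOKING_MAP = {"yes": "CURRENT", "no": "NEVER", "former": "PAST"}

# Apply one algorithm per target attribute to build the harmonised record.
target_row = {
    "height_m": convert_unit(source_row["length_cm"], "cm", TO_METRES),
    "smoking_status": map_category(source_row["smoker"], SMOKING_MAP),
    "bmi": round(bmi(source_row["weight_g"], "g", source_row["length_cm"], "cm"), 1),
}
print(target_row)
```

Returning None for unmapped categories keeps unmatched source values visible as missing data, which a curator can then review; this mirrors the abstract's point that some generated algorithms still need minor human editing.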
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
Italy | 1 | 33% |
Unknown | 2 | 67% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 2 | 67% |
Members of the public | 1 | 33% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 1 | 3% |
Brazil | 1 | 3% |
Unknown | 33 | 94% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 12 | 34% |
Student > Ph. D. Student | 12 | 34% |
Student > Bachelor | 3 | 9% |
Student > Master | 3 | 9% |
Student > Doctoral Student | 1 | 3% |
Other | 1 | 3% |
Unknown | 3 | 9% |
Readers by discipline | Count | As % |
---|---|---|
Agricultural and Biological Sciences | 11 | 31% |
Computer Science | 8 | 23% |
Social Sciences | 3 | 9% |
Engineering | 3 | 9% |
Biochemistry, Genetics and Molecular Biology | 2 | 6% |
Other | 3 | 9% |
Unknown | 5 | 14% |