Harmonising and linking biomedical and clinical data across disparate data archives to enable integrative cross-biobank research

2015

European Journal of Human Genetics 2015 10.1038/ejhg.2015.165

A wealth of biospecimen samples are stored in modern globally distributed biobanks. Biomedical researchers worldwide need to be able to combine the available resources to improve the power of large-scale studies. A prerequisite for this effort is to be able to search and access, in an integrated manner, phenotypic, clinical and other information about samples that are currently stored at biobanks. However, privacy issues together with heterogeneous information systems and the lack of agreed-upon vocabularies have made specimen searching across multiple biobanks extremely challenging. We describe three case studies where we have linked samples and sample descriptions in order to facilitate global searching of available samples for research. The use cases include the ENGAGE (European Network for Genetic and Genomic Epidemiology) consortium comprising at least 39 cohorts, the SUMMIT (surrogate markers for micro- and macro-vascular hard endpoints for innovative diabetes tools) consortium and a pilot for data integration between a Swedish clinical health registry and a biobank. We used the Sample avAILability (SAIL) method for data linking: first, we created harmonised variables; then we annotated and made searchable information on the number of specimens available in individual biobanks for various phenotypic categories. By operating on this categorised availability data we sidestep many obstacles related to privacy that arise when handling real values, and we show that harmonised and annotated records about data availability across disparate biomedical archives provide a key methodological advance in pre-analysis exchange of information between biobanks, that is, during the project planning phase.
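
The two steps described in the abstract (harmonising a variable, then publishing only per-category specimen counts) can be illustrated with a minimal sketch. This is not the SAIL implementation: the category boundaries, the function names (bmi_category, availability_counts, search) and the sample records are all hypothetical, chosen only to show how a search can run on availability counts without exposing real values.

```python
from collections import Counter

# Hypothetical harmonisation rule: map a raw BMI value to a shared category.
# The cut-offs are illustrative assumptions, not taken from the paper.
def bmi_category(bmi):
    if bmi is None:
        return "missing"
    if bmi < 25:
        return "under_25"
    if bmi < 30:
        return "25_to_30"
    return "30_or_more"

def availability_counts(records):
    """Count specimens per harmonised category; only counts are returned."""
    return Counter(bmi_category(r.get("bmi")) for r in records)

def search(index, category):
    """Return {biobank: count} for biobanks with specimens in the category."""
    return {
        name: counts[category]
        for name, counts in index.items()
        if counts[category] > 0
    }

# Made-up records for two hypothetical biobanks.
biobank_a = [{"bmi": 22.1}, {"bmi": 31.4}, {"bmi": None}]
biobank_b = [{"bmi": 27.0}, {"bmi": 33.2}, {"bmi": 35.9}]

# The shared index holds counts only, never the underlying measurements.
index = {
    "biobank_a": availability_counts(biobank_a),
    "biobank_b": availability_counts(biobank_b),
}

result = search(index, "30_or_more")
print(result)
```

Because each biobank contributes only category counts, a researcher planning a study can see where suitable specimens exist before any individual-level data are requested.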

Nature Publishing Group

Unless otherwise stated, this entry is licensed under Attribution 3.0 Norway (Navngivelse 3.0 Norge)