A new computationally efficient algorithm for record linkage with field dependency and missing data imputation

Overview of attention for article published in International Journal of Medical Informatics, November 2017

Citations: 7 (Dimensions)
Readers: 30 (Mendeley)

Title	A new computationally efficient algorithm for record linkage with field dependency and missing data imputation
Published in	International Journal of Medical Informatics, November 2017
DOI	10.1016/j.ijmedinf.2017.10.021
Pubmed ID	29195708
Authors	John Ferguson, Ailish Hannigan, Austin Stack
Abstract	Record linkage algorithms aim to identify pairs of records that correspond to the same individual from two or more datasets. In general, fields that are common to both datasets are compared to determine which record-pairs to link. The classic model for probabilistic linkage was proposed by Fellegi and Sunter and assumes that individual fields common to both datasets are completely observed, and that the field agreement indicators are conditionally independent within the subsets of record pairs corresponding to the same and differing individuals. Herein, we propose a novel record linkage algorithm that is independent of these two baseline assumptions. We demonstrate improved performance of the algorithm in the presence of missing data and correlation patterns between the agreement indicators. The algorithm is computationally efficient and can be used to link large databases consisting of millions of record pairs. An R-package, corlink, has been developed to implement the new algorithm and can be downloaded from the CRAN repository.
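For readers unfamiliar with the classic Fellegi and Sunter model mentioned in the abstract, the following is a minimal Python sketch of how a match weight is formed when the field agreement indicators are assumed conditionally independent and every field is observed. It is not the algorithm from the paper and not the corlink interface. The field names, the m- and u-probabilities, and the function name `match_weight` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical probabilities, for illustration only (not taken from the paper):
# m = P(field agrees | records belong to the same individual)
# u = P(field agrees | records belong to different individuals)
M_PROBS = {"surname": 0.95, "birth_year": 0.98, "postcode": 0.90}
U_PROBS = {"surname": 0.01, "birth_year": 0.05, "postcode": 0.02}


def match_weight(agreement):
    """Return the log2 likelihood ratio for one record pair.

    `agreement` maps a field name to True (fields agree) or False (disagree).
    Conditional independence lets the per-field log ratios be summed.
    Missing values are not handled: the classic model assumes every
    compared field is fully observed.
    """
    total = 0.0
    for field, agrees in agreement.items():
        m, u = M_PROBS[field], U_PROBS[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


# A pair agreeing on every field scores higher than one disagreeing on surname.
full = match_weight({"surname": True, "birth_year": True, "postcode": True})
partial = match_weight({"surname": False, "birth_year": True, "postcode": True})
print(full > partial)
```

Summing per-field terms is only valid under the independence assumption. When agreement indicators are correlated, or when fields are missing, this simple sum no longer applies, which is the situation the abstract says the new algorithm is designed for.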

Mendeley readers

The data shown below were compiled from readership statistics for 30 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	30	100%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	9	30%
Student > Ph. D. Student	6	20%
Student > Master	3	10%
Student > Bachelor	2	7%
Lecturer	1	3%
Other	3	10%
Unknown	6	20%

Readers by discipline	Count	As %
Computer Science	10	33%
Medicine and Dentistry	4	13%
Biochemistry, Genetics and Molecular Biology	2	7%
Nursing and Health Professions	2	7%
Engineering	2	7%
Other	2	7%
Unknown	8	27%