Comparison of non-parametric methods for ungrouping coarsely aggregated data

Overview of attention for article published in BMC Medical Research Methodology, May 2016

Citations	5 (Dimensions)
Readers	20 (Mendeley)


Title	Comparison of non-parametric methods for ungrouping coarsely aggregated data
Published in	BMC Medical Research Methodology, May 2016
DOI	10.1186/s12874-016-0157-8
Pubmed ID	27216531
Authors	Silvia Rizzi, Mikael Thinggaard, Gerda Engholm, Niels Christensen, Tom Børge Johannesen, James W. Vaupel, Rune Lindahl-Jacobsen
Abstract	Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-year age classes with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. From an extensive literature search we identify five methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate differently shaped distributions; can handle unequal interval length; and allow stretches of 0 counts. The methods show similar performance when the grouping scheme is relatively narrow, i.e. 5-year age classes. With coarser age intervals, i.e. in the presence of open-ended age groups, the penalized composite link model performs the best. We give an overview and test different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized composite link model when data are grouped in wide age classes.


Mendeley readers


The data shown below were compiled from readership statistics for 20 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	20	100%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	7	35%
Researcher	6	30%
Librarian	1	5%
Professor	1	5%
Other	1	5%
Other	2	10%
Unknown	2	10%

Readers by discipline	Count	As %
Social Sciences	5	25%
Mathematics	4	20%
Agricultural and Biological Sciences	3	15%
Medicine and Dentistry	3	15%
Engineering	2	10%
Other	1	5%
Unknown	2	10%