A step-by-step approach to improve data quality when using commercial business lists to characterize retail food environments

Overview of attention for article published in BMC Research Notes, January 2017

Mentioned by
1 tweeter

Citations
10 Dimensions

Readers on
31 Mendeley
Title: A step-by-step approach to improve data quality when using commercial business lists to characterize retail food environments
Published in: BMC Research Notes, January 2017
DOI: 10.1186/s13104-016-2355-1
Authors: Kelly K. Jones, Shannon N. Zenk, Elizabeth Tarlov, Lisa M. Powell, Stephen A. Matthews, Irina Horoi

Abstract

Food environment characterization in health studies often requires data on the location of food stores and restaurants. While commercial business lists are commonly used as data sources for such studies, current literature provides little guidance on how to use validation study results to decide which commercial business list to use and how to maximize the accuracy of those lists. Using data from a retrospective cohort study [Weight And Veterans' Environments Study (WAVES)], we (a) explain how validity and bias information from existing validation studies (count accuracy, classification accuracy, locational accuracy, as well as potential bias by neighborhood racial/ethnic composition, economic characteristics, and urbanicity) was used to determine which commercial business listing to purchase for retail food outlet data and (b) describe the methods used to maximize the quality of the data and the results of this approach.

We developed data improvement methods based on existing validation studies. These methods included purchasing records from commercial business lists (InfoUSA and Dun and Bradstreet) based on store/restaurant names as well as standard industrial classification (SIC) codes, reclassifying records by store type, improving geographic accuracy of records, and deduplicating records. We examined the impact of these procedures on food outlet counts in US census tracts.

After cleaning and deduplicating, the count of valid food stores was 17.5% lower than the count purchased from InfoUSA, and the count of valid restaurants was 5.6% lower than the count purchased from Dun and Bradstreet. Locational accuracy was improved for 7.5% of records by applying street addresses of subsequent years to records with post-office (PO) box addresses. In total, up to 83% of US census tracts annually experienced a change (either positive or negative) in the count of retail food outlets between the initial purchase and the final dataset.

Our study provides a step-by-step approach to purchase and process business list data obtained from commercial vendors. The approach can be followed by studies of any size, including those with datasets too large to process each record by hand, and will promote consistency in characterization of the retail food environment across studies.
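Two of the cleaning steps named in the abstract, applying later-year street addresses to PO box records and deduplicating records, can be illustrated with a minimal Python sketch. This is not the authors' code: the record fields (`outlet_id`, `year`, `name`, `address`), the assumption that an outlet keeps a stable identifier across years, and the exact-match rule on normalized name and address are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict

# Matches addresses that start with "PO Box", "P.O. Box", "P O Box", etc.
PO_BOX = re.compile(r"^\s*p\.?\s*o\.?\s*box\b", re.IGNORECASE)


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so near-identical strings compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def backfill_po_boxes(records):
    """Replace a PO box address with the nearest later-year street address of the same outlet.

    Assumes (hypothetically) that each record carries a stable `outlet_id`.
    Returns new dicts; the input records are not modified.
    """
    by_outlet = defaultdict(list)
    for rec in records:
        by_outlet[rec["outlet_id"]].append(rec)

    fixed = []
    for recs in by_outlet.values():
        recs = sorted(recs, key=lambda r: r["year"])
        for i, rec in enumerate(recs):
            rec = dict(rec)
            if PO_BOX.match(rec["address"]):
                # Look only at subsequent years, taking the first street address found.
                for later in recs[i + 1:]:
                    if not PO_BOX.match(later["address"]):
                        rec["address"] = later["address"]
                        break
            fixed.append(rec)
    return fixed


def deduplicate(records):
    """Keep the first record for each (year, normalized name, normalized address)."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["year"], normalize(rec["name"]), normalize(rec["address"]))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Running the backfill before deduplication matters in this sketch: a PO box record and a street-address record for the same outlet only collapse to one key once both carry the street address. A real pipeline would likely need fuzzier matching than exact equality on normalized strings.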

Twitter Demographics

The data shown below were collected from the profile of 1 tweeter who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 31 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Unknown    31       100%

Demographic breakdown

Readers by professional status           Count    As %
Unspecified                              6        19%
Student > Master                         6        19%
Researcher                               5        16%
Other                                    4        13%
Professor > Associate Professor          2        6%
Other                                    8        26%

Readers by discipline                    Count    As %
Unspecified                              8        26%
Nursing and Health Professions           8        26%
Social Sciences                          5        16%
Computer Science                         2        6%
Agricultural and Biological Sciences     2        6%
Other                                    6        19%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 15 May 2017.
All research outputs: #7,905,077 of 10,152,716 outputs
Outputs from BMC Research Notes: #1,690 of 2,429 outputs
Outputs of similar age: #188,649 of 264,310 outputs
Outputs of similar age from BMC Research Notes: #31 of 36 outputs
Altmetric has tracked 10,152,716 research outputs across all sources so far. This one is in the 12th percentile – i.e., 12% of other outputs scored the same or lower than it.
So far Altmetric has tracked 2,429 research outputs from this source. They receive a mean Attention Score of 4.6. This one is in the 17th percentile – i.e., 17% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 264,310 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 16th percentile – i.e., 16% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 36 others from the same source and published within six weeks on either side of this one. This one is in the 11th percentile – i.e., 11% of its contemporaries scored the same or lower than it.
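The "scored the same or lower" wording used for each percentile above can be read as a simple proportion. The sketch below computes it in Python under that reading only; Altmetric's actual calculation is not given on this page, and the function name and sample scores are hypothetical.

```python
def percentile_same_or_lower(score, other_scores):
    """Percentage of the other outputs whose score is the same as or lower than `score`."""
    if not other_scores:
        raise ValueError("need at least one other output to compare against")
    same_or_lower = sum(1 for s in other_scores if s <= score)
    return 100.0 * same_or_lower / len(other_scores)


# Illustrative scores only, not Altmetric data: two of the four other outputs score <= 1.
example = percentile_same_or_lower(1, [0, 1, 2, 5])
```

Because tied scores count toward the total, a percentile computed this way depends on how many outputs share the same score and need not follow from the rank alone.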