An experimental study of the intrinsic stability of random forest variable importance measures

Overview of attention for article published in BMC Bioinformatics, February 2016
Mentioned by
2 X users

Citations
118 Dimensions

Readers on
87 Mendeley
Title
An experimental study of the intrinsic stability of random forest variable importance measures
Published in
BMC Bioinformatics, February 2016
DOI
10.1186/s12859-016-0900-5
Authors
Huazhen Wang, Fan Yang, Zhiyuan Luo

Abstract

The stability of Variable Importance Measures (VIMs) based on random forest has recently received increased attention. Despite extensive attention to traditional stability under data perturbations or parameter variations, few studies consider the influence of the intrinsic randomness in generating VIMs, i.e., bagging, randomization and permutation. To address this influence, in this paper we introduce a new concept of intrinsic stability of VIMs, defined as the self-consistency among feature rankings in repeated runs of VIMs without data perturbations or parameter variations. Two widely used VIMs, i.e., Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG), are comprehensively investigated. The motivation of this study is two-fold. First, we empirically verify the prevalence of intrinsic stability of VIMs over many real-world datasets to highlight that the instability of VIMs does not originate exclusively from data perturbations or parameter variations, but also stems from the intrinsic randomness of VIMs. Second, through Spearman and Pearson tests we comprehensively investigate how different factors influence the intrinsic stability.

The experiments are carried out on 19 benchmark datasets with diverse characteristics, including 10 high-dimensional and small-sample gene expression datasets. Experimental results demonstrate the prevalence of intrinsic stability of VIMs. Spearman and Pearson tests on the correlations between intrinsic stability and different factors show that #feature (number of features) and #sample (size of sample) have a coupling effect on the intrinsic stability. The synthetic indicator, #feature/#sample, shows both negative monotonic correlation and negative linear correlation with the intrinsic stability, while OOB accuracy has monotonic correlations with intrinsic stability. This indicates that high-dimensional, small-sample and high-complexity datasets may suffer more from intrinsic instability of VIMs. Furthermore, with respect to parameter settings of random forest, a large number of trees is preferred. No significant correlations can be seen between intrinsic stability and other factors. Finally, the magnitude of intrinsic stability is always smaller than that of traditional stability.

First, the prevalence of intrinsic stability of VIMs demonstrates that the instability of VIMs not only comes from data perturbations or parameter variations, but also stems from the intrinsic randomness of VIMs. This finding gives a better understanding of VIM stability, and may help reduce the instability of VIMs. Second, by investigating the potential factors of intrinsic stability, users would be more aware of the risks and hence more careful when using VIMs, especially on high-dimensional, small-sample and high-complexity datasets.
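The abstract defines intrinsic stability as the self-consistency among feature rankings across repeated runs, but it does not state the exact formula used. As a rough illustration only, the sketch below assumes one plausible choice: the mean pairwise Spearman correlation between importance vectors from repeated runs on the same data with the same parameters. The function names (`average_ranks`, `pearson`, `spearman`, `intrinsic_stability`) and the importance scores are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest) to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson (linear) correlation; undefined if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman (monotonic) correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def intrinsic_stability(runs):
    """Assumed metric: mean pairwise Spearman correlation across repeated runs."""
    return mean(spearman(a, b) for a, b in combinations(runs, 2))


# Hypothetical importance scores for 4 features from 3 repeated runs
# (same data, same parameters, different random seeds).
runs = [
    [0.40, 0.30, 0.20, 0.10],
    [0.38, 0.33, 0.12, 0.17],
    [0.41, 0.28, 0.21, 0.10],
]
print(intrinsic_stability(runs))
```

A value near 1 would indicate that repeated runs rank the features almost identically; lower values indicate that the ranking itself changes from run to run. The same `spearman` and `pearson` helpers could also be applied to a per-dataset factor such as #feature/#sample against a stability score, which is the kind of monotonic versus linear comparison the abstract describes.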

X Demographics

The data shown below were collected from the profiles of 2 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 87 Mendeley readers of this research output.

Geographical breakdown

Country    Count   As %
Russia         1     1%
Brazil         1     1%
Unknown       85    98%

Demographic breakdown

Readers by professional status                  Count   As %
Student > Ph. D. Student                           21    24%
Student > Master                                   18    21%
Researcher                                         10    11%
Student > Doctoral Student                          5     6%
Student > Bachelor                                  5     6%
Other                                               8     9%
Unknown                                            20    23%

Readers by discipline                           Count   As %
Computer Science                                   13    15%
Biochemistry, Genetics and Molecular Biology        6     7%
Agricultural and Biological Sciences                6     7%
Engineering                                         5     6%
Earth and Planetary Sciences                        5     6%
Other                                              27    31%
Unknown                                            25    29%
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 03 February 2016.
Comparison group                                  Rank           Out of
All research outputs                              #18,437,241    22,842,950 outputs
Outputs from BMC Bioinformatics                   #6,321         7,289 outputs
Outputs of similar age                            #287,378       397,089 outputs
Outputs of similar age from BMC Bioinformatics    #117           134 outputs
Altmetric has tracked 22,842,950 research outputs across all sources so far. This one is in the 11th percentile – i.e., 11% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,289 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 5th percentile – i.e., 5% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 397,089 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 15th percentile – i.e., 15% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 134 others from the same source and published within six weeks on either side of this one. This one is in the 4th percentile – i.e., 4% of its contemporaries scored the same or lower than it.