Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis

Overview of attention for article published in BMC Genomics, January 2013

About this Attention Score

  • Average Attention Score compared to outputs of the same age
  • Above-average Attention Score compared to outputs of the same age and source (58th percentile)

Mentioned by

6 Wikipedia pages

Citations

29 Dimensions

Readers on

49 Mendeley
Title
Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis
Published in
BMC Genomics, January 2013
DOI 10.1186/1471-2164-14-s1-s14
Pubmed ID
Authors

Habil Zare, Gholamreza Haffari, Arvind Gupta, Ryan R Brinkman

Abstract

One challenge in applying bioinformatic tools to clinical or biological data is the large number of features that may be provided to the learning algorithm without any prior knowledge of which ones should be used. In such applications, the number of features can drastically exceed the number of training instances, which is often limited by the number of samples available for the study. The Lasso is one of many regularization methods that have been developed to prevent overfitting and improve prediction performance in high-dimensional settings. In this paper, we propose a novel algorithm for feature selection based on the Lasso. Our hypothesis is that a scoring scheme that measures the "quality" of each feature can provide a more robust feature selection method. Our approach is to generate several samples from the training data by bootstrapping, determine the best relevance-ordering of the features for each sample, and finally combine these relevance-orderings to select highly relevant features. In addition to a theoretical analysis of our feature scoring scheme, we provide empirical evaluations on six real datasets from different fields to confirm the superiority of our method in exploratory data analysis and prediction performance. For example, we applied FeaLect, our feature scoring algorithm, to a lymphoma dataset, and according to a human expert, our method selected more meaningful features than those commonly used in clinics. This case study provides a basis for discovering interesting new criteria for lymphoma diagnosis. Furthermore, to facilitate the use of our algorithm in other applications, the source code that implements it has been released as FeaLect, a documented R package on CRAN.
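
The three steps named in the abstract (bootstrap the training data, derive a relevance-ordering per sample, combine the orderings) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FeaLect implementation: the function names, the small coordinate-descent Lasso solver, the geometric grid of penalties, the rule of giving a feature credit 1/rank in each ordering, and the synthetic data are all choices made here for illustration and are not taken from the paper or the R package.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t (the Lasso proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=50, tol=1e-6):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1, no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw; equals y while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w

def relevance_order(X, y, n_lambdas=12, ratio=0.8):
    """Order features by when they first become nonzero as the penalty shrinks."""
    n, p = len(X), len(X[0])
    lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p))
    order = []
    for k in range(1, n_lambdas + 1):
        w = lasso_cd(X, y, lam_max * ratio ** k)
        entered = [j for j in range(p) if w[j] != 0.0 and j not in order]
        # Features entering at the same penalty are ordered by coefficient size.
        order.extend(sorted(entered, key=lambda j: -abs(w[j])))
    return order

def bootstrap_scores(X, y, n_boot=20, seed=0):
    """Average a 1/rank credit (an assumed rule) over bootstrap resamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    scores = [0.0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        for rank, j in enumerate(relevance_order(Xb, yb), start=1):
            scores[j] += 1.0 / rank
    return [s / n_boot for s in scores]

# Synthetic data (hypothetical): only features 0 and 1 influence the response.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(40)]
y = [3 * row[0] - 2 * row[1] + 0.3 * data_rng.gauss(0, 1) for row in X]

scores = bootstrap_scores(X, y)
ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
print(ranked)
```

Averaging over resamples is what makes the ranking less sensitive to any single draw of the training set: a feature that enters the Lasso path early in most bootstrap samples accumulates a high score, while one that appears only occasionally does not.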

Mendeley readers

The data shown below were compiled from readership statistics for 49 Mendeley readers of this research output.

Geographical breakdown

Country | Count | As %
United States | 2 | 4%
United Kingdom | 1 | 2%
Unknown | 46 | 94%

Demographic breakdown

Readers by professional status | Count | As %
Student > Ph. D. Student | 12 | 24%
Researcher | 8 | 16%
Student > Master | 7 | 14%
Student > Bachelor | 6 | 12%
Student > Doctoral Student | 4 | 8%
Other | 5 | 10%
Unknown | 7 | 14%

Readers by discipline | Count | As %
Computer Science | 12 | 24%
Biochemistry, Genetics and Molecular Biology | 4 | 8%
Agricultural and Biological Sciences | 4 | 8%
Mathematics | 4 | 8%
Materials Science | 3 | 6%
Other | 12 | 24%
Unknown | 10 | 20%

Attention Score in Context

This research output has an Altmetric Attention Score of 3. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 04 January 2019.
All research outputs
#8,535,472
of 25,374,647 outputs
Outputs from BMC Genomics
#3,907
of 11,244 outputs
Outputs of similar age
#88,471
of 287,037 outputs
Outputs of similar age from BMC Genomics
#68
of 186 outputs
Altmetric has tracked 25,374,647 research outputs across all sources so far. This one is in the 43rd percentile – i.e., 43% of other outputs scored the same or lower than it.
So far Altmetric has tracked 11,244 research outputs from this source. They receive a mean Attention Score of 4.8. This one has gotten more attention than average, scoring higher than 58% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 287,037 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 44th percentile – i.e., 44% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 186 others from the same source and published within six weeks on either side of this one. This one has gotten more attention than average, scoring higher than 58% of its contemporaries.