Scaling bioinformatics applications on HPC

Overview of attention for article published in BMC Bioinformatics, December 2017

About this Attention Score

  • Average Attention Score compared to outputs of the same age
  • Average Attention Score compared to outputs of the same age and source

Mentioned by

3 X users

Citations

9 Dimensions

Readers on

26 Mendeley
3 CiteULike
Title
Scaling bioinformatics applications on HPC
Published in
BMC Bioinformatics, December 2017
DOI 10.1186/s12859-017-1902-7
Pubmed ID
Authors

Mike Mikailov, Fu-Jyh Luo, Stuart Barkley, Lohit Valleru, Stephen Whitney, Zhichao Liu, Shraddha Thakkar, Weida Tong, Nicholas Petrick

Abstract

Recent breakthroughs in molecular biology and next generation sequencing technologies have led to the exponential growth of the sequence databases. Researchers use BLAST for processing these sequences. However, traditional software parallelization techniques (threads, message passing interface) applied in newer versions of BLAST are not adequate for processing these sequences in a timely manner. A new method for array job parallelization has been developed which offers O(T) theoretical speed-up in comparison to multi-threading and MPI techniques. Here T is the number of array job tasks. (The number of CPUs that will be used to complete the job equals the product of T multiplied by the number of CPUs used by a single task.) The approach is based on segmentation of both input datasets to the BLAST process, combining partial solutions published earlier (Dhanker and Gupta, Int J Comput Sci Inf Technol 5:4818-4820, 2014), (Grant et al., Bioinformatics 18:765-766, 2002), (Mathog, Bioinformatics 19:1865-1866, 2003). It is accordingly referred to as a "dual segmentation" method. In order to implement the new method, the BLAST source code was modified to allow the researcher to pass to the program the number of records (effective number of sequences) in the original database. The team also developed methods to manage and consolidate the large number of partial results that get produced. Dual segmentation allows for massive parallelization, which lifts the scaling ceiling in exciting ways. BLAST jobs that hitherto failed or slogged inefficiently to completion now finish with speeds that characteristically reduce wallclock time from 27 days on 40 CPUs to a single day using 4104 tasks, each task utilizing eight CPUs and taking less than 7 minutes to complete. The massive increase in the number of tasks when running an analysis job with dual segmentation reduces the size, scope and execution time of each task.
Besides significant speed of completion, additional benefits include fine-grained checkpointing and increased flexibility of job submission. "Trickling in" a swarm of individual small tasks tempers competition for CPU time in the shared HPC environment, and jobs submitted during quiet periods can complete in extraordinarily short time frames. The smaller task size also allows the use of older and less powerful hardware. The CDRH workhorse cluster was commissioned in 2010, yet its eight-core CPUs with only 24GB RAM work well in 2017 for these dual segmentation jobs. Finally, these techniques are excitingly friendly to budget-conscious scientific research organizations where probabilistic algorithms such as BLAST might discourage attempts at greater certainty because single runs represent a major resource drain. If a job that used to take 24 days can now be completed in less than an hour or on a space-available basis (which is the case at CDRH), repeated runs for more exhaustive analyses can be usefully contemplated.
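
To make the idea in the abstract concrete, the following is a minimal Python sketch of how an array job task index could be mapped to one (query segment, database segment) pair, so that every pair is handled by exactly one task. It is an illustration under stated assumptions, not the authors' implementation: the function names, the contiguous near-equal splitting strategy, and the 3 x 4 segment counts are all hypothetical. Only the figures 4104 tasks and eight CPUs per task come from the abstract.

```python
from itertools import product


def split_into_segments(items, n_segments):
    """Split a list into n_segments contiguous, near-equal chunks.

    Hypothetical helper: the abstract does not say how segments are cut.
    """
    base, extra = divmod(len(items), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        size = base + (1 if i < extra else 0)  # spread the remainder
        segments.append(items[start:start + size])
        start += size
    return segments


def task_to_pair(task_id, n_db_segments):
    """Map a 1-based array task ID to 0-based (query_seg, db_seg) indices."""
    return divmod(task_id - 1, n_db_segments)


def total_cpus(n_tasks, cpus_per_task):
    """CPUs used overall = T tasks times CPUs per task (as in the abstract)."""
    return n_tasks * cpus_per_task


if __name__ == "__main__":
    # Assumed toy sizes: 3 query segments x 4 database segments = 12 tasks.
    n_query_segments, n_db_segments = 3, 4
    n_tasks = n_query_segments * n_db_segments

    pairs = [task_to_pair(t, n_db_segments) for t in range(1, n_tasks + 1)]
    # Every (query, database) combination is covered exactly once.
    assert sorted(pairs) == sorted(
        product(range(n_query_segments), range(n_db_segments))
    )

    print(split_into_segments(list(range(10)), 3))
    print(total_cpus(4104, 8))  # task count and CPUs per task from the abstract
```

Because each task searches only a fragment of the database, E-values computed per task would be based on the fragment's size. That is presumably why the abstract mentions passing the record count of the original database to BLAST; the sketch above does not model that step.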

X Demographics

The data shown below were collected from the profiles of 3 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 26 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Unknown    26       100%

Demographic breakdown

Readers by professional status                  Count    As %
Researcher                                      9        35%
Student > Bachelor                              3        12%
Student > Ph. D. Student                        3        12%
Student > Postgraduate                          2        8%
Professor > Associate Professor                 2        8%
Other                                           2        8%
Unknown                                         5        19%

Readers by discipline                           Count    As %
Agricultural and Biological Sciences            10       38%
Biochemistry, Genetics and Molecular Biology    4        15%
Engineering                                     3        12%
Computer Science                                2        8%
Social Sciences                                 1        4%
Other                                           2        8%
Unknown                                         4        15%
Attention Score in Context

This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 02 January 2018.
All research outputs: #15,924,934 of 25,200,621 outputs
Outputs from BMC Bioinformatics: #4,974 of 7,660 outputs
Outputs of similar age: #254,275 of 454,773 outputs
Outputs of similar age from BMC Bioinformatics: #78 of 141 outputs
Altmetric has tracked 25,200,621 research outputs across all sources so far. This one is in the 34th percentile – i.e., 34% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,660 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one is in the 31st percentile – i.e., 31% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 454,773 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 41st percentile – i.e., 41% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 141 others from the same source and published within six weeks on either side of this one. This one is in the 41st percentile – i.e., 41% of its contemporaries scored the same or lower than it.