HSRA Homepage

HSRA

[NEW] 2019/01/23: HSRA v1.1 released! Check out the News section

Hadoop Spliced Read Aligner (HSRA) [1,2] is a MapReduce-based parallel tool for mapping reads from RNA sequencing (RNA-seq) experiments that supports single-end and paired-end read alignments from FASTQ/FASTA datasets. RNA-seq analyses typically begin by mapping reads to a reference genome in order to determine the location from which the reads originated, which is a very time-consuming step in bioinformatics pipelines. This tool allows bioinformatics researchers to efficiently distribute their mapping tasks over the nodes of a computer cluster by combining a fast spliced aligner with a Big Data processing framework.

More specifically, HSRA takes advantage of the MapReduce programming model originally developed by Google [3] to extend the multithreading capabilities of the HISAT2 spliced aligner [4,5] to large-scale distributed systems (e.g., cloud-based infrastructures). HSRA is built upon the open-source Apache Hadoop project [6], one of the most popular distributed computing frameworks for scalable Big Data processing, and currently supports all major 64-bit Linux distributions. Moreover, our tool uses the Hadoop Sequence Parser (HSP) library [7] to efficiently read the input datasets stored in the Hadoop Distributed File System (HDFS) [8], and it can process datasets compressed with the Gzip and BZip2 codecs.
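To make the idea of distributing mapping tasks more concrete, the following is a minimal, single-machine Python sketch of the general pattern: parse FASTQ records, group them into input splits, and process each split independently in a map step. It is not HSRA's actual implementation. All names here (`parse_fastq`, `make_splits`, `placeholder_align`, `run_map_phase`) are hypothetical, a thread pool stands in for the cluster nodes, and the placeholder function only reports read lengths where a real job would invoke a spliced aligner.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

# (read name, bases, qualities); a simplified view of one FASTQ entry.
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one record per four-line FASTQ entry; ignore a trailing partial entry."""
    it = iter(lines)
    while True:
        block = list(islice(it, 4))
        if len(block) < 4:
            return
        header, bases, _plus, quals = (line.rstrip("\n") for line in block)
        yield header[1:], bases, quals  # drop the leading '@'


def make_splits(records: Iterable[FastqRecord], split_size: int) -> Iterator[List[FastqRecord]]:
    """Group records into fixed-size splits, mimicking per-task input splits."""
    it = iter(records)
    while True:
        split = list(islice(it, split_size))
        if not split:
            return
        yield split


def placeholder_align(split: List[FastqRecord]) -> List[str]:
    """Hypothetical stand-in for aligning one split; emits 'name<TAB>read length'."""
    return [f"{name}\t{len(bases)}" for name, bases, _quals in split]


def run_map_phase(
    lines: Iterable[str],
    split_size: int,
    mapper: Callable[[List[FastqRecord]], List[str]],
    workers: int = 2,
) -> List[str]:
    """Apply `mapper` to every split in parallel and concatenate outputs in split order."""
    splits = list(make_splits(parse_fastq(lines), split_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(mapper, splits))  # map() preserves input order
    return [line for out in outputs for line in out]
```

For example, calling `run_map_phase(open("reads.fastq"), split_size=100000, mapper=placeholder_align)` on a hypothetical `reads.fastq` file would process the reads in independent chunks. Because each split is handled without reference to the others, the same structure scales out naturally once the splits are assigned to different nodes.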

This tool is distributed as free software under the GPLv3 license [9] and is publicly available from the Downloads section.

Citation

If you have used HSRA in your research, please cite our work using the following reference:

[1] Roberto R. Expósito, Jorge González-Domínguez, Juan Touriño. HSRA: Hadoop-based spliced read aligner for RNA sequencing data. PLoS ONE 13(7): e0201483 (2018)

References

[9] GNU General Public License version 3 (GPLv3)