HSRA: Hadoop Spliced Read Aligner


[NEW] 2018/06/19: HSRA v1.0.1 released! Check out the News section

Hadoop Spliced Read Aligner (HSRA) [1] is a MapReduce-based parallel tool for mapping reads from RNA sequencing (RNA-seq) experiments. It supports both single-end and paired-end read alignment from FASTQ/FASTA datasets. RNA-seq analyses typically begin by mapping reads to a reference genome to determine the locations from which they originated, which is a very time-consuming step in bioinformatics pipelines. By combining a fast spliced aligner with a Big Data processing framework, HSRA allows bioinformatics researchers to distribute their mapping tasks efficiently across the nodes of a computing cluster.

More specifically, HSRA uses the MapReduce programming model, originally developed by Google [2], to extend the multithreading capabilities of the HISAT2 spliced aligner [3,4] to large-scale distributed systems (e.g., cloud-based infrastructures). HSRA is built on the open-source Apache Hadoop project [5], one of the most popular distributed computing frameworks for scalable Big Data processing, and currently supports all major 64-bit Linux distributions. In addition, HSRA uses the Hadoop Sequence Parser (HSP) library [6] to efficiently read input datasets stored on the Hadoop Distributed File System (HDFS) [7], including datasets compressed with the Gzip and BZip2 codecs.
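To make the input-handling step more concrete, here is a minimal Python sketch of the kind of work a sequence parser does before reads reach the aligner. It opens plain, Gzip, or BZip2 FASTQ files, yields one record per four-line block, pairs mates from two files for paired-end data, and groups records into chunks that a map task could hand to the aligner. This is not HSP's or HSRA's API: HSP is a Java library for Hadoop, and the function names here (`read_fastq`, `read_pairs`, `split_records`) are hypothetical. Chunking by record count is also a simplification, since Hadoop divides inputs into byte-range splits rather than fixed numbers of records.

```python
import bz2
import gzip
from typing import Iterator, List, Tuple

# A FASTQ record: (read name, sequence, quality string)
Record = Tuple[str, str, str]


def open_maybe_compressed(path: str):
    """Open a text file, transparently decompressing .gz or .bz2 inputs."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def read_fastq(path: str) -> Iterator[Record]:
    """Yield (name, sequence, quality) tuples; each FASTQ record spans four lines."""
    with open_maybe_compressed(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:  # end of file
                return
            seq = handle.readline().rstrip("\n")
            plus = handle.readline().rstrip("\n")
            qual = handle.readline().rstrip("\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            yield header[1:], seq, qual


def read_pairs(path1: str, path2: str) -> Iterator[Tuple[Record, Record]]:
    """Pair mates for paired-end data; assumes both files list reads in the same order."""
    yield from zip(read_fastq(path1), read_fastq(path2))


def split_records(records, chunk_size: int) -> Iterator[List]:
    """Group records into fixed-size chunks, one per (hypothetical) map task."""
    chunk = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly smaller, chunk
        yield chunk
```

For example, `split_records(read_fastq("reads.fastq.gz"), 100000)` would produce batches of up to 100,000 single-end reads, and `split_records(read_pairs("r1.fastq.bz2", "r2.fastq.bz2"), 100000)` would do the same for mate pairs (the file names are placeholders). In a MapReduce setting, each batch would correspond to the input of one mapper, which is where the aligner would run.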

This tool is distributed as free software under the GPLv3 license [8] and is publicly available from the Downloads section.