We have developed parallel algorithms and compressed data structures to address several computational challenges of next-generation sequencing (NGS) assembly. We demonstrate how commonly available multicore architectures can be utilized efficiently for sequence assembly. Our software PASQUAL uses shared-memory parallelism to speed up every stage of the assembly process: indexing the input strings, constructing and simplifying the string graph, and extracting contiguous subsequences. In our experiments with data sets of up to 6.8 billion base pairs, PASQUAL generally delivers the best trade-off between speed, memory consumption, and solution quality. On both synthetic and real data sets, PASQUAL scales well with an increasing number of threads on our 40-core test machine. Given enough cores, PASQUAL is the fastest tool in our comparison.
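To make the three stages concrete, the following is a minimal toy sketch in Python. It is not PASQUAL's implementation: the function names, the `MIN_OVERLAP` threshold, and the plain dictionary used as an index are all assumptions made for illustration, and the greedy "mutual best overlap" rule stands in for real string graph simplification. The thread pool only shows where the per-read work is independent; in CPython it will not yield the multicore speedups described above.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_OVERLAP = 3  # hypothetical minimum overlap length for this toy example


def build_prefix_index(reads, k):
    """Stage 1 (indexing): map each read's length-k prefix to the read ids."""
    index = {}
    for rid, read in enumerate(reads):
        index.setdefault(read[:k], []).append(rid)
    return index


def best_successor(rid, reads, index, k):
    """Find the read whose prefix has the longest overlap with a suffix of reads[rid]."""
    a = reads[rid]
    # Suffixes are scanned from longest to shortest, so the first hit is the longest.
    for start in range(1, len(a) - k + 1):
        for cand in index.get(a[start:start + k], []):
            if cand != rid and reads[cand].startswith(a[start:]):
                return rid, cand, len(a) - start
    return rid, None, 0


def assemble(reads, k=MIN_OVERLAP, threads=4):
    index = build_prefix_index(reads, k)

    # Stage 2 (graph construction): each read is processed independently,
    # which is the kind of work a shared-memory assembler can parallelize.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        edges = list(pool.map(lambda r: best_successor(r, reads, index, k),
                              range(len(reads))))

    # Greedy simplification (an assumption, not transitive reduction):
    # keep an edge a -> b only if a is also the best predecessor of b.
    best_pred = {}
    for a, b, olen in edges:
        if b is not None and olen > best_pred.get(b, (None, 0))[1]:
            best_pred[b] = (a, olen)
    succ = {a: (b, olen) for b, (a, olen) in best_pred.items()}

    # Stage 3 (contig extraction): walk each unambiguous path from a read
    # without a predecessor. Pure cycles are ignored in this sketch.
    contigs = []
    for rid in range(len(reads)):
        if rid in best_pred:
            continue
        contig, cur = reads[rid], rid
        while cur in succ:
            cur, olen = succ[cur]
            contig += reads[cur][olen:]
        contigs.append(contig)
    return contigs


if __name__ == "__main__":
    print(assemble(["ACGTAC", "TACGGA", "GGATTC"]))
```

In this example the three reads overlap by three characters each and are merged into a single contig. A production assembler would replace the dictionary with a compressed index and handle sequencing errors, reverse complements, and branching in the graph, none of which this sketch attempts.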
PASQUAL was developed at the HPC lab of the Georgia Institute of Technology. Henning Meyerhenke is a project member and contributed to the development during his time at the lab. Ports of PASQUAL to other platforms are currently being developed by external groups. More details and the source code can be found on the main project website.