PyRanges: efficient comparison of genomic intervals in Python
Journal article, Peer reviewed
Accepted version
Åpne
Permanent lenke
http://hdl.handle.net/11250/2640448Utgivelsesdato
2019Metadata
Vis full innførselSamlinger
Originalversjon
10.1093/bioinformatics/btz615Sammendrag
Complex genomic analyses often use sequences of simple set operations like intersection, overlap and nearest on genomic intervals. These operations, coupled with some custom programming, allow a wide range of analyses to be performed. To this end, we have written PyRanges, a data structure for representing and manipulating genomic intervals and their associated data in Python. Run single threaded on binary set operations, PyRanges is in median 2.3–9.6 times faster than the popular R GenomicRanges library and is equally memory efficient; run multi-threaded on 8 cores, our library is up to 123 times faster. PyRanges is therefore ideally suited both for individual analyses and as a foundation for future genomic libraries in Python.