PyRanges: efficient comparison of genomic intervals in Python
Journal article, Peer reviewed
MetadataShow full item record
Complex genomic analyses often use sequences of simple set operations like intersection, overlap and nearest on genomic intervals. These operations, coupled with some custom programming, allow a wide range of analyses to be performed. To this end, we have written PyRanges, a data structure for representing and manipulating genomic intervals and their associated data in Python. Run single threaded on binary set operations, PyRanges is in median 2.3–9.6 times faster than the popular R GenomicRanges library and is equally memory efficient; run multi-threaded on 8 cores, our library is up to 123 times faster. PyRanges is therefore ideally suited both for individual analyses and as a foundation for future genomic libraries in Python.