Design and analysis of an H.265 entropy encoder
Abstract
The Context-Adaptive Binary Arithmetic Coding (CABAC) used in High Efficiency Video Coding (HEVC/H.265) is a near-optimal entropy coding method. The price of this coding efficiency is a complex and highly serialized algorithm. With CABAC becoming a bottleneck in encoder and decoder performance, a major innovation has taken place in the binarization scheme of the transform-coefficient level values. HEVC introduces an adaptive binarization scheme that allows more data to be encoded in the high-throughput bypass mode. This scheme uses three coding methods: Truncated Unary (TrU), k-th order Truncated Rice (TRk) and k-th order Exp-Golomb (EGk). By exploiting the properties of the video-coding data structure, as well as the properties of each of these coding methods, the binarization scheme achieves a near-optimal code.
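To make the three codes concrete, the following Python sketch returns bin strings as text of '0' and '1' characters. The function names (tru, fixed_length, trk, egk) are hypothetical and chosen for illustration. The definitions follow the TR and EGk binarization processes of the H.265 specification as we understand them, with TrU treated as TRk for k = 0. Note that the HEVC form of EGk writes its prefix as a run of ones closed by a zero.

```python
def tru(value: int, c_max: int) -> str:
    """Truncated unary: `value` ones, closed by a zero unless value == c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value must lie in [0, c_max]")
    return "1" * value + ("0" if value < c_max else "")


def fixed_length(value: int, n_bits: int) -> str:
    """Fixed-length code, MSB first; empty when n_bits == 0."""
    return format(value, f"0{n_bits}b") if n_bits > 0 else ""


def trk(value: int, c_max: int, k: int) -> str:
    """k-th order truncated Rice: TrU code of value >> k, then the k LSBs."""
    prefix_val = value >> k
    prefix = tru(prefix_val, c_max >> k)
    # As in the HEVC TR process, the suffix is only sent while value < c_max.
    suffix = fixed_length(value - (prefix_val << k), k) if value < c_max else ""
    return prefix + suffix


def egk(value: int, k: int) -> str:
    """k-th order Exp-Golomb in the HEVC style (ones-prefix, zero separator)."""
    bins = []
    # Each '1' consumes 2^k values and doubles the size of the next group.
    while value >= (1 << k):
        bins.append("1")
        value -= 1 << k
        k += 1
    bins.append("0")
    bins.append(fixed_length(value, k))  # k-bit offset inside the final group
    return "".join(bins)
```

For example, egk(3, 0) yields '11000' and trk(5, 16, 1) yields '1101'. TRk is compact for small values, while the EGk code length grows only logarithmically, which is why large values are handed over to EGk.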
A thorough analysis of the binarization scheme has been performed, with a main focus on finding an efficient hardware implementation. A major challenge was finding an efficient way of coding the remaining absolute transform-coefficient level (ALRem). ALRem is coded as a concatenation of TRk and EGk codes, with an adaptive Rice parameter (k). A finite state machine approach proved to be very efficient at coding the absolute remaining level, and this approach was implemented in hardware.
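The Python sketch below illustrates ALRem binarization with the Rice parameter k treated as the state of a five-state machine (k = 0 to 4). It follows the coeff_abs_level_remaining process of H.265 version 1 as we read it: a TRk prefix with cMax = 4 << k, an escape to an EG code of order k + 1 once the prefix saturates at four ones, and a transition that increments k (saturating at 4) after any level whose absolute value exceeds 3 * 2^k, with k reset to 0 at the start of each 4x4 sub-block. The names binarize_al_rem, next_rice_param and binarize_sub_block, and the (base_level, remaining) input pairs, are hypothetical. base_level stands for the part of the level already signalled by the preceding flags. This is a software model, not the thesis' hardware FSM.

```python
def binarize_al_rem(remaining: int, k: int) -> str:
    """TRk prefix (cMax = 4 << k), plus an EG(k+1) suffix for large values."""
    c_max = 4 << k
    if remaining < c_max:
        # TRk part: unary code of remaining >> k closed by a zero, then k LSBs.
        q = remaining >> k
        lsbs = format(remaining & ((1 << k) - 1), f"0{k}b") if k else ""
        return "1" * q + "0" + lsbs
    # Escape: four ones, then an Exp-Golomb code of order k + 1 for the excess.
    bins = "1111"
    value, order = remaining - c_max, k + 1
    while value >= (1 << order):
        bins += "1"
        value -= 1 << order
        order += 1
    return bins + "0" + format(value, f"0{order}b")  # order >= 1 here


def next_rice_param(k: int, abs_level: int) -> int:
    """FSM transition: step k up by one (max 4) when abs_level > 3 * 2^k."""
    return min(k + 1, 4) if abs_level > 3 * (1 << k) else k


def binarize_sub_block(levels: list[tuple[int, int]]) -> list[str]:
    """Binarize the ALRem values of one sub-block; k starts in state 0."""
    k = 0
    out = []
    for base_level, remaining in levels:
        out.append(binarize_al_rem(remaining, k))
        k = next_rice_param(k, base_level + remaining)
    return out
```

For instance, binarize_sub_block([(1, 10), (1, 0)]) returns ['1111110000', '00']: the large first level pushes the state from k = 0 to k = 1, so the following zero is coded with one extra suffix bin.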
The Context Index Calculator, which forms an integral part of the HEVC CABAC system, was not implemented. When this module is designed, it is proposed that it be combined with the Binarizer, because the two modules share a large number of data dependencies.
A simplified version of a practical CABAC encoder architecture was implemented. It performs CABAC encoding as specified by the HEVC standard, but is limited to encoding a subset of the transform-coefficient level data. Verifying the correctness of this hardware encoder required the development of a software model. This software encoder was later extended with a decoder, which allowed additional functional verification.
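To illustrate why bypass bins give high throughput, here is a Python sketch of the arithmetic coding engine restricted to bypass bins. It follows the EncodeBypass, RenormE, PutBit and EncodeFlush steps of the H.265 specification as we understand them. A bypass bin never changes the range, so each bin costs one shift and an optional add, with no probability-state lookup. Context-coded (regular) bins and their LPS range table are omitted, and the class and function names are hypothetical. A matching bypass decoder is included so that the sketch can be checked by round trip, in the spirit of the encoder/decoder verification described above.

```python
class BypassEncoder:
    """Arithmetic encoder handling only bypass bins plus the final flush."""

    def __init__(self) -> None:
        self.low = 0            # ivlLow: 10-bit lower bound of the interval
        self.range = 510        # ivlCurrRange: unchanged by bypass bins
        self.first_bit = True   # the first PutBit output is suppressed
        self.outstanding = 0    # bits still waiting on a possible carry
        self.bits: list[int] = []

    def _put_bit(self, b: int) -> None:
        if self.first_bit:
            self.first_bit = False
        else:
            self.bits.append(b)
        # Pending bits are resolved to the opposite value of b.
        self.bits.extend([1 - b] * self.outstanding)
        self.outstanding = 0

    def encode_bypass(self, bin_val: int) -> None:
        self.low <<= 1
        if bin_val:
            self.low += self.range
        if self.low >= 1024:
            self._put_bit(1)
            self.low -= 1024
        elif self.low < 512:
            self._put_bit(0)
        else:                   # carry still undecided
            self.low -= 512
            self.outstanding += 1

    def _renorm(self) -> None:
        while self.range < 256:
            if self.low < 256:
                self._put_bit(0)
            elif self.low >= 512:
                self.low -= 512
                self._put_bit(1)
            else:
                self.low -= 256
                self.outstanding += 1
            self.range <<= 1
            self.low <<= 1

    def finish(self) -> list[int]:
        """Encode a terminating bin equal to 1, then flush the register."""
        self.range -= 2
        self.low += self.range
        self.range = 2
        self._renorm()
        self._put_bit((self.low >> 9) & 1)
        last_two = ((self.low >> 7) & 3) | 1
        self.bits.extend([last_two >> 1, last_two & 1])
        return self.bits


def encode_bypass_bins(bins: list[int]) -> list[int]:
    enc = BypassEncoder()
    for b in bins:
        enc.encode_bypass(b)
    return enc.finish()


def decode_bypass(bits: list[int], count: int) -> list[int]:
    """Matching bypass decoder; bits past the end of the stream read as 0."""
    stream = iter(bits)

    def read() -> int:
        return next(stream, 0)

    offset = 0
    for _ in range(9):
        offset = (offset << 1) | read()
    out = []
    for _ in range(count):
        offset = (offset << 1) | read()
        if offset >= 510:
            out.append(1)
            offset -= 510
        else:
            out.append(0)
    return out
```

Each bypass bin contributes exactly one output bit, and the flush adds nine, so encode_bypass_bins([1]) produces the ten bits [1, 1, 1, 1, 1, 1, 1, 0, 1, 1].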
Because the throughput of the encoder modules varies, an asynchronous FIFO was developed to simplify the data flow and improve performance. Since both the binarizer and the context index calculator remain unfinished, the complete system was not implemented.
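The abstract does not describe the internals of the FIFO. A common way to build an asynchronous FIFO is to pass Gray-coded read and write pointers across the clock domains through synchronizer flip-flops, so that a pointer sampled mid-transition is off by at most one position. The Python sketch below models only the pointer arithmetic under that assumption; the synchronizer itself is not modeled, and the function names are hypothetical. Pointers carry one bit more than the address, so that a full FIFO and an empty FIFO can be told apart.

```python
def bin_to_gray(b: int) -> int:
    """Binary to reflected Gray code: consecutive values differ in one bit."""
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    """Inverse conversion: XOR of all right shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def is_empty(rd_gray: int, wr_gray_synced: int) -> bool:
    """Read side: empty when both pointers (laps included) are equal."""
    return rd_gray == wr_gray_synced


def is_full(wr_gray: int, rd_gray_synced: int, addr_bits: int) -> bool:
    """Write side, (addr_bits + 1)-bit pointers: the writer is one lap ahead.

    In Gray code, adding 2^addr_bits flips the two most significant bits,
    so full means: top two bits inverted, all remaining bits equal.
    """
    mask = 0b11 << (addr_bits - 1)
    return wr_gray == (rd_gray_synced ^ mask)
```

Because a synchronized pointer can only lag behind the real one, the read side may report empty, and the write side full, for a few cycles longer than necessary. Both errors are conservative, so no data is lost or read twice.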