Feedback-based Error Control Methods for H.264

Many network-based multimedia applications transmit real-time media over unreliable networks, i.e. data may be lost or corrupted on its route from sender to receiver. Such errors may severely degrade perceptual quality. It is therefore important to apply error resilience (ER) techniques that improve robustness against errors, so that the receiver can play back the media with the best attainable quality.

Today, most ER schemes for video employ proactive error-resilient encoding. These schemes add redundant information to the encoded video stream in order to increase robustness against potential errors. Because of this redundancy, most proactive schemes suffer a significant reduction in coding efficiency. Another approach is to adjust the encoder operations based on feedback information from the decoder, e.g. to repair corrupted regions based on reports of lost data. Feedback-based ER schemes normally improve coding efficiency compared with proactive schemes. Moreover, they adjust rapidly to time-varying network conditions.

The objective of this thesis is to develop and evaluate a feedback-based ER scheme that conforms to the H.264/AVC standard and is applicable to real-time, low-delay video applications. The scheme is referred to as FBIR. The performance of FBIR is compared with that of an existing proactive ER scheme, known as IPLR. Special attention is given to the applied feedback mechanism, RTP/AVPF.

RTP/AVPF is a new (2006) feedback protocol. Basically, it specifies two changes to RTCP. First, it modifies the timing algorithm to enable early feedback without exceeding the RTCP bandwidth constraint. Second, it defines new RTCP message types, which provide information useful for error control purposes.

FBIR employs RTP/AVPF to provide timely feedback about lost packets from the decoder to the encoder. Upon reception of this feedback, the encoder uses a fast error tracking algorithm to locate the erroneous regions.
Finally, the regions that are assumed to be visually corrupted after decoding are intra refreshed. IPLR is an ER scheme developed for use in a commercial video communication system. It applies a motion-based intra refresh routine.

The comparison is carried out by online simulations covering various network environments (0, 1, 3 and 5% loss rate; 50 and 200 ms latency), bit rates (64, 144 and 384 kbit/s) and video sequences. The video is encoded and transmitted in real time to the decoder via a network emulator, which generates the desired network characteristics. The receiver decodes the video in real time and transmits feedback information back to the encoder, which adjusts its encoding process accordingly. The H.264/AVC reference software is modified and used as the codec. Finally, objective quality measures are obtained by calculating the PSNR of the decoded videos. In addition, some visual inspection is performed.

Isolated measurements of the RTP/AVPF transmission algorithm are also performed. These show that RTP/AVPF is able to provide timely feedback for error control purposes for a great number of applications and network environments. However, the experienced feedback delay may be increased by numerous factors, e.g. the network latency, the packet loss rate, the session bandwidth, and the number of receivers. This may decrease the performance of ER schemes utilizing RTP/AVPF. RTP/AVPF is fairly easy to implement since it only modifies the RTCP timing algorithm and adds new RTCP message types. It may also be used in combination with other standards in order to extend the available feedback information. Hence, RTP/AVPF enables timely feedback for use in a wide range of multimedia applications.

The PSNR measurements show that FBIR always obtains higher objective quality than IPLR for error-free transmissions. This does not, however, necessarily affect the perceptual quality if the bit rate is high.
FBIR achieves higher PSNR in other situations as well, such as for very low loss rates, low or medium bit rates, and sequences with high or medium motion activity. Conversely, IPLR performs better for low-motion sequences encoded at high bit rates when the loss rate exceeds a certain threshold, typically about 1%. It is also shown that the performance of FBIR may be reduced if the network latency increases.

Visually, the main difference between the two schemes is that FBIR recovers all corrupted regions at one instant, while IPLR performs a gradual refresh. The average time before recovery is somewhat shorter for IPLR.

The differences between FBIR and IPLR are mainly caused by two factors. First, FBIR uses less intra coding and thus achieves better coding efficiency. Second, FBIR does not repair errors until the encoder receives the feedback. Usually, this happens after IPLR has repaired most of the corrupted region.

In short, FBIR provides medium error robustness and high coding efficiency, in contrast to IPLR's high robustness and low coding efficiency. While FBIR's performance may be reduced by network characteristics such as increased latency, IPLR is unaffected by these factors. For error-free transmissions, FBIR does not significantly reduce the coding gain compared with a non-robust encoding scheme. Still, it provides good robustness against corruption in error-prone networks. Thus, all real-time video systems that benefit from immediate feedback should strongly consider employing FBIR or a similar feedback-based ER scheme.
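
The feedback loop summarized above (loss report, error tracking, intra refresh) can be illustrated with a minimal sketch. The abstract does not specify how FBIR's error tracking algorithm works, so what follows is only an illustration under stated assumptions: each inter-coded macroblock is predicted from the immediately preceding frame only, the encoder stores which macroblocks of that frame each macroblock referenced, and a macroblock counts as corrupted if any of its references is corrupted. The names `track_errors` and `history`, and the layout of the bookkeeping, are hypothetical and are not taken from the thesis or from the H.264/AVC reference software.

```python
def track_errors(history, lost_frame, lost_mbs):
    """Propagate a reported loss forward to the newest encoded frame.

    history    -- dict: frame number -> {macroblock index: set of macroblock
                  indices in the previous frame used for prediction};
                  an empty set marks an intra-coded macroblock (assumed layout)
    lost_frame -- frame number named in the decoder's loss report
    lost_mbs   -- set of macroblock indices reported lost in that frame

    Returns the macroblocks assumed corrupted in the newest frame, i.e. the
    candidates for intra refresh in the next frame to be encoded.
    """
    corrupted = set(lost_mbs)
    newest = max(history)
    # Walk through the frames that were encoded while the feedback was in transit.
    for n in range(lost_frame + 1, newest + 1):
        # A macroblock inherits the error if it referenced a corrupted one.
        # Intra-coded macroblocks have no references and therefore stay clean.
        corrupted = {mb for mb, refs in history[n].items() if refs & corrupted}
    return corrupted


# Toy example: 4 macroblocks per frame, 3 frames already encoded.
history = {
    0: {0: set(), 1: set(), 2: set(), 3: set()},   # all intra
    1: {0: {0}, 1: {0, 1}, 2: {2}, 3: {3}},
    2: {0: {0}, 1: {1}, 2: {1, 2}, 3: set()},      # macroblock 3 is intra
}

# The decoder reports that macroblock 0 of frame 0 was lost.
to_refresh = track_errors(history, lost_frame=0, lost_mbs={0})
print(sorted(to_refresh))  # [0, 1, 2]
```

In this toy case the error has spread from one macroblock to three by the time the feedback arrives, which mirrors the point made above: the longer the feedback delay, the more frames the error can propagate through before the encoder repairs it.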

Publisher

Institutt for elektronikk og telekommunikasjon