Introduction to H.264/AVC
For information on implementing an H.264/AVC
encoder on the TMS320DM642, please click here.
H.264/AVC is the latest international standard for
video coding, issued in May 2003. It was jointly developed by the ITU-T
Video Coding Experts Group (VCEG) and the ISO/IEC Moving
Picture Experts Group (MPEG). The official name is Advanced Video Coding
(AVC), also known as H.264 or MPEG-4 Part 10. The standard defines only the
video bitstream and the decoding method, which leaves design flexibility
for the encoding process. Figure 1 briefly summarizes the history of the
H.26x and MPEG-x series of video coding standards.

Figure 1. The history of video coding standards.
Compared to the other standards, H.264/AVC contains a
number of new features, which not only offer a lower bit rate and more
efficient compression, but also provide more flexibility for application
to a wide variety of network environments. As shown in Figure 2, H.264
consists of two layers, namely the Video Coding Layer (VCL) and the Network
Abstraction Layer (NAL).

Figure 2. Two layers in H.264/AVC.
The goal of the VCL is to encode the video independently of the network
layer. The syntax supports a hierarchy of video data partitions, ranging
from slice to macroblock, sub-block, and pixel, as shown in Figure 3. The
NAL formats the VCL representation of the video and provides header
information in a manner appropriate for conveyance by particular
transport layers (such as the Real-time Transport Protocol) or storage
media. All data are contained in NAL units, each of which contains an
integer number of bytes. A NAL unit specifies a generic format for use
in both packet-oriented and bitstream-oriented systems.
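
To make the NAL unit format more concrete, here is a minimal Python sketch. It assumes the byte-stream format in which NAL units are separated by 0x000001 start codes, and it reads the one-byte NAL unit header (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type). The function names parse_nal_header and split_byte_stream are made up for this illustration, and emulation-prevention bytes inside the payload are not handled.

```python
from typing import Iterator, NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # should be 0 in a conforming stream
    nal_ref_idc: int         # 0 means the unit is not used as a reference
    nal_unit_type: int       # e.g. 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the one-byte NAL unit header into its three fields."""
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )


def split_byte_stream(stream: bytes) -> Iterator[bytes]:
    """Yield the NAL units found between 0x000001 start codes (illustrative helper)."""
    start_code = b"\x00\x00\x01"
    pos = stream.find(start_code)
    while pos != -1:
        begin = pos + len(start_code)
        nxt = stream.find(start_code, begin)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes before the next start code (for example the leading zero
        # of a 4-byte start code) are padding, not payload, so strip them.
        unit = stream[begin:end].rstrip(b"\x00")
        if unit:
            yield unit
        pos = nxt


# Hand-made example bytes, not taken from a real video.
demo = b"\x00\x00\x00\x01\x67\x42\x00\x1e" + b"\x00\x00\x01\x65\x88\x84"
demo_units = list(split_byte_stream(demo))
demo_types = [parse_nal_header(u[0]).nal_unit_type for u in demo_units]
```

In this hand-made stream the first unit carries type 7 (sequence parameter set) and the second type 5 (IDR slice). In a packet-oriented system such as RTP, the start codes would be dropped and each packet would carry the NAL unit bytes directly.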

Figure 3. Data hierarchy in Video Coding Layer.
Like the other video coding standards, H.264/AVC incorporates
different profiles and levels. Profiles define the sets of bitstream
features an H.264 stream can use. Levels define restrictions on the video
resolution, the frame rate, and the buffering model known as the VBV
(Video Buffering Verifier). There are up to 16 profiles and 16 levels in
the current version. The three most commonly used profiles are the baseline
profile (BP), main profile (MP), and extended profile (EP), as shown in Figure 4.
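
As a small illustration of how a stream signals its profile and level, the sketch below reads the first bytes of a sequence parameter set (SPS) NAL unit: the NAL header byte is followed by profile_idc, one byte of constraint flags, and level_idc. The profile_idc values 66, 77, and 88 correspond to the baseline, main, and extended profiles. The helper name describe_sps is hypothetical, and the sketch ignores special cases such as level 1b.

```python
from typing import Tuple

# profile_idc values for the three profiles discussed above
PROFILE_NAMES = {66: "Baseline", 77: "Main", 88: "Extended"}


def describe_sps(sps: bytes) -> Tuple[str, str]:
    """Return (profile, level) from an SPS NAL unit that starts with its header byte."""
    if len(sps) < 4 or (sps[0] & 0x1F) != 7:
        raise ValueError("not a sequence parameter set NAL unit")
    profile_idc = sps[1]
    level_idc = sps[3]  # sps[2] holds the constraint flags, skipped here
    profile = PROFILE_NAMES.get(profile_idc, "profile_idc %d" % profile_idc)
    # level_idc is ten times the level number, e.g. 30 means level 3.0
    level = "%d.%d" % (level_idc // 10, level_idc % 10)
    return profile, level


# Hand-made SPS prefix: header 0x67, profile_idc 66, no flags, level_idc 30
example_profile, example_level = describe_sps(bytes([0x67, 66, 0x00, 30]))
```

For the hand-made bytes above, the result is the baseline profile at level 3.0.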

Figure 4. Baseline, main and extended profiles in H.264.
The basic macroblock encoding structure is given in Figure 5. The main
idea is to predict the current frame and encode only the error
between the original frame and the predicted one. To obtain the
predicted frame, motion estimation and motion compensation are used.
For each block in the current frame, the encoder searches for the best
matching block by computing the sum of absolute differences (SAD) within
a predetermined window in the previous frame. After finding the closest
matching area (minimal SAD value), it calculates the offset between the
current block and the reference block, known as the motion vector (MV).
Following these MVs, it builds a predicted frame by copying the reference
blocks to their new positions. It then calculates the residual error
between the predicted frame and the current frame, which is transformed,
quantized, and entropy coded into the bitstream. The H.264 encoder contains
an implicit decoder in order to stay consistent with the decoder side on the
reference frames. Because quantization is lossy, the frame that the decoder
reconstructs may differ from the original frame that the encoder sees.
Therefore, after quantization in the encoder, dequantization and an inverse
transform are applied so that the encoder and decoder use the same
reference frames for prediction.
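
The block-matching step described above can be sketched as follows. This is a simplified illustration, not the search an actual H.264 encoder must use: it assumes integer-pel motion, a single reference frame, a fixed square block, and an exhaustive (full) search. Frames are plain lists of rows of luma samples, and the names sad, full_search, and residual_block are chosen for this example only.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of luma samples


def sad(cur: Frame, ref: Frame, cx: int, cy: int, rx: int, ry: int, size: int) -> int:
    """Sum of absolute differences between the block at (cx, cy) in cur and (rx, ry) in ref."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def full_search(cur: Frame, ref: Frame, cx: int, cy: int,
                size: int, search_range: int) -> Tuple[Tuple[int, int], int]:
    """Try every offset in the window and return ((dx, dy), minimal SAD)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def residual_block(cur: Frame, ref: Frame, cx: int, cy: int,
                   mv: Tuple[int, int], size: int) -> Frame:
    """Motion compensation: subtract the reference block pointed to by mv."""
    dx, dy = mv
    return [
        [cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i] for i in range(size)]
        for j in range(size)
    ]


# Synthetic test frames: the current frame is the reference frame shifted
# right by 2 and down by 1 sample (uncovered border samples set to 0).
ref_frame = [[16 * y + x for x in range(16)] for y in range(16)]
cur_frame = [
    [ref_frame[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(16)]
    for y in range(16)
]
mv, cost = full_search(cur_frame, ref_frame, cx=4, cy=4, size=4, search_range=3)
res = residual_block(cur_frame, ref_frame, 4, 4, mv, 4)
```

On these synthetic frames the search finds the motion vector (-2, -1) with a SAD of zero, so the residual block is all zeros. With real video the residual is rarely zero, and it is this residual that goes on to the transform and quantization stages.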

Figure 5. H.264 encoding structure.
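
The reason the encoder needs its own decoding path can be shown with a toy numeric example. The sketch below is a deliberate simplification: it skips the transform and applies a uniform scalar quantizer with a made-up step size directly to the residual samples, which is not the actual H.264 quantizer. The function names are hypothetical.

```python
from typing import List


def quantize(residual: List[int], step: int) -> List[int]:
    """Uniform scalar quantizer, rounding to the nearest level (lossy)."""
    return [(1 if r >= 0 else -1) * ((abs(r) + step // 2) // step) for r in residual]


def dequantize(levels: List[int], step: int) -> List[int]:
    """Inverse of quantize, up to the rounding error."""
    return [q * step for q in levels]


def reconstruct(prediction: List[int], levels: List[int], step: int) -> List[int]:
    """What the decoder computes: prediction plus dequantized residual, clipped to 8 bits."""
    return [
        max(0, min(255, p + d))
        for p, d in zip(prediction, dequantize(levels, step))
    ]


original = [100, 104, 97, 90]   # samples of the current block
prediction = [98, 98, 98, 98]   # motion-compensated prediction
step = 4                        # assumed quantizer step size

residual = [o - p for o, p in zip(original, prediction)]
levels = quantize(residual, step)                  # this is what gets entropy coded
decoder_side = reconstruct(prediction, levels, step)
encoder_side = reconstruct(prediction, levels, step)  # the encoder's built-in decoder
```

Here the reconstructed block is [102, 106, 98, 90], which differs from the original [100, 104, 97, 90]. If the encoder predicted the next frame from the original samples while the decoder predicted from the reconstructed ones, the two would drift apart. Running the same reconstruction inside the encoder keeps both sides on identical reference data.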
Figure 6 and Figure 7 compare H.264 with H.263, MPEG-2, and MPEG-4.
The two sample videos tested are foreman (QCIF) and tempete (CIF). The
curves show that H.264 achieves higher quality than the other three
standards at an even lower bit rate.
Figure 6. Comparison to MPEG-2, H.263, MPEG-4 (QCIF)

Figure 7. Comparison to MPEG-2, H.263, MPEG-4 (CIF)
Reference
1. Standard: H.264/AVC video coding standard, by ITU-T and ISO/IEC.
2. Paper: Overview of the H.264/AVC video coding standard, by
Thomas Wiegand, Gary J. Sullivan, Gisle Bjontegaard, and Ajay Luthra.
3. All the figures above are taken from my previous slides; click
here to download. The file contains Chinese text. Without Chinese
language support installed, you may see question marks, boxes, or
other symbols instead of Chinese characters.
Last updated by Yang Song
August 26, 2009