3.3 MPEG video

About MPEG

MPEG (Moving Picture Experts Group) is the ISO committee responsible for defining the various MPEG video specifications. MPEG video codecs are used in many current multimedia products and are at the heart of many digital television set-top boxes, e.g. DSS and HDTV decoders, DVD players, video conferencing, Internet video, and other applications. MPEG achieves a high compression ratio (up to 150:1), so an encoded video stream needs less bandwidth for transmission over a network and less space for storage. Because of this, it has become an international standard. MPEG-1 is the specification used for Video CDs, MPEG-2 was intended for digital television applications and is used for DVD-Video, and MPEG-4 is the latest standard, intended for video conferencing. [63] [64]

MPEG is derived from the ITU H.261 video standard and the JPEG image format. [65] It consists of two layers: a system layer containing timing information to synchronise video and audio, and a compression layer containing the coded audio and video streams.

MPEG-1 was originally optimised to work at video resolutions of 352 * 240 pixels at 30 fps (NTSC) or 352 * 288 pixels at 25 fps (PAL). The bit-rate is optimised for applications of around 1.5 Mbit/s.
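As a rough, back-of-the-envelope illustration (not part of the standard), the Python sketch below compares the raw data rate of an NTSC-sized sequence with the 1.5 Mbit/s target; the 12 bits/pixel figure assumes the 4:2:0 sampling described later in this section.

    # Rough data-rate comparison for MPEG-1-sized video (4:2:0 sampling assumed).
    width, height, fps = 352, 240, 30
    bits_per_pixel = 12                                # 8 bits Y + 2 bits Cb + 2 bits Cr on average in 4:2:0

    raw_rate = width * height * bits_per_pixel * fps   # uncompressed bits per second
    target_rate = 1.5e6                                # MPEG-1 target bit-rate

    print(f"raw:    {raw_rate / 1e6:.1f} Mbit/s")      # about 30.4 Mbit/s
    print(f"target: {target_rate / 1e6:.1f} Mbit/s")
    print(f"compression needed: {raw_rate / target_rate:.0f}:1")   # about 20:1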

MPEG codec

YCbCr

Research into the Human Visual System (HVS) has shown that the human eye is most sensitive to changes in luminance and less sensitive to variations in chrominance. To effectively take advantage of this, MPEG uses the Y (luminance), Cb (blue colour difference) and Cr (red colour difference) colour space to represent the data values instead of the RGB encoding normally used in computer graphics.
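As a concrete illustration, the minimal Python sketch below converts one RGB pixel to YCbCr. It assumes the full-range BT.601 conversion coefficients (the ones commonly quoted for JPEG); broadcast MPEG normally uses the studio-range variant of the same matrix.

    def rgb_to_ycbcr(r, g, b):
        """Convert one 8-bit RGB pixel to YCbCr (full-range BT.601 coefficients assumed)."""
        y  =  0.299    * r + 0.587    * g + 0.114    * b
        cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
        cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
        return y, cb, cr

    print(rgb_to_ycbcr(255, 0, 0))   # pure red: modest Y, Cb below 128, Cr well above 128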

When encoded, a picture frame is divided into macro-blocks of 16 * 16 pixels, each made up of 8 * 8 pixel blocks. In terms of the YCbCr colour space, a macro-block can be represented in 4:4:4, 4:2:2 or 4:2:0 format. Here 4:4:4 is full-bandwidth YCbCr video and each macro-block consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks. 4:2:2 contains half as much chrominance information as 4:4:4, and 4:2:0 contains one quarter of the chrominance information. The 4:2:0 mode is the most widely used format in consumer-level products; it reduces the data required from 12 blocks/macro-block to 6 blocks/macro-block, that is, a 2:1 compression compared to the 4:4:4 or RGB format. (Figure 2-11) [63]

Figure 2-11 YCbCr format and pixel co-siting in 4:2:0[63]
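The 2:1 data reduction from 4:2:0 sampling can be sketched in a few lines of Python. This is only an illustrative down-sampling by 2x2 averaging; the standard specifies particular filtering and co-siting rules, as the figure suggests.

    import numpy as np

    def subsample_420(cb, cr):
        """Halve the chroma resolution in both directions by simple 2x2 averaging."""
        def down2(c):
            h, w = c.shape
            return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        return down2(cb), down2(cr)

    # For a 16 * 16 macro-block, Y keeps its four 8x8 blocks while Cb and Cr each
    # shrink to a single 8x8 block: 6 blocks per macro-block instead of 12.
    cb = np.random.rand(16, 16)
    cr = np.random.rand(16, 16)
    cb420, cr420 = subsample_420(cb, cr)
    print(cb420.shape, cr420.shape)   # (8, 8) (8, 8)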

Frame format

There are three types of frames in the MPEG format: I-frames, P-frames and B-frames. I-frames and P-frames are the same as in H.261, whereas B-frames are coded as differences from the previous or next I- or P-frame and are commonly referred to as bi-directionally interpolated prediction frames (Figure 2-12).

Figure 2-12 B-frame encoding[65]

Video Stream Composition

For random access and for locating scene cuts in the video sequence, MPEG allows the encoder to choose the frequency and the location of I-frames. Normally an I-frame is used about twice a second. The number of B-frames between I- or P-frames can also be selected. A typical pattern is 1 I-frame for every 12 to 15 frames, with 2 B-frames between each pair of I- or P-frames. A typical display order of frames is shown in Figure 2-13.

Figure 2-13 Typical ordering of frames[62]

For efficiency and to reduce latency, frames are re-ordered in the video stream sent to the decoder: the reference frames needed to reconstruct B-frames are transmitted before the associated B-frames. (Figure 2-14)

Figure 2-14 Video stream order[62]
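The re-ordering can be illustrated with a short Python sketch. It assumes the simple frame-type pattern described above (the frame labels and helper function are invented for this example); each B-frame is held back until the I- or P-frame it predicts forward from has been sent.

    def display_to_stream_order(frames):
        """Re-order a display-order GOP so each B-frame's forward reference (the
        next I- or P-frame) is transmitted before the B-frame itself."""
        stream, pending_b = [], []
        for f in frames:
            if f[0] in ("I", "P"):       # reference frame: send it, then the waiting B-frames
                stream.append(f)
                stream.extend(pending_b)
                pending_b = []
            else:                        # B-frame: hold until its forward reference is sent
                pending_b.append(f)
        stream.extend(pending_b)         # trailing B-frames (would follow the next GOP's I-frame)
        return stream

    display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7", "B8", "B9", "P10", "B11", "B12"]
    print(display_to_stream_order(display))
    # ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6', 'P10', 'B8', 'B9', 'B11', 'B12']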

Layered structure

An MPEG video is structured as a sequence, which is composed of a series of Groups of Pictures (GOPs). A GOP is in turn composed of a sequence of pictures (frames). From the highest level down to the lowest, the hierarchy is: [71]

Sequence → GOP → Frame → Slice → Macro-block → Y, Cb, Cr block or motion vector (Figure 2-15)

Figure 2-15 MPEG hierarchical structure

GOP (Group of Pictures): A GOP is an independently decodable unit and can be of any size as long as it begins with an I-frame. It consists of all the frames that follow a GOP header, up to the next GOP header.

The first picture after the GOP header is an I-frame, which needs no reference to any other picture. The GOP header contains a time code for the first picture of the GOP to be displayed, so this layer can be accessed randomly.

Open GOP: Sometimes a B-frame that follows the first I-frame after the header has a reference that comes from the previous GOP. In this case the GOP is called an open GOP. If random access to such a GOP is performed, this B-frame should not be displayed.

Closed GOP: When either there are no B-frames immediately following the first I-frame, or such B-frames have no references coming from the previous GOP, the GOP is called a closed GOP. [72]

Slice: A slice is a part of an MPEG image. There can be 1 slice per frame, 1 slice per macro-block, or anything in between. Each slice is coded independently from the other slices of the frame. This layer allows error confinement: if errors in the bit stream are detected, the decoder can try to continue the decoding process by looking for the next slice header. [72]
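The hierarchy can be pictured as nested containers. The sketch below is only an illustrative data model in Python (the class and field names are invented for this example), not the actual bit-stream syntax.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class MacroBlock:
        y_blocks: List[list]                      # four 8x8 luminance blocks (4:2:0)
        cb_block: list                            # one 8x8 Cb block
        cr_block: list                            # one 8x8 Cr block
        motion_vector: Tuple[int, int] = (0, 0)   # used in P- and B-frame macro-blocks

    @dataclass
    class Slice:
        macro_blocks: List[MacroBlock] = field(default_factory=list)

    @dataclass
    class Frame:
        frame_type: str                           # "I", "P" or "B"
        slices: List[Slice] = field(default_factory=list)

    @dataclass
    class GOP:
        time_code: str                            # display time of the first picture
        closed: bool = True                       # False for an open GOP
        frames: List[Frame] = field(default_factory=list)

    @dataclass
    class Sequence:
        gops: List[GOP] = field(default_factory=list)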

 

DCT (Discrete Cosine Transform)

Normally adjacent pixels within an image tend to be highly correlated. MPEG encoding uses the Discrete Cosine Transform (DCT) to reduce the data required to represent a single frame. Each of the 6 blocks in the macro-block (4 for Y and one each for Cb and Cr) is decomposed into its underlying spatial frequencies, which then allows the precision of the DCT coefficients to be reduced before Huffman coding.

As with the Fourier transform, a signal is decomposed into a weighted sum of orthogonal sines and cosines; when these are added together the original signal is reproduced. An 8x8 pixel block is converted to an 8x8 block of coefficients indicating a "weighting" value for each of the 64 orthogonal basis patterns (Figure 2-16), which added together produce the original image. Figure 2-17 shows how the vertical and horizontal frequencies are mapped into the 8x8 block pattern. [63]

By discarding the higher-frequency components, the amount of encoded data can be reduced further, which in turn reduces the bandwidth required. Of course the quality will be degraded as well.

 

Figure 2-16 DCT basis patterns[63]

Figure 2-17 Frequency map[63]
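A minimal Python/NumPy sketch of the forward transform for one 8x8 block is shown below. It uses the orthonormal form of the DCT-II; MPEG defines its own scaling and accuracy requirements.

    import numpy as np

    def dct_8x8(block):
        """2D DCT-II of an 8x8 block (orthonormal form)."""
        N = 8
        n = np.arange(N)
        # Basis matrix: C[k, m] = s(k) * cos(pi * (2m + 1) * k / (2N))
        C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
        C[0, :] /= np.sqrt(2)
        C *= np.sqrt(2 / N)
        return C @ block @ C.T               # transform rows, then columns

    block = np.full((8, 8), 100.0)           # a perfectly flat block...
    coeffs = dct_8x8(block - 128)            # level-shift the samples first
    print(np.round(coeffs, 1))               # ...has only a DC coefficient; all AC terms are zero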

DCT Coefficient Quantisation: Because the HVS is less sensitive to errors in high-frequency coefficients than to errors in lower-frequency ones, the higher frequencies can be more coarsely quantised when encoding. As the frequency map (Figure 2-17) shows, the lower-frequency DCT coefficients toward the upper-left corner of the coefficient matrix correspond to a solid luminance or colour value for the entire block. The higher-frequency DCT coefficients toward the lower-right corner correspond to finer spatial patterns, or even noise, within the image. Each coefficient is divided by a corresponding quantisation matrix value supplied from an intra quantisation matrix (Figure 2-18). The encoder may substitute a new quantisation matrix at the picture level if it decides this is warranted. This operation forces as many of the DCT coefficients as possible to zero, or near zero, to reduce the bandwidth required. [63]

Figure 2-18 Default intra quantisation matrix[63]
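The quantisation step itself is an element-wise division, as in the Python sketch below. The matrix values are the commonly published default intra matrix; the rounding rule here is deliberately simplified, whereas the standard also applies a quantiser scale factor and treats the DC coefficient separately.

    import numpy as np

    # Default intra quantisation matrix (values as commonly published for MPEG-1/2 intra blocks).
    INTRA_Q = np.array([
        [ 8, 16, 19, 22, 26, 27, 29, 34],
        [16, 16, 22, 24, 27, 29, 34, 37],
        [19, 22, 26, 27, 29, 34, 34, 38],
        [22, 22, 26, 27, 29, 34, 37, 40],
        [22, 26, 27, 29, 32, 35, 40, 48],
        [26, 27, 29, 32, 35, 40, 48, 58],
        [26, 27, 29, 34, 38, 46, 56, 69],
        [27, 29, 35, 38, 46, 56, 69, 83],
    ])

    def quantise_intra(coeffs):
        """Divide each DCT coefficient by its matrix entry and round (simplified)."""
        return np.round(coeffs / INTRA_Q).astype(int)

    # Small high-frequency coefficients are driven to zero by the larger divisors in the
    # lower-right of the matrix, which is what makes the later run-length and Huffman
    # coding of the result so effective.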

Intra Frame Decoding

Decoding is performed in the reverse order of the encoding process. An I-frame decoder therefore consists of an input bit-stream buffer, a Variable Length Decoder (VLD), an inverse quantiser, an Inverse Discrete Cosine Transform (IDCT), and an output interface to the required environment (computer hard drive, video frame buffer, etc.). (Figure 2-19)

Figure 2-19 Intra Frame Decoder[63]

The VLD operates on a bit-wise basis, examining every bit in the stream. Because of this extensive high-speed, bit-wise processing, it is more complex to implement than the corresponding variable length encoder.
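The core of a VLD is bit-wise prefix decoding, as in the Python sketch below. The code table here is a made-up toy table, not one of the actual MPEG VLC tables.

    def decode_vlc(bits, code_table):
        """Grow the current code one bit at a time until it matches a table entry."""
        symbols, current = [], ""
        for b in bits:
            current += b
            if current in code_table:
                symbols.append(code_table[current])
                current = ""
        return symbols

    toy_table = {"1": "EOB", "011": "run0/level1", "0100": "run1/level1"}   # illustrative only
    print(decode_vlc("01101001", toy_table))
    # ['run0/level1', 'run1/level1', 'EOB']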

The inverse quantiser multiplies the decoded coefficients by the corresponding values of the quantisation matrix and the quantisation scale factor. The resulting coefficients are clipped and an IDCT mismatch control is applied to prevent long term error propagation.

Finally, the IDCT operation is performed. [63]
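Put together, the intra reconstruction of one block reduces to three steps, sketched below. It reuses the DCT basis matrix and the simplified quantisation from the earlier sketches (both are assumptions of this example, not the exact arithmetic of the standard).

    import numpy as np

    def decode_intra_block(levels, quant_matrix, C):
        """Simplified intra-block reconstruction: inverse quantise, inverse DCT, clip.
        `C` is the 8x8 orthonormal DCT basis matrix from the earlier sketch."""
        coeffs = levels * quant_matrix                 # inverse quantisation (simplified)
        block = C.T @ coeffs @ C                       # inverse DCT (transpose of the forward step)
        return np.clip(np.round(block + 128), 0, 255)  # undo the level shift, clip to 8-bit range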

Non-Intra Decoding

Non-intra decoding is similar to intra-frame decoding, except for the addition of motion compensation support (Figure 2-20).

Figure 2-20 Non-Intra Frame Decoder[63]
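Motion compensation itself is conceptually simple, as the sketch below shows for a single 16 * 16 macro-block with a full-pel forward motion vector. Real MPEG decoding also handles half-pel vectors and, for B-frames, averaging of a forward and a backward prediction; the array shapes and names here are just for illustration.

    import numpy as np

    def motion_compensate(reference, mb_x, mb_y, mv, residual):
        """Fetch the 16x16 block the motion vector points at in the reference frame,
        add the decoded residual, and clip to the 8-bit range."""
        dx, dy = mv
        pred = reference[mb_y + dy : mb_y + dy + 16, mb_x + dx : mb_x + dx + 16]
        return np.clip(pred + residual, 0, 255)

    reference = np.random.randint(0, 256, (240, 352))   # previously decoded frame
    residual = np.zeros((16, 16))                       # decoded prediction error
    block = motion_compensate(reference, mb_x=32, mb_y=48, mv=(3, -2), residual=residual)
    print(block.shape)   # (16, 16)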

The compression performance of MPEG-1 is shown in Table 2-3.

Table 2-3 Compression performance of MPEG-1

Type       Compression
I          7:1
P          20:1
B          50:1
Average    27:1

MPEG-2

MPEG-2 was designed with digital television broadcasting in mind; the issues considered were the efficient coding of field-interlaced video and scalability. MPEG-2 supports 720 * 480 resolution video at 30 fps (NTSC) and 720 * 576 at 25 fps (PAL), at bit-rates up to 15 Mbit/s. Another format is the HDTV resolution of 1920 * 1080 pixels at 30 fps, at a bit-rate of up to 80 Mbit/s. Table 2-4 shows the parameters of typical MPEG-2 applications at the different levels:

Table 2-4 MPEG-2 target applications

Level        Size          Pixels/sec   Bit-rate (Mbit/s)   Application
Low          352 * 240     3 M          4                   Consumer tape equivalent
Main         720 * 480     10 M         15                  Studio TV
High 1440    1440 * 1152   47 M         60                  Consumer HDTV
High         1920 * 1080   63 M         80                  Film production

Differences between MPEG-2 and MPEG-1

MPEG-2 performs motion search on fields, not just frames.

MPEG-2 can encode 4:2:2 and 4:4:4 macro-blocks.

MPEG-2 frame sizes can be as large as 16383 * 16383.

MPEG-2 allows higher bit rates than MPEG-1.

MPEG-2 has a non-linear macro-block quantisation factor.

MPEG-1 only allows progressive picture sequences; MPEG-2 also allows interlaced sequences.

MPEG-2 allows different scan patterns in addition to the zigzag pattern mentioned above.

MPEG-2 allows surround sound and alternate language channels.

MPEG-2 has extra spatial scalability information (so different decoders can get different quality outputs).

MPEG-2 also allows temporal scalability, so that one stream can be displayed at different frame rates.

Various other minor improvements.

 
   

   
   

 

 

 
