Introduction to multimedia codecs

3.2 H.263 video

The picture formats supported in H.263 video are SQCIF at 128 * 96 pixels and QCIF at 176 * 144 pixels. As options, it also has CIF at 352 * 288 pixels, together with 4CIF and 16CIF at 4 and 16 times the resolution of CIF respectively. The compression ratio of H.263 ranges from 1:1 up to 133:1, depending on the quality required. [87]

H.263 is an ITU-T standard that supports video compression (coding) and decompression (decoding) for video-conferencing and video-telephony applications.

H.263 is designed for video coding at bit rates of around 20-30 kbps and above.

It specifies the requirements, data contents and format for a video encoder and decoder [16].

Originally H.263 was an ITU recommendation for very low bit rate coding, such as video telephony over normal analogue telephone lines (PSTN video telephony) at bit rates below 28.8 kbps. It has since been used for a much wider range of bit rates, not only low bit rate applications but also as a replacement for H.261.

There are several data compression and decompression strategies used in H.263. The following gives a quick introduction to them.

  • Video encoding

Strategies implemented in the encoding process are motion estimation and compensation, the Discrete Cosine Transform (DCT), quantisation, entropy encoding and the frame store.

Figure 2-9 H.263 video encoder block diagram [16]

Motion estimation and compensation: To reduce the bandwidth, the current frame is compared with the previous frame; the parts that are similar are ignored and only the differences are encoded. The current frame is divided into 16 * 16 pixel blocks, and each block is compared with its surrounding area in the previous frame to determine where it came from. If a match is found, the motion is recorded and the matching block is subtracted from the current frame. After this motion estimation and compensation, the current frame retains just some "residual" information, as sketched below.
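
The following is a minimal sketch of this block-matching idea in Python with NumPy. The exhaustive search window, the +/-7 pixel range and the sum-of-absolute-differences cost are illustrative choices, not the particular search an H.263 encoder is required to use, and the frame dimensions are assumed to be multiples of 16.

    import numpy as np

    def motion_estimate(current, previous, block_size=16, search_range=7):
        """Exhaustive block matching: for each 16x16 block of the current
        frame, find the best-matching block in the previous frame and
        return motion vectors plus the residual (difference) frame."""
        h, w = current.shape                      # grayscale frames, multiples of 16
        vectors = {}
        residual = np.zeros_like(current, dtype=np.int16)
        for by in range(0, h, block_size):
            for bx in range(0, w, block_size):
                block = current[by:by+block_size, bx:bx+block_size].astype(np.int16)
                best_sad, best_mv = None, (0, 0)
                for dy in range(-search_range, search_range + 1):
                    for dx in range(-search_range, search_range + 1):
                        y, x = by + dy, bx + dx
                        if y < 0 or x < 0 or y + block_size > h or x + block_size > w:
                            continue
                        candidate = previous[y:y+block_size, x:x+block_size].astype(np.int16)
                        sad = np.abs(block - candidate).sum()   # sum of absolute differences
                        if best_sad is None or sad < best_sad:
                            best_sad, best_mv = sad, (dy, dx)
                dy, dx = best_mv
                prediction = previous[by+dy:by+dy+block_size,
                                      bx+dx:bx+dx+block_size].astype(np.int16)
                vectors[(by, bx)] = best_mv
                residual[by:by+block_size, bx:bx+block_size] = block - prediction
        return vectors, residual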

Discrete Cosine Transform (DCT): These "residual" blocks are then compressed using a two-dimensional Discrete Cosine Transform (DCT), which converts the 2-dimensional data in the block into a series of coefficients. H.263 applies an 8 * 8 DCT to the 8 * 8 blocks of original pixels or motion-compensated difference pixels, compacting their energy into as few coefficients as possible [16] [17]. The details are described in the MPEG video part of section 2.1.1 Video codecs.
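
For illustration, an 8 * 8 DCT and its inverse can be built directly from the transform definition with NumPy. This is only a sketch of the floating-point transform, not the arithmetic an actual codec implementation would be required to use.

    import numpy as np

    N = 8
    # Orthonormal 8-point DCT-II basis matrix.
    C = np.array([[np.sqrt((1 if k == 0 else 2) / N) *
                   np.cos(np.pi * (2 * n + 1) * k / (2 * N))
                   for n in range(N)] for k in range(N)])

    def dct2(block):
        """Forward 2-D DCT of an 8x8 block of pixels or residuals."""
        return C @ block @ C.T

    def idct2(coeffs):
        """Inverse 2-D DCT, recovering the 8x8 block from its coefficients."""
        return C.T @ coeffs @ C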

Quantisation: This step discards the data from high-frequency changes, which the human eye does not readily perceive [17]. The transformed coefficients are divided by a scale factor, and only the coefficients with significant values are kept; the others have their values set to "0". This process causes information loss.
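
A sketch of this step, assuming a single uniform scale factor as a simplified stand-in for the H.263 quantiser parameter (which in the standard ranges from 1 to 31):

    import numpy as np

    def quantise(coeffs, qp):
        """Divide DCT coefficients by a scale factor and round; small
        values collapse to 0, which is where information is lost."""
        return np.round(coeffs / (2 * qp)).astype(np.int16)

    def rescale(levels, qp):
        """Inverse quantisation, used by both the encoder and the decoder."""
        return levels.astype(np.float64) * (2 * qp)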

Entropy encoding: Commonly implemented as Huffman encoding, this ensures that, after quantisation, frequently occurring coefficient values are replaced with short binary codes and infrequently occurring values are replaced with longer binary codes. A sequence of variable-length binary codes is then generated.
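
The idea can be sketched with a generic Huffman code built over the quantised values using Python's heapq. Note that this builds a code from the data itself, whereas H.263 actually uses fixed, pre-defined variable-length code tables combined with run-length coding of zero coefficients.

    import heapq
    from collections import Counter

    def huffman_code(symbols):
        """Build a prefix code: frequent symbols get short codewords,
        rare symbols get long ones."""
        freq = Counter(symbols)
        if len(freq) == 1:                       # degenerate single-symbol case
            return {next(iter(freq)): "0"}
        # Each heap entry: (frequency, tie-breaker, {symbol: codeword so far}).
        heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)
            f2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in c1.items()}
            merged.update({s: "1" + c for s, c in c2.items()})
            heapq.heappush(heap, (f1 + f2, counter, merged))
            counter += 1
        return heap[0][2]

    levels = [0, 0, 0, 5, 0, -1, 0, 0, 0, 2, 0, 0]   # example quantised values
    table = huffman_code(levels)
    bitstream = "".join(table[v] for v in levels)    # frequent 0 gets the shortest code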

Frame store: The variable-length codes are combined with synchronisation and control information (motion "vectors", for instance) to form the encoded H.263 bitstream. In parallel, the quantised coefficients are re-scaled, inverse transformed using an Inverse Discrete Cosine Transform and added back to the prediction, so that the encoder holds the same reconstructed frame the decoder will produce. The contents of this frame are stored and will be used by the motion estimator to find the best matching area for motion compensation when the next frame is encoded.

  • Video decoding

Strategies implemented in the decoding process are entropy decoding, rescaling, the inverse Discrete Cosine Transform and motion compensation.

Figure 2-10 H.263 video decoder block diagram [16]

Entropy decoding: This process extracts the coefficient values from the variable-length codes in the H.263 bitstream, and also extracts the motion vector information.

Rescaling: The "reverse" of quantisation. The coefficients are scaled back to approximately their original values. Since the insignificant coefficients were discarded during quantisation, some information from the block being encoded can no longer be rebuilt.

Inverse Discrete Cosine Transform (IDCT): Each block of samples is recreated by reversing the DCT operation. These blocks hold the differences between the current frame and the previous frame.

Motion compensation: Based on the previous frame, using the difference values in each block and their motion vector information, the current frame can be reconstructed, as in the sketch below. This frame is not exactly the same as the original one because data was lost during encoding. The reconstructed frame is placed in a frame store and is used to motion-compensate the next received frame.
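
As a decoder-side sketch, the steps above can be combined as follows. It reuses the hypothetical idct2 and rescale helpers and the motion-vector layout from the earlier sketches, so it is illustrative rather than a bitstream-accurate H.263 decoder.

    import numpy as np

    def reconstruct_frame(previous, vectors, levels, qp, block_size=16):
        """Rebuild the current frame: rescale and inverse-transform each
        residual block, then add the motion-compensated prediction taken
        from the previous decoded frame."""
        frame = np.zeros_like(previous, dtype=np.float64)
        for (by, bx), (dy, dx) in vectors.items():
            prediction = previous[by+dy:by+dy+block_size,
                                  bx+dx:bx+dx+block_size].astype(np.float64)
            residual = np.zeros((block_size, block_size))
            # Hypothetical layout: 'levels' maps (macroblock y, macroblock x,
            # 8x8 offset y, 8x8 offset x) to one quantised coefficient block.
            for oy in range(0, block_size, 8):
                for ox in range(0, block_size, 8):
                    residual[oy:oy+8, ox:ox+8] = idct2(rescale(levels[(by, bx, oy, ox)], qp))
            frame[by:by+block_size, bx:bx+block_size] = prediction + residual
        # The reconstructed frame goes into the frame store for the next frame.
        return np.clip(frame, 0, 255).astype(np.uint8)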

Real-time video communications: This is a practical issue which varies with different environments. To realise real-time communication, some extra controls should be considered: bit-rate control, synchronisation and audio multiplexing.

Firstly, the bandwidth of the network used for the communication is normally fixed. As the content of the video stream varies over time, the amount of encoded data produced each second is not fixed. To ensure the encoded data never exceeds the available bandwidth, and to keep the encoder output at a roughly constant bit rate for convenience, bit-rate control is added to the quantisation stage of the encoding process. If the output bit rate of the encoder is too high, the compression ratio is increased by increasing the quantiser scale factor; this gives a poorer quality image at the decoder. Conversely, if the bit rate is too low, the compression ratio is decreased by decreasing the quantiser scale factor, which improves the image quality as well as increasing the bit rate.
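
A sketch of such a feedback loop follows; the one-step quantiser adjustment and the 90% undershoot threshold are illustrative simplifications, not the rate-control algorithm of any particular encoder.

    def adjust_quantiser(qp, bits_produced, bits_target, qp_min=1, qp_max=31):
        """Raise the quantiser scale factor when the encoder overshoots the
        per-frame bit budget (more compression, lower quality) and lower it
        when the encoder undershoots (less compression, better quality)."""
        if bits_produced > bits_target:
            qp = min(qp + 1, qp_max)
        elif bits_produced < 0.9 * bits_target:
            qp = max(qp - 1, qp_min)
        return qp

    # Example: a 64 kbit/s channel at 10 frames per second
    # gives a budget of 6400 bits per encoded frame.
    qp, target = 8, 64000 // 10
    for frame_bits in [9000, 7200, 6100, 4200]:      # simulated encoder output
        qp = adjust_quantiser(qp, frame_bits, target)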

Second, to guarantee the synchronisation of the video stream, there are many headers or markers in the encoded stream which record the position of the current data within the frame and the time code of the frame. The decoder checks each header to make sure that the video stream is in the proper order and makes adjustments if necessary.

Audio multiplexing: H.263 is only a video coding recommendation. For the audio codec, please see the next section on G.723.1 audio. For the synchronisation, multiplexing and protocol issues with the audio stream, the "umbrella" standards such as H.320 (ISDN-based videoconferencing), H.324 (POTS-based video telephony) and H.323 (LAN or IP-based videoconferencing) are used.

There are five picture formats supported by H.263: SQCIF (sub-QCIF) at 128 * 96 pixels, QCIF at 176 * 144 pixels, CIF, 4CIF and 16CIF. SQCIF is approximately half the resolution of QCIF, while 4CIF and 16CIF are 4 and 16 times the resolution of CIF respectively. This allows H.263 to compete with higher bit-rate video coding standards such as the MPEG standards. The table below shows the picture formats supported by H.263 [18].

Table 2-2 Picture Formats Supported [18]

Picture   Luminance   Luminance   H.261      H.263      Uncompressed bit rate (Mbit/s)
format    pixels      lines       support    support    10 frames/s          30 frames/s
                                                         Grey     Colour     Grey     Colour
SQCIF     128         96          -          Yes         1.0      1.5        3.0      4.4
QCIF      176         144         Yes        Yes         2.0      3.0        6.1      9.1
CIF       352         288         Optional   Optional    8.1      12.2       24.3     36.5
4CIF      704         576         -          Optional    32.4     48.7       97.3     146.0
16CIF     1408        1152        -          Optional    129.8    194.6      389.3    583.9

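The uncompressed bit rates in the table follow directly from the picture dimensions: 8 bits per luminance sample, and a factor of 1.5 for colour because the chrominance is subsampled (4:2:0). A quick check in Python reproduces the table values:

    formats = {"SQCIF": (128, 96), "QCIF": (176, 144), "CIF": (352, 288),
               "4CIF": (704, 576), "16CIF": (1408, 1152)}

    for name, (width, height) in formats.items():
        for fps in (10, 30):
            grey = width * height * 8 * fps / 1e6          # Mbit/s, luminance only
            colour = grey * 1.5                            # add 4:2:0 chrominance
            print(f"{name:6s} {fps:2d} fps  grey {grey:6.1f}  colour {colour:6.1f}")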

With the current level of development of computer hardware and network technology, all of the aspects of H.263 described above can be realised in software on Pentium-family computers, achieving a "reasonable" video quality in real-time communication.
