Yong Qiu Liu's Web Page--G.728 Audio

Introduction of multimedia codecs

4.4 G.728 LD-CELP audio

G.728 is an ITU (International Telecommunication Union) standard that specifies a 16 kbit/s codec for telephone-bandwidth speech using the LD-CELP (Low-Delay Code-Excited Linear Prediction) method. It was designed to provide speech quality equivalent to or better than that of the G.726 32 kbit/s ADPCM standard. Based on the standard analysis-by-synthesis CELP coding technique, the algorithm is designed to meet the needs of low-delay, high-quality speech coding. By using short excitation vectors (5 samples, or 0.625 ms) and backward-adaptive linear predictors, the coder achieves an algorithmic delay of 0.625 ms, and the overall delay is less than 2 ms. It can be used with multiple speakers and background noise, and it can also handle non-speech signals. ^[49]
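The rate and delay figures above follow from simple arithmetic. The sketch below assumes the usual 8 kHz sampling rate for telephone-bandwidth speech (not stated explicitly above) and one 10-bit codebook index per 5-sample vector; the constant names are illustrative only.

```python
SAMPLE_RATE_HZ = 8000   # assumption: telephone-bandwidth sampling rate
VECTOR_SAMPLES = 5      # samples per excitation vector
INDEX_BITS = 10         # bits transmitted per vector (the codebook index)

# Duration of one excitation vector in milliseconds
vector_ms = VECTOR_SAMPLES * 1000 / SAMPLE_RATE_HZ

# Number of vectors (and hence indices) sent per second
vectors_per_second = SAMPLE_RATE_HZ // VECTOR_SAMPLES

# Resulting bit rate in bits per second
bit_rate = vectors_per_second * INDEX_BITS

print(vector_ms, vectors_per_second, bit_rate)
```

Under these assumptions, 1600 vectors per second at 10 bits each give the 16 000 bit/s rate, and one vector spans 0.625 ms.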

LD-CELP Codec:

LD-CELP retains the analysis-by-synthesis approach to codebook search used in CELP and adds backward adaptation of the predictors and the gain to achieve an algorithmic delay of 0.625 ms. Only the index into the excitation codebook is transmitted. The predictor coefficients are updated through LPC analysis of previously quantised speech, and the excitation gain is updated using the gain information embedded in the previously quantised excitation. The block size for the excitation vector and gain adaptation is only five samples. The perceptual weighting filter is updated using LPC analysis of the unquantised speech.^[50]
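A minimal sketch of the backward-adaptation idea: predictor coefficients are derived from past quantised speech alone, so the decoder can repeat the same computation without receiving them. The code uses a plain autocorrelation and the Levinson-Durbin recursion with a low order for readability. The function names, the toy input, and the sign convention s[n] ≈ Σ a[i]·s[n-i] are assumptions for illustration; the windowing and other refinements of the actual standard are not reproduced here.

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of signal[n] * signal[n-k]."""
    n = len(signal)
    return [sum(signal[i] * signal[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for a[1..order] so that
    s[n] is approximated by sum_i a[i] * s[n-i].
    Returns (coefficients, residual_energy)."""
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:        # degenerate input: stop early
            break
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err         # reflection coefficient for stage m
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


# Toy stand-in for "previously quantised speech": a decaying exponential,
# which a first-order predictor with coefficient 0.9 models almost exactly.
past_speech = [0.9 ** n for n in range(200)]
coeffs, residual = levinson_durbin(autocorrelation(past_speech, 2), 2)
print(coeffs)
```

Because the input here is already quantised output, an encoder and a decoder running this routine on the same history arrive at the same coefficients.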

The encoder part:

Uniform PCM converter: Converts the A-law or µ-law PCM input to uniform PCM.
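As an illustration of this conversion step, the sketch below expands one 8-bit µ-law code word to a linear sample, following the widely used G.711-style expansion routine (bias of 0x84, output in the 16-bit range). The function name is hypothetical, and the A-law case is omitted.

```python
def ulaw_to_linear(code):
    """Expand one 8-bit mu-law code word to a signed linear PCM sample."""
    code = ~code & 0xFF              # mu-law code words are stored bit-inverted
    sign = code & 0x80               # top bit: sign
    exponent = (code >> 4) & 0x07    # next three bits: segment number
    mantissa = code & 0x0F           # low four bits: step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Expand a short run of code words into uniform PCM samples
samples = [ulaw_to_linear(c) for c in (0xFF, 0x80, 0x00)]
print(samples)
```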

Vector buffer: Partitions the uniform PCM signal into blocks of five consecutive input samples, each forming a 5-dimensional speech vector.

Adapter for perceptual weighting filter: Calculates the coefficients of the perceptual weighting filter once every four speech vectors, based on linear prediction analysis (often referred to as LPC analysis) of the unquantised speech.

Perceptual weighting filter: Converts the current input speech vector to the weighted speech vector.

Synthesis filter: A 50th-order all-pole filter consisting of a feedback loop with a 50th-order LPC predictor in the feedback branch. Its coefficients are updated by the backward synthesis filter adapter.
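The feedback structure can be sketched as follows. This is a minimal illustration, assuming the convention s[n] = e[n] + Σ a[i]·s[n-i] and using a first-order predictor in the demo instead of the 50th-order one; the function and variable names are hypothetical.

```python
def synthesis_filter(excitation, coeffs, memory):
    """All-pole filter: each output is the excitation sample plus the
    LPC prediction formed from previous outputs (the feedback branch).
    `memory` holds past outputs, most recent first, and is updated in place."""
    output = []
    for e in excitation:
        prediction = sum(a * m for a, m in zip(coeffs, memory))
        s = e + prediction
        memory.insert(0, s)   # newest output goes to the front
        memory.pop()          # oldest output drops off the end
        output.append(s)
    return output


coeffs = [0.5]        # toy first-order predictor
memory = [0.0]        # filter starts at rest
vector = synthesis_filter([1.0, 0.0, 0.0, 0.0, 0.0], coeffs, memory)
print(vector)         # [1.0, 0.5, 0.25, 0.125, 0.0625]
```

Because the memory persists between calls, the output for one 5-sample vector depends on all earlier vectors, which is why the encoder has to keep its filter memory in step with the decoder.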

Backward synthesis filter adapter: Updates the coefficients of the synthesis filter. It takes the quantised (synthesised) speech as input and produces a set of synthesis filter coefficients as output.

Backward vector gain adapter: At every vector time index, this adapter updates the excitation gain, a scaling factor applied to the selected excitation vector. It takes the gain-scaled excitation vector as its input and produces an excitation gain as its output.

Excitation VQ codebook: Supplies a codebook of 1024 candidate vectors for each input block.

Min. MSE (Mean-Squared Error) measure: From the resulting 1024 candidate quantised signal vectors, the encoder identifies the one that minimises a frequency-weighted mean-squared error measure with respect to the input signal vector. The 10-bit codebook index of the corresponding best codebook vector (or "codevector"), which gives rise to that best candidate quantised signal vector, is transmitted to the decoder.

The best codevector from the minimum MSE measure is then passed through the gain scaling unit and the synthesis filter to establish the correct filter memory in preparation for the encoding of the next signal vector. The synthesis filter coefficients and the gain are updated periodically in a backward-adaptive manner based on the previously quantised signal and gain-scaled excitation.
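The search and memory-update steps described above can be sketched as a brute-force analysis-by-synthesis loop. This is a simplified illustration, not the procedure of the standard: it uses a hypothetical 4-entry codebook instead of 1024 entries, a first-order synthesis filter, a fixed gain, and a plain (unweighted) mean-squared error in place of the frequency-weighted measure. All names are assumptions.

```python
def synthesize(excitation, coeffs, memory):
    """Run the all-pole synthesis filter from a copy of `memory`.
    Returns (output_vector, new_memory) and leaves `memory` untouched."""
    memory = list(memory)
    output = []
    for e in excitation:
        s = e + sum(a * m for a, m in zip(coeffs, memory))
        memory = [s] + memory[:-1]
        output.append(s)
    return output, memory


def search_codebook(target, codebook, gain, coeffs, memory):
    """Try every codevector and return (index, error) of the best match."""
    best_index, best_err = -1, float("inf")
    for index, codevector in enumerate(codebook):
        scaled = [gain * c for c in codevector]
        candidate, _ = synthesize(scaled, coeffs, memory)
        err = sum((t - c) ** 2 for t, c in zip(target, candidate)) / len(target)
        if err < best_err:
            best_index, best_err = index, err
    return best_index, best_err


# Hypothetical toy codebook (2-bit index); the real one has 1024 entries.
codebook = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, -1.0, 0.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0, 0.0],
]
coeffs, memory, gain = [0.5], [0.0], 2.0
target = [2.0, -1.0, -0.5, -0.25, -0.125]   # input vector to be matched

index, err = search_codebook(target, codebook, gain, coeffs, memory)

# Re-run the winner to establish the filter memory for the next vector.
_, memory = synthesize([gain * c for c in codebook[index]], coeffs, memory)
print(index, err, memory)
```

Only `index` would be transmitted; the final `synthesize` call mirrors what the decoder will do with that index, so both sides end the vector with the same filter memory.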

Figure 2-25 LD-CELP encoder

The decoder part:

Excitation VQ codebook: Contains the excitation VQ codebook (including shape and gain codebooks). The received codebook index is used to extract the best codevector selected in the LD-CELP encoder.

Gain scaling unit: Computes the scaled excitation vector by multiplying each component of the codevector by the gain.

Synthesis filter: Filters the scaled excitation vector to produce the decoded speech vector.

Decoder: The decoding operation is also performed on a block-by-block basis. Upon receiving each 10-bit index, the decoder performs a table look-up to extract the corresponding codevector from the excitation codebook. The extracted codevector is passed through a gain scaling unit and a synthesis filter to produce the current decoded signal vector. The synthesis filter coefficients and the gain are then updated in the same way as in the encoder. The decoded signal vector is passed through an adaptive post-filter to enhance the perceptual quality. The post-filter coefficients are updated periodically using the information available at the decoder. Finally, the five samples of the post-filtered signal vector are converted to five A-law or µ-law PCM output samples.
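The core of the block-by-block decoding loop (table look-up, gain scaling, synthesis filtering) can be sketched as follows. This is a simplified illustration under stated assumptions: the 4-entry codebook is hypothetical, the filter is first order, and the gain and coefficients are held fixed, so the backward adaptation, the post-filter, and the final A-law/µ-law conversion are all omitted.

```python
# Hypothetical toy codebook (2-bit index); the real one is addressed by 10 bits.
CODEBOOK = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, -1.0, 0.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0, 0.0],
]


def decode(indices, coeffs, gain):
    """Decode a sequence of codebook indices into speech samples."""
    memory = [0.0] * len(coeffs)   # synthesis filter starts at rest
    speech = []
    for index in indices:
        codevector = CODEBOOK[index]                 # table look-up
        scaled = [gain * c for c in codevector]      # gain scaling unit
        for e in scaled:                             # synthesis filter
            s = e + sum(a * m for a, m in zip(coeffs, memory))
            memory = [s] + memory[:-1]
            speech.append(s)
    return speech


decoded = decode([0, 3], coeffs=[0.5], gain=2.0)
print(decoded)
```

Each received index yields five output samples, and the filter memory carries over from one vector to the next, so the second vector in the demo starts from the tail of the first.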

Figure 2-26 LD-CELP decoder

Last update April 9, 2002