Yong Qiu Liu's Web Page--Other standards

Introduction of multimedia codecs

5. Other standards

5.1 µ-LAW and A-LAW:

Both A-law and µ-law are companding schemes used in telephone networks to produce 8-bit samples from linearly coded input. They compress 12- to 14-bit linear samples taken at 8 kHz to 8 bits for transmission over a 64 kbit/s data channel (8 bits x 8000 samples per second). At the receiving end, the data is expanded back to a linear scale and played back.

µ-law:

y = sgn(m) / ln(1 + µ) * ln(1 + µ * |m / mp|) ------- |m / mp| <= 1

A-law:

y = A / (1 + ln A) * (m / mp) ------------ |m / mp| <= 1 / A

y = sgn(m) / (1 + ln A) * [1 + ln(A * |m / mp|)] ------ 1 / A <= |m / mp| <= 1

Here

sgn(x) = sign (+/-) of x

µ = 255,

A = 87.6,

mp is the peak message value,

m is the current quantized message value.

Normally, µ-law is used in North America and Japan, while A-law is used in Europe, the rest of the world, and on international routes. ^[51]
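The companding formulas above can be sketched in Python as follows. This is only an illustration of the continuous curves with µ = 255 and A = 87.6 as given in the text. It is not the segmented 8-bit encoding used in real telephone equipment, and the function names and the default mp = 1.0 are assumptions made for the example.

```python
import math

MU = 255.0  # µ as given in the text
A = 87.6    # A as given in the text


def mu_law_compress(m, mp=1.0):
    """Map a linear sample m (|m| <= mp) to the range [-1, 1] using µ-law."""
    x = m / mp
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y, mp=1.0):
    """Inverse of mu_law_compress: recover the linear sample from y."""
    return math.copysign(mp * math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def a_law_compress(m, mp=1.0):
    """Map a linear sample m (|m| <= mp) to the range [-1, 1] using A-law."""
    x = m / mp
    ax = abs(x)
    if ax <= 1.0 / A:
        y = A * ax / (1.0 + math.log(A))                   # linear segment
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))  # logarithmic segment
    return math.copysign(y, x)


def a_law_expand(y, mp=1.0):
    """Inverse of a_law_compress: recover the linear sample from y."""
    ay = abs(y)
    k = 1.0 + math.log(A)
    if ay <= 1.0 / k:
        ax = ay * k / A
    else:
        ax = math.exp(ay * k - 1.0) / A
    return math.copysign(mp * ax, y)
```

Both curves give small samples a larger share of the output range than large ones, which is what lets 8 bits cover the dynamic range of a longer linear word.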

5.2 LPC (Linear Predictive Coding):

Linear Predictive Coding (LPC) is a speech analysis technique and one of the most useful methods for encoding good-quality speech at a low bit rate. It provides accurate and relatively efficient estimates of speech parameters.^[52]

There are several procedures in this technique.

Prediction: This is the process of determining the relationships that allow the next output sample of a sampled data system to be calculated as a linear combination of previous output samples. In other words, the next value is predicted from the trend of the past values. Because the predictor is a weighted sum of past samples, the technique is called linear.

Compression: Using Linear Prediction, a sound can be reduced to a spectral envelope, a small bit of other data, and an excitation source.

Pitch/Speed Shifting: This alters the pitch and the speed of a sound independently of one another. LPC corrects problems like munchkinization by maintaining the correct amplitudes over their associated frequencies, which holds the formants in their correct positions. Sounds can then be sped up, slowed down, or have their pitch raised or lowered while still sounding completely natural.

Cross-Synthesis: To recreate an actual sound from an LPC-analysed signal, some other information, such as pitch and excitation, must be measured. If a voice is re-synthesised from its LPC analysis without the pitch information, it will sound very robotic. By replacing the excitation sound with another sound, you can create a completely new sound. One classic example is cross-synthesising an orchestra with a person's voice. The result is a talking orchestra!^[53]
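The prediction step described above can be illustrated with a short Python sketch. It estimates predictor coefficients from a block of samples using the autocorrelation method and the Levinson-Durbin recursion, which is one common way to do this. The source does not say which method it has in mind, so treat the choice, the function names and the sign convention (the prediction is the sum of a[k-1] * x[n-k]) as assumptions.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[lag] = sum over n of x[n] * x[n + lag]."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for the given autocorrelation values.

    Returns (a, err): a[k-1] is the weight of x[n-k] in the prediction,
    err is the remaining prediction error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err == 0.0:
            break  # the signal is already predicted perfectly
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err                       # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def predict_next(history, a):
    """Predict the next sample as a weighted sum of the most recent samples."""
    return sum(coeff * history[-k] for k, coeff in enumerate(a, start=1))
```

For a signal in which each sample is 0.9 times the previous one, a second-order analysis should return coefficients close to 0.9 and 0, so the prediction error is almost zero.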

The detailed procedures are listed in the following diagram (Figure A-1).

Figure A-1 LPC Diagram^[53]

Pulse Wave: In the LPC diagram, this represents a pulse-type waveform which is used as an exciter for recreating the pitched portions of the original sound.

Noise: In the LPC diagram, this represents a source of random noise which is used as an exciter for recreating the non-pitched portions of the original sound.

Balance: This unit of the LPC system balances the amount of each type of exciter to create the right mix of pitched and non-pitched sound. The values for balancing are measured and stored as "Threshold" with the LPC coefficients and pitch information.

Pitch: This path represents the original pitch information needed to recreate the original signal from the LPC analysis.

Threshold: This is the information for the Balance unit which determines the balance of the pitched and non-pitched exciters.

Amplitude: This information is used to scale the exciter signal, and in turn the overall output, to match the original signal's amplitude shifts.

All-Pole Filter: The All-Pole Filter is a filter which will shape the spectrum of the incoming signal according to the LPC analysis of the original sound's spectrum.

Coefficients (B's): These are the results of the original LPC analysis. These numbers must be used by the All-Pole Filter to shape the incoming exciters so that their spectrum matches that of the original sound.^[53]
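The synthesis path described by these components can be sketched in Python as follows. This is a simplified illustration, not the exact system in the diagram. It assumes that the threshold is a number between 0 and 1 giving the share of pulse excitation, and that the filter computes y[n] = e[n] + sum of b[k-1] * y[n-k]. All function and parameter names are hypothetical.

```python
import random


def pulse_train(n, period):
    """Pitched exciter: one unit pulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def white_noise(n, seed=0):
    """Non-pitched exciter: uniform random noise in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def all_pole_filter(excitation, coeffs):
    """Shape the excitation: y[n] = e[n] + sum_k coeffs[k-1] * y[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, b in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += b * out[n - k]
        out.append(acc)
    return out


def synthesize_frame(coeffs, pitch_period, threshold, amplitude, n):
    """Mix the two exciters, scale the mix, and run it through the filter.

    threshold (assumed range 0..1) is the share of pulse excitation;
    the remainder is noise.
    """
    pulses = pulse_train(n, pitch_period)
    noise = white_noise(n)
    excitation = [amplitude * (threshold * p + (1.0 - threshold) * q)
                  for p, q in zip(pulses, noise)]
    return all_pole_filter(excitation, coeffs)
```

With a threshold of 1.0 the frame is driven by pulses only, which corresponds to a fully pitched sound. A threshold of 0.0 gives a noise-only frame.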

5.3 CELP (Code Excited Linear Prediction):

CELP is an algorithm described in U.S. Federal Standard 1016 that provides good-quality, natural-sounding speech at 4800-9600 bps. It compensates for the limited quality of the simple LPC model by using more information in the excitation. It uses a codebook, which is a table of typical residue signals set up by the system designers. The residue is compared with every entry in the codebook, and the entry with the closest match is chosen. Only the code for that entry is sent. At the receiving end, the code is used to retrieve the corresponding residue from the codebook, and this residue then excites the formant filter.
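The codebook lookup described above can be sketched in Python as follows. This is a minimal illustration that compares the residue directly with each entry by squared error, as the text describes. Practical CELP coders typically also transmit a gain and compare the signals after filtering, which this sketch leaves out. The function names and the toy codebook are assumptions for the example.

```python
def search_codebook(residue, codebook):
    """Return the index of the codebook entry closest to the residue."""
    best_index = 0
    best_err = float("inf")
    for index, entry in enumerate(codebook):
        err = sum((r - c) ** 2 for r, c in zip(residue, entry))
        if err < best_err:
            best_index, best_err = index, err
    return best_index


def decode(index, codebook):
    """Receiver side: look up the residue that the transmitted code refers to."""
    return list(codebook[index])


# Toy codebook with three entries of four samples each (illustrative only).
CODEBOOK = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
```

Only the integer returned by search_codebook needs to be transmitted, because the receiver holds the same table.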

For CELP to work well, the codebook must be big enough to include all the various kinds of residues. Such a codebook would be time-consuming to search, however, and would require long codes to specify the desired residue. This problem is solved by using two small codebooks instead of one very large one. One codebook is fixed by the designers and contains just enough codes to represent one pitch period of residue. The other codebook is adaptive: it starts out empty and is filled in during operation with copies of the previous residue delayed by various amounts. In this way, the adaptive codebook acts like a variable shift register, and the pitch is provided by the amount of delay.^[52]
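The shift-register behaviour of the adaptive codebook can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the "codebook" is just a buffer of past excitation samples, each candidate is that buffer read back at a given delay, and the best delay is chosen by squared error without any gain or filtering. The function names are hypothetical.

```python
def adaptive_candidate(history, delay, n):
    """Read n samples starting `delay` samples back in the past excitation.

    If the delay is shorter than n, the segment repeats periodically.
    """
    start = len(history) - delay
    return [history[start + (i % delay)] for i in range(n)]


def best_delay(history, target, min_delay, max_delay):
    """Return the delay whose candidate is closest to the target.

    Assumes min_delay <= len(history), so at least one delay is tried.
    """
    best_d = None
    best_err = float("inf")
    for d in range(min_delay, min(max_delay, len(history)) + 1):
        cand = adaptive_candidate(history, d, len(target))
        err = sum((t - c) ** 2 for t, c in zip(target, cand))
        if err < best_err:
            best_d, best_err = d, err
    return best_d
```

For a history that repeats every 5 samples, the search should select a delay of 5, which is the pitch period expressed in samples.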

5.4 ACELP (Algebraic Code Excited Linear Prediction):

ACELP is an efficient technology for speech compression at bit rates around 8 kbit/s. It achieves its efficiency by using an algebraic codebook, which requires no storage and supports a fast search procedure. ACELP technology is widely used, for example in GSM EFR, G.729 and G.723.1.
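The idea that an algebraic codebook needs no storage can be illustrated with a short Python sketch. Each code vector is built on the fly from a few signed unit pulses whose positions follow from the transmitted indices. The layout used here (a 40-sample subframe with 4 interleaved tracks and one pulse per track) is a simplified, hypothetical example and not the bit-exact structure of any particular standard.

```python
SUBFRAME = 40  # samples per code vector (assumed for this example)
TRACKS = 4     # one pulse per track (assumed for this example)


def algebraic_codevector(slots, signs):
    """Build a sparse code vector from pulse slots and signs.

    slots[t] selects one of the positions t, t + 4, t + 8, ... on track t;
    signs[t] is +1 or -1. No table of code vectors is stored anywhere.
    """
    vec = [0] * SUBFRAME
    for track, (slot, sign) in enumerate(zip(slots, signs)):
        vec[track + TRACKS * slot] += sign
    return vec
```

Because every vector follows from a handful of small integers, the encoder can search over pulse positions without keeping a table of residues in memory.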

Last update April 9, 2002