IP-telephony: the Main Digital Signal, Codecs, Bandwidth

Some conclusions about IP-telephony: the main digital signal, codecs, bandwidth

While studying for the CCNA Voice exam, I had the idea of writing up what I learned as a separate article. I am pursuing two goals here. One is selfish: to understand the material better and put it all together in my head. The other is altruistic: to share the knowledge with those who may find it interesting.

In this article I will talk about voice coding, codecs as such, and calculating the bandwidth required to carry voice over IP networks.

About the main digital signal

It hardly needs explaining that to carry an analog signal (which is what the human voice is) over IP networks, the signal must first be converted into a sequence of ones and zeros. Briefly, the essence of the process is this: according to Kotel’nikov’s theorem (also known as the Nyquist theorem), a signal must be sampled at no less than twice its highest frequency. Telephony limits the voice band to roughly 4 kHz, so pulse-code modulation samples it 8000 times per second, and with 8 bits per sample that gives a data rate of 64 kbit/s, enough to carry telephone-quality voice.

This 64 kbit/s stream is what modern digital telephony calls the main digital signal.
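The arithmetic behind the 64 kbit/s figure can be sketched in a few lines of Python. The 4 kHz band limit and 8 bits per sample are the usual telephony values for pulse-code modulation; the names are only illustrative.

```python
# Sketch: where the 64 kbit/s of the main digital signal comes from.
VOICE_BAND_HZ = 4000      # upper edge of the telephone voice band (usual value)
BITS_PER_SAMPLE = 8       # PCM word size used in telephony

# Sampling theorem: sample at no less than twice the highest frequency.
sampling_rate_hz = 2 * VOICE_BAND_HZ
bit_rate_bps = sampling_rate_hz * BITS_PER_SAMPLE

print(sampling_rate_hz, "samples/s")
print(bit_rate_bps // 1000, "kbit/s")
```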

Thirty-two main digital signals (30 voice + 2 service) make up the primary (smallest, simplest) level of the plesiochronous digital hierarchy (PDH), the so-called E1 stream (32 * 64 = 2048 kbit/s). The main digital signal itself is sometimes called the zero level. PDH also has second (E2), third (E3) and fourth (E4) levels. Each subsequent level is multiplexed from four streams of the previous level plus some additional service information, for example, E3 = 4 * E2 + service bits.
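As a rough illustration of this hierarchy, the sketch below uses the standard nominal E-carrier rates (2048, 8448, 34368 and 139264 kbit/s) to show how much extra capacity each level carries on top of its four lower-level streams. The variable and function names are made up for the example.

```python
DS0_KBPS = 64
# Nominal PDH (E-carrier) line rates in kbit/s.
PDH_KBPS = {"E1": 2048, "E2": 8448, "E3": 34368, "E4": 139264}

# E1 is built directly from 32 timeslots of 64 kbit/s.
assert 32 * DS0_KBPS == PDH_KBPS["E1"]

def overhead_kbps(level, lower):
    """Extra capacity a level carries beyond its four lower-level streams."""
    return PDH_KBPS[level] - 4 * PDH_KBPS[lower]

for level, lower in [("E2", "E1"), ("E3", "E2"), ("E4", "E3")]:
    print(f"{level} = 4 * {lower} + {overhead_kbps(level, lower)} kbit/s")
```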

For a while (in the 1980s) digital telephony all over the world was built on PDH. The technology had shortcomings, though, the most significant being the need to demultiplex a high-level stream step by step to get at the lower-level streams inside it. For example, to pull one E1 out of an E4 and route it somewhere else, you had to split the E4 into four E3s, split an E3 into four E2s, split an E2 into four E1s, send the E1 where it needed to go, then reassemble the stream in reverse order and send it on. Tedious, all in all, and it takes a lot of resources.

PDH was superseded by SDH (synchronous digital hierarchy), which is still the main option for mobile operators; the networks of our two backbone providers (TTK, RTK) are still built on SDH as well.

Nevertheless, the primary level (E1) has not disappeared, and sometimes it remains the only way to set up a connection. For example, all telephone operators in our country use some number N of E1 streams to interconnect with each other.

Now let’s return to IP telephony, that is, to packet switching, and forget about circuit switching for a while.

About codecs

So, we have the main digital signal, and its embodiment in IP networks is the G.711 codec. This standard has become the de facto most popular one and is used today alongside signaling protocols such as SIP and SCCP. It takes 64 kbps of bandwidth and is probably familiar to everyone who deals with modern IP telephony.

The standard was developed in the 1970s; its patents have since expired, and it is now in the public domain.

The standard describes two coding algorithms: Mu-law (used in North America and Japan) and A-law (used in Europe and the rest of the world). Both algorithms are logarithmic, but the later A-law was designed to be simpler for a computer to process. (C) Wikipedia
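To give a feel for what “logarithmic” means here, below is a minimal Python sketch of the textbook continuous companding curves with the usual parameters (mu = 255, A = 87.6). It is not the bit-exact segmented 8-bit encoding that G.711 actually specifies, and the function names are illustrative only.

```python
import math

MU = 255.0   # mu-law parameter
A = 87.6     # A-law parameter

def mu_law(x):
    """Continuous mu-law compression of a sample x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def a_law(x):
    """Continuous A-law compression of a sample x in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)

# Quiet samples get a much larger share of the output range than loud ones.
for x in (0.01, 0.1, 0.5, 1.0):
    print(f"x={x:<5} mu-law={mu_law(x):.3f}  A-law={a_law(x):.3f}")
```

Both curves stretch quiet samples and squeeze loud ones, which is why 8 bits per sample are enough for speech.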

Besides the universally recognized G.711, there are many more standards for encoding and decoding audio. The most popular of them are G.729, G.729a, G.726 and G.728. If we look at the bandwidth they occupy, we see the following picture:

G.729 – 8 kbps
G.729a – 8 kbps
G.726 – 32 kbps
G.728 – 16 kbps

It would seem that, since they use less bandwidth, they should have become more popular than G.711. Why haven’t they? The fact is that bandwidth is not the most important parameter of a codec. Speed matters too, and with it the load on the DSP (digital signal processor), which is responsible for encoding and decoding the signal in real time.
Another important criterion that determines a codec’s success is the so-called MOS (Mean Opinion Score; in Russian literature it appears as “averaged subjective assessment”). The idea of MOS is very simple: a specially selected group of people get to use the communication system and are asked to rate it from 1 (bad) to 5 (excellent). The average of those ratings is the MOS.

So, for the codecs listed above, the MOS values are as follows:

G.711 – 4.1 (according to some sources 4.45 for the Mu-law)
G.729 – 3.92 (it could compete with G.711, but it eats a lot of CPU time)
G.729a – 3.7 (this codec works much faster than its older brother, but as we see – at the expense of quality)
G.726 – 3.85
G.728 – 3.61

In the end, it is the combination of all these factors (bandwidth, speed, MOS) that determines which codec comes out on top in the realm of digital signal coding.
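To see these trade-offs side by side, here is a small Python sketch that collects the bit rates and MOS values quoted above and sorts them both ways. The numbers are the ones from this article; CPU load is left out because no figures for it are given here.

```python
# (codec, bit rate in kbps, MOS) as quoted in this article.
CODECS = [
    ("G.711", 64, 4.1),
    ("G.729", 8, 3.92),
    ("G.729a", 8, 3.7),
    ("G.726", 32, 3.85),
    ("G.728", 16, 3.61),
]

by_quality = sorted(CODECS, key=lambda c: c[2], reverse=True)
by_bandwidth = sorted(CODECS, key=lambda c: c[1])

print("Best MOS first:       ", [name for name, _, _ in by_quality])
print("Lowest bit rate first:", [name for name, _, _ in by_bandwidth])
```

Neither ordering alone explains the popularity of G.711: it wins on quality and loses on bandwidth.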

By the way, all these standards (the ones whose names begin with G.) are the work of the International Telegraph and Telephone Consultative Committee (CCITT, a division of the ITU, now ITU-T), and many of them are, in effect, proprietary. These days it is hard to imagine proprietary standards without a free alternative, so the field of audio coding got iLBC (internet Low Bitrate Codec), which uses 15.2 kbps and has a MOS of 4.1. These figures, along with its openness, are why the codec is used by Google Talk, Yahoo Messenger and our favorite Skype.

It is worth noting that popular IP PBXs (Asterisk, Cisco CME) support all of these codecs, so the choice of what to use in your telephone network is always yours.

About the bandwidth

Estimated bandwidth is a parameter that must be taken into account when planning any data network, so that the network scales easily and your users suffer no unnecessary inconvenience while using it. I repeat: any network, VoIP networks included.

An important parameter in this particular case is the sample size (measured in milliseconds). The sample size determines the “amount” of voice carried in one IP packet: for example, into a packet with the same standard headers you can cram one syllable or two. The larger the sample size, the more economically you use your bandwidth, but the more noticeable the delay in the conversation (the sender has to collect and encode the whole sample before the packet can go out).
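The trade-off can be illustrated with a short Python sketch. It assumes G.711 at 64 kbit/s and counts only the 40 bytes of IP, UDP and RTP headers (20 + 8 + 12) per packet, ignoring link-layer overhead; the function name is hypothetical.

```python
CODEC_BPS = 64_000            # G.711
HEADER_BYTES = 20 + 8 + 12    # IP + UDP + RTP, link layer ignored

def ip_bandwidth_bps(sample_ms):
    """IP-level bandwidth of one voice stream for a given sample size."""
    payload_bytes = CODEC_BPS * sample_ms // 8000   # bits/s * ms -> bytes
    packet_bits = (payload_bytes + HEADER_BYTES) * 8
    return packet_bits * 1000 / sample_ms           # packets/s = 1000 / ms

for ms in (10, 20, 30, 40):
    print(f"{ms} ms sample: {ip_bandwidth_bps(ms) / 1000:.1f} kbit/s, "
          f"at least {ms} ms of added delay")
```

Each step up in sample size saves less bandwidth than the previous one, while the added delay keeps growing linearly.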

I do not know about Asterisk (I hope someone will tell us), but Cisco CME (Cisco’s IP telephony solution) unfortunately has no sample size parameter as such. Instead there is a parameter that sets the number of bytes in a sample. The two are linearly related and easily expressed through each other. Here is the formula:

B = T * R / 8, where B is the number of bytes in the sample, T is the sample size in seconds, and R is the codec’s bit rate in bits per second. So if we want one packet to carry, say, 20 milliseconds of conversation with the G.711 codec, we need to set the parameter to B = 0.02 * 64000 / 8 = 160.
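The same conversion, written out as a small Python sketch (the function names are hypothetical, and this is not a Cisco API): it turns a sample size and a codec bit rate into the number of bytes per sample and back.

```python
def bytes_per_sample(sample_ms, codec_bps):
    """Bytes of voice payload in one packet: seconds * bits/s / 8."""
    return sample_ms * codec_bps // 8000

def sample_ms_from_bytes(payload_bytes, codec_bps):
    """Inverse conversion: payload bytes back to milliseconds of voice."""
    return payload_bytes * 8000 / codec_bps

print(bytes_per_sample(20, 64_000))       # G.711, 20 ms
print(bytes_per_sample(20, 8_000))        # G.729, 20 ms
print(sample_ms_from_bytes(160, 64_000))  # back to milliseconds
```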

So we need to put 160 bytes of payload into our UDP datagram. OK, let’s go on.

Suppose we use a classic IP network with Ethernet as the data link protocol, and on top of that we want to run everything through an encrypted VPN. Then 18 bytes of Ethernet overhead are added to our 160 bytes. Add the network and transport layers: the IP, UDP and RTP headers (20 + 8 + 12 bytes). And wrap the whole thing in IPSec, which is another 50 bytes. The result is a packet of 160 + 18 + 40 + 50 = 268 bytes.

To calculate the total bandwidth, we multiply the size of this packet by the number of packets per second. Since our sample size is 20 ms, there are 50 such samples in one second. Multiplying 50 by 268, we see that every second we need to push through 13400 bytes, or 107200 bits, that is, 107.2 kbit/s. That is about 1.7 times the original 64 kilobits! This is the number to start from when planning your network.
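The whole calculation fits into a few lines of Python. This is only a sketch using the overhead figures from this article (18 bytes of Ethernet, 20 + 8 + 12 bytes of IP, UDP and RTP, 50 bytes of IPSec); real overhead depends on the link type and the VPN settings, and the function name is hypothetical.

```python
OVERHEAD_BYTES = {
    "ethernet": 18,
    "ip": 20,
    "udp": 8,
    "rtp": 12,
    "ipsec": 50,   # figure used in this article; varies with VPN settings
}

def stream_bandwidth_bps(codec_bps, sample_ms, layers):
    """Bandwidth of one one-way voice stream including per-packet overhead."""
    payload_bytes = codec_bps * sample_ms // 8000
    packet_bytes = payload_bytes + sum(OVERHEAD_BYTES[name] for name in layers)
    packets_per_second = 1000 / sample_ms
    return packet_bytes * 8 * packets_per_second

all_layers = ["ethernet", "ip", "udp", "rtp", "ipsec"]
print(stream_bandwidth_bps(64_000, 20, all_layers))        # G.711 over VPN
print(stream_bandwidth_bps(64_000, 20, all_layers[:-1]))   # same, no IPSec
print(stream_bandwidth_bps(8_000, 20, all_layers))         # G.729 over VPN
```

Keep in mind that this is one direction only; a two-way call needs the same amount in each direction.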

Be careful! May the force be with you!