Key VoIP Tests

Delay

Excessive end-to-end delay makes conversation inconvenient and unnatural. Each component in the transmission path - sender, network, and receiver - adds delay. ITU-T G.114 (One-Way Transmission Time) recommends 150 ms as the maximum desired one-way latency to achieve high-quality voice.

Sample Delay Budget Table

Parameter	Fixed delay	Variable delay
CODEC (G.729)	25 ms
Packetization	Included in CODEC
Queuing delay		Depends on uplink; on the order of a few ms
Network delay	50 ms	Depends on network load
Jitter buffer	50 ms
Total (fixed)	125 ms
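
The arithmetic behind the budget can be checked with a short script. This is only a sketch: the dictionary keys and the function name are made up for illustration, and the values are the fixed delays from the table above compared against the 150 ms G.114 figure.

```python
# Fixed one-way delay components from the sample budget table, in milliseconds.
FIXED_DELAY_MS = {
    "codec_g729": 25,     # packetization is included in the codec figure
    "network": 50,
    "jitter_buffer": 50,
}
G114_LIMIT_MS = 150       # recommended one-way maximum cited from ITU-T G.114


def delay_budget(fixed_ms, limit_ms=G114_LIMIT_MS):
    """Return (total fixed delay, headroom left for variable delay)."""
    total = sum(fixed_ms.values())
    return total, limit_ms - total


total_ms, headroom_ms = delay_budget(FIXED_DELAY_MS)
print(f"fixed delay: {total_ms} ms, headroom for variable delay: {headroom_ms} ms")
```

With these sample figures, queuing and load-dependent network delay together have to fit into the remaining 25 ms to stay within the recommendation.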

End-to-end Delay

Jitter

Jitter quantifies the effects of network delays on packet arrivals at the receiver. Packets transmitted at equal intervals from the sending gateway arrive at the receiving gateway at irregular intervals. Excessive jitter makes speech choppy and difficult to understand. Jitter is calculated from the inter-arrival times of successive packets. For high-quality voice, the average inter-arrival time at the receiver should be nearly equal to the inter-packet gap at the transmitter, and the standard deviation should be low. Jitter buffers (packet buffers that hold incoming packets for a specified amount of time) are used to counteract the effects of network fluctuations and create a smooth packet flow at the receiving end.
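
The inter-arrival calculation described above can be sketched as follows. The function name and the sample timestamps are hypothetical, and the code assumes arrival times are already available in milliseconds. It computes the plain mean and standard deviation of the gaps; it is not the smoothed running estimator that some protocols report.

```python
from statistics import mean, pstdev


def interarrival_stats(arrival_times_ms):
    """Return (mean, standard deviation) of gaps between successive arrivals."""
    gaps = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    if not gaps:
        raise ValueError("need at least two arrival timestamps")
    return mean(gaps), pstdev(gaps)


# Hypothetical receiver timestamps for packets that were sent every 20 ms.
arrivals_ms = [0, 20, 45, 60, 80]
avg_gap, jitter = interarrival_stats(arrivals_ms)
print(f"mean gap: {avg_gap} ms, standard deviation: {jitter:.2f} ms")
```

In this sample the mean gap matches the 20 ms sending interval, so the only sign of trouble is the non-zero standard deviation.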

Packet loss

Packet loss typically occurs either in bursts or periodically due to a consistently congested network. Periodic loss in excess of 5-10% of all voice packets transmitted can degrade voice quality significantly. Occasional bursts of packet loss can also make conversation difficult.
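
As a rough illustration, both the overall loss percentage and the longest burst can be derived from the sequence numbers seen at the receiver. The function name and sample data below are assumptions for the sketch, and it ignores sequence-number wraparound.

```python
def loss_report(received_seqs, first_seq, last_seq):
    """Return (loss percentage, longest run of consecutive lost packets)."""
    expected = last_seq - first_seq + 1
    seen = set(received_seqs)
    lost = 0
    longest_burst = 0
    current_burst = 0
    for seq in range(first_seq, last_seq + 1):
        if seq in seen:
            current_burst = 0
        else:
            lost += 1
            current_burst += 1
            longest_burst = max(longest_burst, current_burst)
    return 100.0 * lost / expected, longest_burst


# Hypothetical capture: packets 4-6 and 11 never arrived out of 20 sent.
received = [1, 2, 3, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20]
loss_pct, burst = loss_report(received, 1, 20)
print(f"loss: {loss_pct:.1f}%, longest burst: {burst} packets")
```

Reporting the burst length alongside the percentage helps separate steady periodic loss from occasional bursts, which the text treats as distinct problems.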

Sequence Errors

Congestion in packet-switched networks can cause packets to take different routes to reach the same destination. Packets may arrive out of order, resulting in garbled speech.
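
One simple way to count sequence errors is to flag every packet whose sequence number is lower than the highest one already seen. The sketch below assumes sequence numbers are recorded in arrival order and do not wrap around; the function name and data are illustrative only.

```python
def count_out_of_order(arrival_seqs):
    """Count packets that arrive after a packet with a higher sequence number."""
    highest = None
    late = 0
    for seq in arrival_seqs:
        if highest is not None and seq < highest:
            late += 1          # this packet was overtaken in the network
        else:
            highest = seq
    return late


# Hypothetical arrival order: packets 3 and 6 were overtaken.
arrival_order = [1, 2, 4, 3, 5, 7, 8, 6]
sequence_errors = count_out_of_order(arrival_order)
print(f"out-of-order packets: {sequence_errors}")
```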

Recommendations for Measuring Voice Quality

ITU-T Recommendation P.800 - Subjective quality test based on Mean Opinion Scores (MOS). Preselected voice samples recorded according to Recommendation P.50 are played back to a mixed group of men and women under controlled conditions. The scores given by the group are averaged to give a single MOS ranging from 1 (worst) to 5 (best). A MOS of 4 is considered "toll-quality" voice.
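
A minimal sketch of the scoring step, assuming each listener gives an integer rating from 1 to 5 and that the MOS is the plain arithmetic mean of those ratings. The function name, the ratings, and the use of 4 as a toll-quality threshold check are illustrative.

```python
from statistics import mean

TOLL_QUALITY_MOS = 4  # the text treats a MOS of 4 as toll quality


def mean_opinion_score(ratings):
    """Average listener ratings on the 1 (worst) to 5 (best) scale."""
    if not ratings:
        raise ValueError("no ratings supplied")
    for rating in ratings:
        if rating not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating out of range: {rating}")
    return mean(ratings)


# Hypothetical panel of eight listeners rating the same sample.
panel_ratings = [4, 5, 3, 4, 4, 4, 5, 3]
mos = mean_opinion_score(panel_ratings)
print(f"MOS: {mos:.2f}, toll quality: {mos >= TOLL_QUALITY_MOS}")
```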

Mean Opinion Scores (MOS) for Various Voice Quality Tests

Score	Opinion Scale: Conversation Test	Difficulty Scale	Opinion Scale: Listening Test	Listening: Effort Scale	Loudness: Preference Scales
5	Excellent	----	Excellent	Complete relaxation possible, no effort required	Much louder than preferred
4	Good	----	Good	Attention necessary; no appreciable effort required	Louder than preferred
3	Fair	----	Fair	Moderate effort required	Preferred
2	Poor	----	Poor	Considerable effort required	Quieter than preferred
1	Bad	yes	Bad	No meaning understood with any feasible effort	Much quieter than preferred
0	----	no	----	----	----

Objective Voice Quality Measurements

ITU-T Recommendation P.861 - Objective Quality Measurement of Telephone-Band (300-3400 Hz) Speech Codecs
PAMS - Perceptual Analysis Measurement System (proposal from British Telecom)
Both are intrusive methods: they compare a predefined speech sample before and after transmission through a codec or network.
The resulting score approximates the MOS that human listeners would give under Recommendation P.800.