Transmission of High Quality Audio over IP Networks
This thesis deals with the transmission of high quality audio over packet switched networks, such as the Internet. Unlike on other transmission media, no guarantees are given regarding bandwidth, delay, or loss ratios, and all of these factors are typically time-varying. For interactive, or two-way, applications it is important to keep the end-to-end delay low in order to achieve a high Quality of Experience (QoE). In this scenario, retransmission of lost packets is ruled out by the delay restrictions, so one has to expect and prepare for periods of packet loss. The varying bandwidth is also a challenge. A desired feature for media streams is scalability, that is, the ability to easily adjust the sending rate to the current conditions. Furthermore, the receivers may have a wide range of equipment and capabilities. Ideally the same media stream, or possibly a subset of it, should be used for all receivers. A scalable format avoids transcoding, which would otherwise increase the complexity and possibly the delay as well. Multiple aspects of audio transmission are handled in this thesis. Techniques for Error Protection (EP) and robust transmission, lossless and perceptual compression, and the quality experienced by the end users are investigated. A novel EP scheme is presented in which a perceptual criterion is used to select which parts of the audio need protection. The quality improvement as a function of the added redundancy can then be quantified. Simulations show that adding moderate amounts of redundancy can result in a substantial quality improvement. The use of a network architecture supporting service differentiation has also been examined. A layered, lossless, and low delay codec is presented. The base layer is perceptually compressed, but when both layers are received the original audio stream is recovered.
By transmitting the base layer at a high priority, the probability that at least this layer is received is increased and the use of regular Error Concealment (EC) is minimized. Moreover, the overhead from using the layering is very low when compared with regular lossless coding. Higher Order Ambisonics (HOA) is a naturally scalable multichannel format for wave field reproduction. In this work, both lossless and perceptual compression of this format are examined. Perceptual coding results in an error in the reproduced wave field which is very low in the sweet spot, but increases as a function of the distance from the center. For the lossless compression scheme, the focus is on maintaining the scalability and keeping a very low delay for both encoding and decoding. The compression of synthetic signals, and of signals recorded with a spherical microphone array, is evaluated, and the results show that the total rate is significantly reduced by exploiting the inter-channel correlation. The perceived quality of audio distorted by packet loss is also investigated. Effects of different packet loss processes resulting from simulations of two network architectures are compared using subjective testing. The test revealed that the less bursty loss process led to higher perceived quality for the lowest packet loss ratio. Packet sizes were also considered in the simulations, and it was found that using small packets may be beneficial. A new type of subjective test has also been used, in which the acceptability of packet loss to the users is evaluated. As expected, the acceptability decreases quickly as the packet loss ratio increases, and it is seen that music exposed to a packet loss ratio of 1.5% can no longer be said to have acceptable quality.
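The two-layer idea described above, a lossy base layer plus an enhancement layer that restores the original when both are received, can be illustrated with a minimal sketch. This is not the codec presented in the thesis: the base layer here is a plain uniform quantizer rather than a perceptual coder, and the names `STEP`, `encode`, and `decode` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

STEP = 64  # hypothetical quantizer step; a real base layer would be perceptually coded


def encode(samples: List[int], step: int = STEP) -> Tuple[List[int], List[int]]:
    """Split integer PCM samples into a coarse base layer and a residual layer."""
    base = [s // step for s in samples]                        # coarse, lossy description
    enhancement = [s - q * step for s, q in zip(samples, base)]  # residual in [0, step)
    return base, enhancement


def decode(base: List[int], enhancement: Optional[List[int]] = None,
           step: int = STEP) -> List[int]:
    """Reconstruct audio from the base layer, exactly if the enhancement layer arrived."""
    if enhancement is None:
        # Enhancement layer lost: reconstruct at the middle of each quantizer cell.
        return [q * step + step // 2 for q in base]
    # Both layers received: the original samples are recovered exactly.
    return [q * step + r for q, r in zip(base, enhancement)]


samples = [0, 1000, -1000, 32767, -32768, 63]
base, enhancement = encode(samples)
lossless = decode(base, enhancement)
approximate = decode(base)
```

In this sketch, losing the enhancement layer degrades the signal by at most half a quantizer step per sample, whereas losing the base layer leaves nothing usable, which is why prioritizing the base layer in the network is attractive.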