Representation of High Quality Spatial Audio
Classical surround sound techniques are based on sweet-spot listening, whereas sound field approaches offer higher quality in that they can provide reproduction over an extended area. The cost is higher complexity in the creation and reproduction of the audio content, and a high bit rate. The fact that we perceive the direct sound, the early reflections and the late part of a room impulse response differently forms the basis for contemporary hybrid formats that can yield a lower bit rate. These formats are, however, based on classical surround sound techniques and are meant for sweet-spot listening. This thesis looks into both sound field reproduction that can be used for direct sound reproduction and the reproduction of the late diffuse part of a room impulse response. Listeners are usually positioned in the horizontal plane, natural sound fields have sound sources mostly in the horizontal plane, the auditory system has its highest resolution in the horizontal plane, and 2D reproduction has been the most widely used. This thesis is therefore restricted to representation and reproduction in the horizontal plane. Higher order Ambisonics (HOA) is the sound field technique investigated here because it is fine-grain scalable and therefore suitable for transmission over communication channels with varying bandwidth, such as the internet. Furthermore, it can easily be down-mixed to different reproduction platforms, which makes it a very flexible format. Quantization schemes for HOA and the spatial distribution of the resulting errors have been investigated, which can serve as a basis for reducing the bit rate. It is found that a uniform allocation of bits across all channels leads to spatially uniformly distributed quantization noise. The HOA representation error increases as the product of the wave number, k, and the radius of the listening area, r, increases.
The mean normalized error is, as a rule of thumb, about 4% (-14 dB) when kr equals the order of the HOA representation, and this is regarded as the boundary of near perfect representation. A coarse quantization can be employed without exceeding a total error of 4% within the near perfect reproduction region. Allocating zero bits at low frequencies for higher orders places the resulting noise outside the listening area. This suggests that there is a large potential for reducing the bit rate. Furthermore, it is shown that using a much higher number of loudspeakers than required by the truncation order leads to spectral coloration, diffuse localization and a possible localization bias that varies over the reproduction region. These effects will typically occur when utilizing the scalable properties of the format. The quantization and the number of loudspeakers used for HOA reproduction have been examined under the assumptions of anechoic conditions, loudspeakers uniformly distributed on a circle, both the loudspeakers and the virtual source radiating plane waves, and mode-matched reproduction. The possibilities for including distance in the representation and reproduction of virtual sources, as well as compensation for the distance to the loudspeakers in 2D higher order Ambisonics, have also been investigated. It is found that the reproduction of spherically radiating sources is more erroneous than that of cylindrically radiating and plane wave sources. There is, however, one exception: when the source is positioned on the radius of a circular array of loudspeakers radiating spherical waves. Furthermore, it is shown that line sources can be used as loudspeakers and that virtual line sources can be positioned at any distance. These topics have been investigated analytically and by numerical simulations. The number of loudspeakers needed to reproduce the late diffuse part of a room impulse response has also been examined for listening room reproduction. This was investigated by a listening test.
Contrary to previous results for anechoic chambers, the required number of loudspeakers is found to be smaller for low frequencies than for high frequencies. This is probably a result of the listening room's reverberation scrambling the phase of the sound field.
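
The rule of thumb relating kr to the HOA order can be checked numerically with a small sketch. It is not taken from the thesis and rests on stated assumptions: a unit-amplitude plane wave in 2D, expanded in circular harmonics and truncated at a given order, with the normalized squared error averaged over a circle of radius r (rather than over a disc). Under these assumptions the error reduces to one minus the energy of the retained Bessel terms. The function names `bessel_j` and `truncation_error` are illustrative only.

```python
import math


def bessel_j(m, x, steps=2000):
    """Integer-order Bessel function J_m(x) from Bessel's integral,
    J_m(x) = (1/pi) * integral_0^pi cos(m*t - x*sin(t)) dt,
    evaluated with the trapezoidal rule."""
    h = math.pi / steps
    # Half-weighted endpoints: integrand is 1 at t=0 and cos(m*pi) at t=pi.
    total = 0.5 * (1.0 + math.cos(m * math.pi))
    for i in range(1, steps):
        t = i * h
        total += math.cos(m * t - x * math.sin(t))
    return total * h / math.pi


def truncation_error(order, kr):
    """Normalized mean squared error on a circle of radius r for a 2D
    plane wave truncated at the given order (assumed error definition).
    Uses sum over all m of J_m(kr)^2 = 1 and J_{-m}^2 = J_m^2."""
    retained = bessel_j(0, kr) ** 2
    retained += 2.0 * sum(bessel_j(m, kr) ** 2 for m in range(1, order + 1))
    return 1.0 - retained


if __name__ == "__main__":
    # Evaluate the error at the boundary kr = order for a few orders.
    for order in (2, 5, 10):
        err = truncation_error(order, float(order))
        print(f"order={order:2d}  kr={order:2d}  "
              f"error={err:.4f}  ({10 * math.log10(err):.1f} dB)")
```

Evaluating `truncation_error` for a fixed order over a range of kr values shows how the error grows with the product of wave number and listening radius, which is the trend the abstract describes.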