Representation of High Quality Spatial Audio

Solvang, Audun

dc.contributor.author	Solvang, Audun	nb_NO
dc.date.accessioned	2014-12-19T13:46:19Z
dc.date.accessioned	2015-12-22T11:44:49Z
dc.date.available	2014-12-19T13:46:19Z
dc.date.available	2015-12-22T11:44:49Z
dc.date.created	2011-02-11	nb_NO
dc.date.issued	2009	nb_NO
dc.identifier	397063	nb_NO
dc.identifier.isbn	978-82-471-1393-6 (printed ver.)	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/2370144
dc.description.abstract	Classical surround sound techniques are based on sweet spot listening, while sound field approaches offer higher quality in that they can give reproduction over an extended area. The cost is higher complexity in the creation and reproduction of the audio content and a high bit rate. The fact that we perceive the direct sound, early reflections and the late part of a room impulse response differently forms the basis for contemporary hybrid formats that can yield a lower bit rate. These formats are, however, based on classical surround sound techniques and are meant for sweet spot listening. This thesis looks into both sound field reproduction that can be used for direct sound reproduction and the reproduction of the late diffuse part of a room impulse response. Listeners are usually positioned in the horizontal plane, natural sound fields have sound sources mostly in the horizontal plane, the auditory system has the highest resolution in the horizontal plane and 2D reproduction has been most widely used. This thesis is therefore restricted to representation and reproduction in the horizontal plane. Higher order Ambisonics (HOA) is the sound field technique that is investigated here because it is fine-grain scalable and therefore suitable for transmission over communication channels with varying bandwidths, such as the internet. Furthermore, it can easily be down-mixed to different reproduction platforms, which makes it a very flexible format. Quantization schemes for HOA and the spatial distribution of the resulting errors have been investigated, which can serve as a basis for reducing the bit rate. It is found that a uniform allocation of bits across all channels leads to a spatially uniformly distributed quantization noise. The HOA representation error increases as the product of the wave number, k, and the radius of the listening area, r, increases.
The mean normalized error is, as a rule of thumb, about 0.04 (4%, or -14 dB) when kr equals the order of the HOA representation, and this is regarded as the near perfect representation boundary. A coarse quantization can be employed without violating a total error of 0.04 within the near perfect reproduction region. Allocating zero bits at low frequencies for higher orders places the resulting noise outside the listening area. This suggests that there is a large potential for reducing the bit rate. Furthermore, it is shown that using a much higher number of loudspeakers than required by the truncation order leads to spectral coloration, diffuse localization and possible localization bias that varies over the reproduction region. These effects will typically occur when utilizing the scalable properties of the format. The quantization and the number of loudspeakers used for HOA reproduction have been examined under the assumption of anechoic conditions, uniformly distributed loudspeakers on a circle, both loudspeakers and the virtual source radiating plane waves, and mode-matched reproduction. The possibilities for including distance in the representation and reproduction of virtual sources, as well as compensation for the distance to the loudspeakers in 2D higher order Ambisonics, have also been investigated. It is found that the reproduction of spherical radiating sources is more erroneous than that of cylindrical and plane wave radiating sources. There is, however, one exception: when the source is positioned on the radius of a circular array of loudspeakers radiating spherical waves. Furthermore, it is shown that line sources can be used as loudspeakers and virtual line sources can be positioned at any distance. These topics have been investigated analytically and by numerical simulations. The number of loudspeakers needed to reproduce the late diffuse part of a room impulse response has also been examined for listening room reproduction. This was investigated by a listening test.
Contrary to previous results for anechoic chambers, the required number of loudspeakers is found to be smaller for low frequencies than for high frequencies. This is probably a result of the listening room’s reverberation scrambling the phase of the sound field.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjon	nb_NO
dc.relation.ispartofseries	Doktoravhandlinger ved NTNU, 1503-8181; 2009:13	nb_NO
dc.subject	audio representation	en_GB
dc.subject	audio reproduction
dc.subject	two-dimensional higher order Ambisonics
dc.subject	quantization
dc.subject	scalability
dc.subject	spectral coloration
dc.subject	two-dimensional near field compensation
dc.subject	cylindrical harmonics
dc.subject	spherical harmonics
dc.subject	reproduction of diffuse fields
dc.title	Representation of High Quality Spatial Audio	nb_NO
dc.type	Doctoral thesis	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjon	nb_NO
dc.description.degree	PhD i elektronikk og telekommunikasjon	nb_NO
dc.description.degree	PhD in Electronics and Telecommunication
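
The rule of thumb quoted in the abstract (a mean normalized error of about -14 dB when kr equals the HOA order) can be checked numerically with a minimal sketch. The following is not taken from the thesis: it assumes a unit-amplitude plane wave in 2D, expanded in cylindrical harmonics and truncated at order N, with the squared error averaged over the circle of radius r, which gives 1 - sum over m = -N..N of J_m(kr)^2. The function names `bessel_j` and `truncation_error` are hypothetical, and the Bessel functions are evaluated from Bessel's integral so that only the standard library is needed.

```python
import math


def bessel_j(m, x, steps=2000):
    """Bessel function J_m(x) for integer m, from Bessel's integral
    (1/pi) * integral_0^pi cos(m*t - x*sin(t)) dt, using the midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.cos(m * t - x * math.sin(t))
    return total * h / math.pi


def truncation_error(order, kr):
    """Assumed error measure: mean normalized squared error on a circle of
    radius r for a unit plane wave truncated at the given order,
    i.e. 1 - sum_{m=-order..order} J_m(kr)^2 (using J_{-m}^2 = J_m^2)."""
    energy = bessel_j(0, kr) ** 2
    energy += 2.0 * sum(bessel_j(m, kr) ** 2 for m in range(1, order + 1))
    return 1.0 - energy


if __name__ == "__main__":
    # Evaluate the error on the boundary kr = N for a few orders.
    for n in (5, 10, 20):
        err = truncation_error(n, float(n))
        print(f"order {n}: error {err:.4f} ({10 * math.log10(err):.1f} dB)")
```

Under these assumptions, an order 5 representation evaluated at kr = 5 gives an error of roughly 0.04, and the error for a fixed order grows as kr increases, which is consistent with the behaviour described in the abstract.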

Associated file(s)

Filename: 397063_FULLTEXT01.pdf
Size: 4.760Mb
Format: PDF

Locked

This item appears in the following collection(s)

Institutt for elektroniske systemer [2334]
