Bandwidth Extension of Telephony Speech
The public switched telephone network (PSTN) restricts the acoustic bandwidth of telephony speech to less than 4 kHz. For compatibility with analog telephone networks, a 0.3–3.4 kHz passband is common. This bandwidth reduction has a significant impact on perceived quality, and is especially noticeable and even distracting when PSTN users call into, e.g., video conferencing systems in which the other participants may use wideband (50 Hz to 7 kHz) speech codecs. To reduce the gap in quality, one may attempt to resynthesize the missing spectrum. Techniques for this are referred to as bandwidth extension (BWE).

For this thesis, two systems for BWE of speech into the high band (f ≥ 3.4 kHz) were implemented in MATLAB, based on systems proposed in the literature. The extension followed the linear source-filter model for speech, meaning that the excitation and the spectral envelope were estimated separately from the narrowband (0.3–3.4 kHz) signal.

BWE System 1 used linear prediction (LP) analysis in combination with modulation to extend the excitation. Its wideband spectral envelope estimation was primarily based on linear prediction cepstral coefficients (LPCC) and artificial neural networks (ANN).

BWE System 2 used bandpass-modulated Gaussian noise (BP-MGN) to extend the excitation. Its wideband spectral envelope estimation was based on Mel-frequency cepstral coefficients (MFCC) and Gaussian mixture modelling (GMM), the more complex of the two estimation methods.

An objective analysis of the two systems' spectral envelope estimation and informal listening tests were carried out. These analyses showed that BWE System 1 performed best, though both systems improved the perceived quality. BWE systems based on LP analysis therefore seem preferable, owing to the superior excitation and the efficient computation of the cepstrum.
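To illustrate why the cepstrum is cheap to obtain once LP analysis has been done, the following is a minimal Python sketch (not the thesis's MATLAB code) of the standard route from a speech frame to LPCCs: autocorrelation, the Levinson-Durbin recursion, and the well-known LP-to-cepstrum recursion. The function names, the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the omission of the gain term c_0 are assumptions made for this example; the thesis may use different conventions, windowing, and model orders.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for A(z) = 1 + sum_k a[k] z^-k.

    Returns (a, err), where a[0] == 1 and err is the final
    prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1/A(z).

    Uses the recursion
        c[n] = -a[n] - (1/n) * sum_{k} k * c[k] * a[n-k],
    with a[n] taken as 0 for n > p. No FFT or logarithm is needed.
    """
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)                # c[0] (gain term) is not computed
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = -(a[n] if n <= p else 0.0) - acc / n
    return c[1:]
```

For a single-pole model with pole radius rho, the cepstrum is known in closed form (c_n = rho^n / n), which gives a simple sanity check: `lpc_to_cepstrum([1.0, -0.5], 3)` should match 0.5, 0.5²/2, and 0.5³/3. The recursion costs on the order of p operations per coefficient, which is the kind of efficiency the abstract attributes to LP-based systems.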