Lossy compression of speech using perceptual criteria

O'Donnell, Michael (1998) Lossy compression of speech using perceptual criteria. Doctoral thesis, University of Central Lancashire.

PDF (Thesis document) - Submitted Version
Restricted to Repository staff only
Available under License Creative Commons Attribution Non-commercial Share Alike.


Abstract

This thesis investigates a new method of minimising perceptual differences when encoding digitised speech. An application of the perceptual criteria is described in the context of a codebook encoding methodology.
The background studies cover aspects of psychoacoustics, in particular the effects of the human outer, middle and inner ear. Models approximating each region of the ear are concatenated into a single overall auditory response path model.
As the objective of the research is to encode and decode speech waveforms, speech production and the classification of speech sounds are also studied. This leads to a description of a basic speech production model, which is implemented as a digital filter.
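As a rough illustration of a speech production model expressed as a digital filter, the sketch below passes a periodic impulse train (standing in for voiced excitation) through an all-pole filter. The function names, the 8 kHz sampling rate, and the single 500 Hz resonance are illustrative assumptions and are not taken from the thesis.

```python
import math


def all_pole_filter(excitation, a):
    """Apply y[n] = x[n] - sum_k a[k] * y[n - k - 1] (an all-pole IIR filter)."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a):
            if n - k - 1 >= 0:
                acc -= ak * y[n - k - 1]
        y.append(acc)
    return y


def impulse_train(length, period):
    """Voiced excitation: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


# Assumed values: one resonance near 500 Hz at an 8 kHz sampling rate.
fs = 8000.0
r, f = 0.95, 500.0
a = [-2.0 * r * math.cos(2.0 * math.pi * f / fs), r * r]

# 160 samples (20 ms) with a pitch period of 80 samples (100 Hz).
voiced = all_pole_filter(impulse_train(160, 80), a)
```

Replacing the impulse train with white noise would give a crude model of unvoiced sounds under the same assumptions.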
A review of the main categories for coding schemes that are currently employed is presented along with commonly used coding methods. In particular the codebook coding method is reviewed in sufficient detail to contrast with the new coding method.
The development of a new perceptual minimisation criterion, which relies on dual application of the auditory response path model to the original and reconstructed speech waveforms, is described. Within this, the ordering of codebook searches, the frequency spectrum used as the search target, and the duration and placement of windowing functions are all analysed to determine the optimum encoder design. Also described are a number of prospective gain algorithms which cover both time and frequency domain implementations.
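The idea of applying the same auditory model to both the original and the reconstructed frame before comparing them can be sketched as follows. This is only a minimal sketch under stated assumptions: the auditory response is reduced to a hypothetical per-bin weighting of the magnitude spectrum, the gain is a simple time-domain least-squares estimate, and none of the names or values come from the thesis.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of a real frame (direct O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def auditory_response(spectrum, weights):
    """Assumed stand-in for an auditory model: a fixed weight per bin."""
    return [w * s for w, s in zip(weights, spectrum)]


def perceptual_distance(original, candidate, weights):
    """Squared error after the same weighting is applied to both signals."""
    a = auditory_response(magnitude_spectrum(original), weights)
    b = auditory_response(magnitude_spectrum(candidate), weights)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_codebook(target, codebook, weights):
    """Return (index, gain) of the entry with the smallest weighted distance."""
    best_index, best_gain, best_dist = -1, 0.0, float("inf")
    for i, entry in enumerate(codebook):
        energy = sum(c * c for c in entry)
        # Time-domain least-squares gain (an assumption for illustration).
        gain = sum(t * c for t, c in zip(target, entry)) / energy if energy else 0.0
        dist = perceptual_distance(target, [gain * c for c in entry], weights)
        if dist < best_dist:
            best_index, best_gain, best_dist = i, gain, dist
    return best_index, best_gain


# Hypothetical 8-sample codebook of cosines and a made-up weighting curve.
N = 8
codebook = [[math.cos(2 * math.pi * k * t / N) for t in range(N)] for k in (1, 2, 3)]
weights = [0.2, 0.6, 1.0, 0.8, 0.4]
target = [2.0 * c for c in codebook[1]]
index, gain = search_codebook(target, codebook, weights)
```

Because the comparison here uses magnitude spectra only, phase differences between the target and a candidate are ignored; a fuller model would need to decide whether that is perceptually acceptable.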
A new encoder is constructed which fully integrates the new perceptual criterion into the minimisation of the difference between the original and reconstructed speech waveforms. No part of the traditional encoder method is used in the minimisation; however, both methods use a similar technique for determining gain factors. Speech derived from both encoders was subjectively assessed by a number of untrained, independent listeners.
The results presented show that both methods are comparable but there is a slight preference towards the traditional encoder. A measure of the complexity indicated that the new minimisation method is also more complex than the traditional encoder.

