Speech Enhancement Based on Spectral Estimation from Higher-lag Autocorrelation



Benjamin J. Shannon, Kuldip K. Paliwal and Climent Nadeu†

School of Engineering, Griffith University




In this paper, we propose a unique approach to enhance speech signals that have been corrupted by non-stationary noises. This approach is not based on a spectral subtraction algorithm, but on an algorithm that separates the speech signal and noise signal contributions in the autocorrelation domain. We call this technique the AR-HASE speech enhancement algorithm.

In this initial study, we evaluate the performance of the new algorithm using the average PESQ score computed from 10 male utterances and 10 female utterances taken from the TIMIT database as a measure of speech quality. We test the algorithm using one broadband stationary noise and two non-stationary noises. We will show that the AR-HASE enhancement algorithm produces near transparent quality for clean speech, gives poor enhancement performance for broadband stationary noises, and gives significantly enhanced quality for the two non-stationary noises.

Index Terms: speech enhancement, autocorrelation, impulsive noise.


Many of the state-of-the-art speech enhancement algorithms use the analysis-modification-synthesis framework [1] in their operation. In this framework, the corrupted speech signal is broken up into short-time segments, which are transformed to the frequency domain where only the spectral magnitude is modified. The speech signal is then reconstructed with an inverse short-time Fourier transform followed by an overlap-add operation. This structure is used by the popular spectral subtraction algorithm, originally proposed by Boll [2] in 1979, and also by techniques related to Wiener filtering, such as Ephraim-Malah’s method [3] and all its more recent variants.
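The analysis-modification-synthesis framework can be illustrated with a minimal NumPy sketch. It is not the implementation used in this paper: the function name, the frame length of 256 samples, the 50% overlap and the periodic Hann window are assumptions chosen for the example, and `modify_magnitude` is a hypothetical callback standing in for whatever enhancement rule is applied to the spectral magnitude.

```python
import numpy as np

def analysis_modification_synthesis(x, modify_magnitude, frame_len=256, hop=128):
    """Frame x, modify only the spectral magnitude, resynthesise by overlap-add.

    Assumes hop == frame_len // 2 so that the shifted windows sum to one.
    """
    # Periodic Hann window: w[n] + w[n + frame_len/2] == 1 for every n.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(frame_len) / frame_len)
    y = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, hop):
        # Analysis: window the short-time segment and go to the frequency domain.
        frame = x[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame)
        magnitude, phase = np.abs(spectrum), np.angle(spectrum)
        # Modification: only the magnitude is changed, the noisy phase is kept.
        new_magnitude = modify_magnitude(magnitude)
        # Synthesis: inverse transform followed by overlap-add.
        new_frame = np.fft.irfft(new_magnitude * np.exp(1j * phase), n=frame_len)
        y[start:start + frame_len] += new_frame
    return y
```

With an identity modification (`lambda m: m`) the interior of the signal is reconstructed up to rounding error, because the overlapping windows sum to one. The first and last half frames are attenuated, since only one window covers them.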

†This work was performed while Climent Nadeu was on leave from the Signal Theory and Communications Department, Universitat Politecnica de Catalunya, 08034 Barcelona, Spain.

These spectral enhancement algorithms require an estimate of the noise spectrum, which can be obtained from non-speech segments indicated by a voice activity detector or, alternatively, with a minimum statistics approach [4], i.e. by tracking spectral minima in each frequency band. In consequence, they are effective only when the noise signals are stationary or at least do not show rapidly varying statistical characteristics. The worst case for these systems is a noise signal that is typically coincident with the speech signal and absent at other times. This situation, for example, could arise with an impulsive noise. In this case, most of the non-speech frames could be completely devoid of impulsive noise, but the speech frames could contain a large amount of this noise. To handle these situations, noise reduction techniques that operate intra-frame (within the current frame) are required; these techniques cannot use the noise power spectrum estimate from other non-speech frames.
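The limitation of inter-frame noise estimation can be seen in a toy example. The sketch below is a deliberately simplified stand-in for the minimum statistics approach of [4]: it only tracks the per-band minimum over a sliding window of frames and omits the smoothing and bias compensation of the full method. The function name, the window length and the synthetic power values are assumptions made for illustration.

```python
import numpy as np

def track_spectral_minima(power_spectrogram, window_frames=50):
    """Per-band running minimum over the last `window_frames` frames.

    power_spectrogram has shape (num_frames, num_bands).
    """
    num_frames = power_spectrogram.shape[0]
    noise_estimate = np.empty_like(power_spectrogram)
    for t in range(num_frames):
        first = max(0, t - window_frames + 1)
        noise_estimate[t] = power_spectrogram[first:t + 1].min(axis=0)
    return noise_estimate

# Synthetic example: a stationary noise floor of power 1.0 in every frame,
# plus an impulsive burst that adds power 10.0 in frames 20 to 22 only.
power = np.ones((40, 4))
power[20:23] += 10.0
estimate = track_spectral_minima(power, window_frames=10)
```

In this example the estimate stays at the stationary floor in every frame, including frames 20 to 22, so a magnitude modification driven by it would leave the impulsive burst untouched. This is the situation that motivates intra-frame processing.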

In previous work, we have proposed a noise robust spectral estimation technique for short-time speech signals that operates intra-frame. This method uses the periodic correlation property of short-time speech signals and the autocorrelation domain to perform noise reduction. It is well known that the pitch period of human speech is typically constrained to values between 2 ms and 12 ms. This means that in the autocorrelation domain, we will have large magnitude coefficients at lags corresponding to these periods. Conversely, this property generally does not hold for noise signals. By computing a spectral estimate using only the higher-lag autocorrelation coefficients, we have a way of separating the speech and noise signal without having to estimate the noise signal directly. We call this method Higher-lag Autocorrelation Spectral Estimation (HASE).
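The periodic correlation property can be checked numerically with a short NumPy sketch. It illustrates the property only and is not the HASE algorithm itself. The 8 kHz sampling rate, the synthetic 200 Hz harmonic signal standing in for voiced speech, the white noise frame and both function names are assumptions made for the example.

```python
import numpy as np

def autocorrelation(frame):
    """Biased autocorrelation estimate r[k] for lags k = 0 .. len(frame) - 1."""
    n = len(frame)
    full = np.correlate(frame, frame, mode="full")  # lags -(n-1) .. (n-1)
    return full[n - 1:] / n

def pitch_range_peak(frame, fs, min_period=0.002, max_period=0.012):
    """Largest autocorrelation value in the 2 ms to 12 ms lag range, relative to r[0]."""
    r = autocorrelation(frame)
    min_lag = int(round(min_period * fs))
    max_lag = int(round(max_period * fs))
    return np.max(r[min_lag:max_lag + 1]) / r[0]

fs = 8000                                   # assumed sampling rate in Hz
t = np.arange(256) / fs                     # one 32 ms frame
# Voiced-like frame: five harmonics of a 200 Hz fundamental (5 ms pitch period).
voiced = sum(np.cos(2 * np.pi * 200 * h * t) for h in range(1, 6))
# Noise frame: white Gaussian noise, whose autocorrelation is concentrated at lag 0.
noise = np.random.default_rng(0).standard_normal(256)

voiced_peak = pitch_range_peak(voiced, fs)
noise_peak = pitch_range_peak(noise, fs)
```

For the periodic frame, `voiced_peak` is close to one because the signal repeats at a lag of 5 ms. For the white noise frame, `noise_peak` is small because most of the autocorrelation energy sits at lag zero, outside the pitch-lag range.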


The HASE method was motivated by the large volume of previous work on noise robust Automatic Speech Recognition (ASR) feature extraction based on autocorrelation domain processing [7][8][9][10]. This method has been successfully applied to the noise robust ASR problem, particularly where the noise signal had rapidly changing characteristics. The goal of ASR feature extraction is to produce features that have a low dimensionality, are insensitive to speaker and environmental changes and are effective in discriminating the linguistic units. These goals have little in common with the goals of speech enhancement.

In this paper, we investigate the HASE algorithm for speech enhancement. We show that this algorithm has some inherent limitations for enhancement applications. We propose to overcome these limitations by using an Auto-Regressive (AR) model of high order. We refer to this extended HASE algorithm as the AR-HASE algorithm. It is our aim in this work to explore the potential of this technique for the enhancement of speech signals corrupted by both stationary and non-stationary disturbances.

2. Speech Enhancement using Higher-lag Autocorrelation Spectral Estimation

A brief description of the previously proposed Higher-lag Autocorrelation Spectral Estimation (HASE) technique proceeds as follows. The short-time speech segment (approx. 32 ms) is first

September 17-21, Pittsburgh, Pennsylvania