Speech Enhancement Based on Spectral Estimation from Higher-lag Autocorrelation
Benjamin J. Shannon, Kuldip K. Paliwal and Climent Nadeu†
School of Engineering, Griffith University
In this paper, we propose a unique approach to enhance speech signals that have been corrupted by non-stationary noises. This approach is not based on a spectral subtraction algorithm, but on an algorithm that separates the speech signal and noise signal contributions in the autocorrelation domain. We call this technique the AR-HASE speech enhancement algorithm.
In this initial study, we evaluate the performance of the new algorithm using the average PESQ score computed from 10 male utterances and 10 female utterances taken from the TIMIT database as a measure of speech quality. We test the algorithm using one broadband stationary noise and two non-stationary noises. We will show that the AR-HASE enhancement algorithm produces near transparent quality for clean speech, gives poor enhancement performance for broadband stationary noises, and gives significantly enhanced quality for the two non-stationary noises.
Index Terms: speech enhancement, autocorrelation, impulsive noise.
Many of the state-of-the-art speech enhancement algorithms use the analysis-modification-synthesis framework in their operation. In this framework, the corrupted speech signal is broken up into short-time segments, which are transformed to the frequency domain where only the spectral magnitude is modified. The speech signal is then reconstructed with an inverse short-time Fourier transform followed by an overlap-add operation. This structure is used by the popular spectral subtraction algorithm, originally proposed by Boll in 1979, and also by techniques related to Wiener filtering, such as Ephraim-Malah’s method and all its more recent variants.
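As a rough illustration of the framework described above, the following Python sketch implements analysis, magnitude modification and overlap-add synthesis using only the standard library. The function names (dft, idft, ams_enhance), the Hann window, the 50% overlap and the frame length are assumptions chosen for the example, not details taken from any of the algorithms mentioned; the naive DFT is used only to keep the sketch self-contained.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def ams_enhance(signal, modify_magnitude, frame_len=32, hop=16):
    """Analysis-modification-synthesis with a periodic Hann window (assumed)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Analysis: window the frame and move to the frequency domain.
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = dft(frame)
        # Modification: change only the magnitude, keep the noisy phase.
        magnitudes = modify_magnitude([abs(c) for c in spectrum])
        modified = [m * cmath.exp(1j * cmath.phase(c))
                    for m, c in zip(magnitudes, spectrum)]
        # Synthesis: inverse transform and overlap-add.
        for i, sample in enumerate(idft(modified)):
            output[start + i] += sample
    return output
```

Passing a function that returns the magnitudes unchanged reconstructs the input in the fully overlapped region, because the periodic Hann window sums to one at 50% overlap. A spectral subtraction or Wiener-type gain rule would be supplied as modify_magnitude.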
These spectral enhancement algorithms require an estimate of the noise spectrum, which can be obtained from non-speech segments indicated by a voice activity detector or, alternatively, with a minimum statistics approach, i.e. by tracking spectral minima in each frequency band. In consequence, they are effective only when the noise signals are stationary or at least do not show rapidly varying statistical characteristics. The worst type of noise for these systems is one that is typically coincident with the speech signal and absent at other times. This situation, for example, could arise with an impulsive noise. In this case, most of the non-speech frames could be completely devoid of impulsive noise, but the speech frames could contain a large amount of this noise. To handle these situations, noise reduction techniques that operate intra-frame (within the current frame) are required; these techniques cannot use the noise power spectrum estimate from other frames.
† This work was performed while Climent Nadeu was on leave from the Signal Theory and Communications Department, Universitat Politecnica de Catalunya, 08034 Barcelona, Spain.
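The difficulty with noise that appears only in isolated frames can be illustrated with a much simplified minimum tracker. The sketch below is only a stand-in for a minimum statistics estimator (it omits the smoothing and bias compensation that practical versions use), and the function name track_noise_floor and its history parameter are hypothetical. It takes one list of per-band powers per frame and returns, for each frame, the per-band minimum over the most recent frames.

```python
def track_noise_floor(power_frames, history=5):
    """Per-band noise estimate: minimum power over the last `history` frames."""
    estimates = []
    for t in range(len(power_frames)):
        recent = power_frames[max(0, t - history + 1):t + 1]
        # zip(*recent) groups the values of each frequency band across frames.
        estimates.append([min(band) for band in zip(*recent)])
    return estimates


# Two bands, four frames; the third frame carries a burst in the second band.
frames = [[1.0, 2.0], [1.0, 2.0], [1.0, 50.0], [1.0, 2.0]]
print(track_noise_floor(frames))
# prints [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
```

The estimate never rises above the background level, so the burst in the third frame is not represented in the noise spectrum that a subtraction rule would use.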
In previous work, we have proposed a noise robust spectral estimation technique for short-time speech signals that operates intra-frame. This method uses the periodic correlation property of short-time speech signals and the autocorrelation domain to perform noise reduction. It is well known that the pitch period of human speech is typically constrained to values between 2 ms and 12 ms. This means that in the autocorrelation domain, we will have large magnitude coefficients at the lags corresponding to these periods. By contrast, this property is generally not true for noise signals. By computing a spectral estimate using only the higher-lag autocorrelation coefficients, we have a way of separating the speech and noise signal without having to estimate the noise signal directly. We call this method Higher-lag Autocorrelation Spectral Estimation (HASE).
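The idea of estimating a spectrum from higher-lag coefficients only can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the biased autocorrelation estimator, the 2 ms lag cut-off (taken from the lower pitch-period bound quoted above), the Hann taper over the retained lags, the DFT size and the function names autocorrelation and higher_lag_spectrum are all choices made for the example.

```python
import cmath
import math


def autocorrelation(frame, max_lag):
    """Biased autocorrelation estimate for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def higher_lag_spectrum(frame, sample_rate, min_lag_ms=2.0, nfft=256):
    """Magnitude spectrum from autocorrelation lags at or above min_lag_ms."""
    min_lag = round(min_lag_ms * 1e-3 * sample_rate)
    r = autocorrelation(frame, len(frame) - 1)
    kept = r[min_lag:]  # discard the lower lags, including lag zero
    m = len(kept)
    # Assumed Hann taper to limit leakage from truncating the lag sequence.
    tapered = [kept[i] * (0.5 - 0.5 * math.cos(2 * math.pi * i / (m - 1)))
               for i in range(m)]
    # Magnitude of the DFT of the retained one-sided lag sequence.
    return [abs(sum(tapered[i] * cmath.exp(-2j * math.pi * k * i / nfft)
                    for i in range(m)))
            for k in range(nfft // 2 + 1)]
```

With this sketch, a frame containing an isolated single-sample click yields an all-zero estimate, since its autocorrelation is non-zero only at lag zero, whereas a periodic signal such as a 250 Hz tone sampled at 8 kHz keeps a spectral peak near its frequency. In a mixture, cross terms between speech and noise would still contribute at higher lags.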
The HASE method was motivated by the large volume of previous work on noise robust Automatic Speech Recognition (ASR) feature extraction based on autocorrelation domain processing. This method has been successfully applied to the noise robust ASR problem, particularly where the noise signal had rapidly changing characteristics. The goal of ASR feature extraction is to produce features that have a low dimensionality, are insensitive to speaker and environmental changes and are effective in discriminating the linguistic units. These goals have little in common with the goals of speech enhancement.
In this paper, we investigate the HASE algorithm for speech enhancement. We show that this algorithm has some inherent limitations for enhancement applications. We propose to overcome these limitations by using an Auto-Regressive (AR) model of high order. We refer to this extended HASE algorithm as the AR-HASE algorithm. It is our aim in this work to explore the potential of this technique for the enhancement of speech signals corrupted by both stationary and non-stationary disturbances.
2. Speech Enhancement using Higher-lag Autocorrelation Spectral Estimation
A brief description of the previously proposed Higher-lag Autocorrelation Spectral Estimation (HASE) technique proceeds as follows. The short-time speech segment (approx. 32 ms) is first
INTERSPEECH 2006 - ICSLP, September 17-21, Pittsburgh, Pennsylvania