Speech Enhancement in the STFT Domain

Speech Enhancement in the STFT Domain PDF Author: Jacob Benesty
Publisher: Springer Science & Business Media
ISBN: 3642232507
Category : Technology & Engineering
Languages : en
Pages : 112

Get Book Here

Book Description
This work addresses this problem in the short-time Fourier transform (STFT) domain. We divide the general problem into five basic categories depending on the number of microphones being used and whether the interframe or interband correlation is considered. The first category deals with the single-channel problem where STFT coefficients at different frames and frequency bands are assumed to be independent. In this case, the noise reduction filter in each frequency band is basically a real gain. Since a gain does not improve the signal-to-noise ratio (SNR) for any given subband and frame, the noise reduction is basically achieved by liftering the subbands and frames that are less noisy while weighing down on those that are more noisy. The second category also concerns the single-channel problem. The difference is that now the interframe correlation is taken into account and a filter is applied in each subband instead of just a gain. The advantage of using the interframe correlation is that we can improve not only the long-time fullband SNR, but the frame-wise subband SNR as well. The third and fourth classes discuss the problem of multichannel noise reduction in the STFT domain with and without interframe correlation, respectively. In the last category, we consider the interband correlation in the design of the noise reduction filters. We illustrate the basic principle for the single-channel case as an example, while this concept can be generalized to other scenarios. In all categories, we propose different optimization cost functions from which we derive the optimal filters and we also define the performance measures that help analyzing them.

Speech Enhancement in the STFT Domain

Speech Enhancement in the STFT Domain PDF Author: Jacob Benesty
Publisher: Springer Science & Business Media
ISBN: 3642232507
Category : Technology & Engineering
Languages : en
Pages : 112

Get Book Here

Book Description
This work addresses this problem in the short-time Fourier transform (STFT) domain. We divide the general problem into five basic categories depending on the number of microphones being used and whether the interframe or interband correlation is considered. The first category deals with the single-channel problem where STFT coefficients at different frames and frequency bands are assumed to be independent. In this case, the noise reduction filter in each frequency band is basically a real gain. Since a gain does not improve the signal-to-noise ratio (SNR) for any given subband and frame, the noise reduction is basically achieved by liftering the subbands and frames that are less noisy while weighing down on those that are more noisy. The second category also concerns the single-channel problem. The difference is that now the interframe correlation is taken into account and a filter is applied in each subband instead of just a gain. The advantage of using the interframe correlation is that we can improve not only the long-time fullband SNR, but the frame-wise subband SNR as well. The third and fourth classes discuss the problem of multichannel noise reduction in the STFT domain with and without interframe correlation, respectively. In the last category, we consider the interband correlation in the design of the noise reduction filters. We illustrate the basic principle for the single-channel case as an example, while this concept can be generalized to other scenarios. In all categories, we propose different optimization cost functions from which we derive the optimal filters and we also define the performance measures that help analyzing them.

A Perspective on Single-channel Frequency-domain Speech Enhancement

A Perspective on Single-channel Frequency-domain Speech Enhancement PDF Author: Jacob Benesty
Publisher: Morgan & Claypool Publishers
ISBN: 1608456986
Category : Computers
Languages : en
Pages : 111

Get Book Here

Book Description
This book focuses on a class of single-channel noise reduction methods that are performed in the frequency domain via the short-time Fourier transform (STFT). The simplicity and relative effectiveness of this class of approaches make them the dominant choice in practical systems. Even though many popular algorithms have been proposed through more than four decades of continuous research, there are a number of critical areas where our understanding and capabilities still remain quite rudimentary, especially with respect to the relationship between noise reduction and speech distortion. All existing frequency-domain algorithms, no matter how they are developed, have one feature in common: the solution is eventually expressed as a gain function applied to the STFT of the noisy signal only in the current frame. As a result, the narrowband signal-to-noise ratio (SNR) cannot be improved, and any gains achieved in noise reduction on the fullband basis come with a price to pay, which is speech distortion. In this book, we present a new perspective on the problem by exploiting the difference between speech and typical noise in circularity and interframe self-correlation, which were ignored in the past. By gathering the STFT of the microphone signal of the current frame, its complex conjugate, and the STFTs in the previous frames, we construct several new, multiple-observation signal models similar to a microphone array system: there are multiple noisy speech observations, and their speech components are correlated but not completely coherent while their noise components are presumably uncorrelated. Therefore, the multichannel Wiener filter and the minimum variance distortionless response (MVDR) filter that were usually associated with microphone arrays will be developed for single-channel noise reduction in this book. This might instigate a paradigm shift geared toward speech distortionless noise reduction techniques.

Fundamentals of Speech Enhancement

Fundamentals of Speech Enhancement PDF Author: Jacob Benesty
Publisher: Springer
ISBN: 3319745247
Category : Technology & Engineering
Languages : en
Pages : 112

Get Book Here

Book Description
This book presents and develops several important concepts of speech enhancement in a simple but rigorous way. Many of the ideas are new; not only do they shed light on this old problem but they also offer valuable tips on how to improve on some well-known conventional approaches. The book unifies all aspects of speech enhancement, from single channel, multichannel, beamforming, time domain, frequency domain and time–frequency domain, to binaural in a clear and flexible framework. It starts with an exhaustive discussion on the fundamental best (linear and nonlinear) estimators, showing how they are connected to various important measures such as the coefficient of determination, the correlation coefficient, the conditional correlation coefficient, and the signal-to-noise ratio (SNR). It then goes on to show how to exploit these measures in order to derive all kinds of noise reduction algorithms that can offer an accurate and versatile compromise between noise reduction and speech distortion.

New Approaches for Speech Enhancement in the Short-Time Fourier Transform Domain

New Approaches for Speech Enhancement in the Short-Time Fourier Transform Domain PDF Author: Mahdi Parchami
Publisher:
ISBN:
Category :
Languages : en
Pages : 225

Get Book Here

Book Description
Speech enhancement aims at the improvement of speech quality by using various algorithms. A speech enhancement technique can be implemented as either a time domain or a transform domain method. In the transform domain speech enhancement, the spectrum of clean speech signal is estimated through the modification of noisy speech spectrum and then it is used to obtain the enhanced speech signal in the time domain. Among the existing transform domain methods in the literature, the short-time Fourier transform (STFT) processing has particularly served as the basis to implement most of the frequency domain methods. In general, speech enhancement methods in the STFT domain can be categorized into the estimators of complex discrete Fourier transform (DFT) coefficients and the estimators of real-valued short-time spectral amplitude (STSA). Due to the computational efficiency of the STSA estimation method and also its superior performance in most cases, as compared to the estimators of complex DFT coefficients, we focus mostly on the estimation of speech STSA throughout this work and aim at developing algorithms for noise reduction and reverberation suppression. First, we tackle the problem of additive noise reduction using the single-channel Bayesian STSA estimation method. In this respect, we present new schemes for the selection of Bayesian cost function parameters for a parametric STSA estimator, namely the W?-SA estimator, based on an initial estimate of the speech and also the properties of human auditory system. We further use the latter information to design an efficient flooring scheme for the gain function of the STSA estimator. Next, we apply the generalized Gaussian distribution (GGD) to theW?-SA estimator as the speech STSA prior and propose to choose its parameters according to noise spectral variance and a priori signal to noise ratio (SNR). The suggested STSA estimation schemes are able to provide further noise reduction as well as less speech distortion, as compared to the previous methods. Quality and noise reduction performance evaluations indicated the superiority of the proposed speech STSA estimation with respect to the previous estimators. Regarding the multi-channel counterpart of the STSA estimation method, first we generalize the proposed single-channel W?-SA estimator to the multi-channel case for spatially uncorrelated noise. It is shown that under the Bayesian framework, a straightforward extension from the single-channel to the multi-channel case can be performed by generalizing the STSA estimator parameters, i.e. ? and ?. Next, we develop Bayesian STSA estimators by taking advantage of speech spectral phase rather than only relying on the spectral amplitude of observations, in contrast to conventional methods. This contribution is presented for the multi-channel scenario with single-channel as a special case. Next, we aim at developing multi-channel STSA estimation under spatially correlated noise and derive a generic structure for the extension of a single-channel estimator to its multi-channel counterpart. It is shown that the derived multi-channel extension requires a proper estimate of the spatial correlation matrix of noise. Subsequently, we focus on the estimation of noise correlation matrix, that is not only important in the multi-channel STSA estimation scheme but also highly useful in different beamforming methods. Next, we aim at speech reverberation suppression in the STFT domain using the weighted prediction error (WPE) method. The original WPE method requires an estimate of the desired speech spectral variance along with reverberation prediction weights, leading to a sub-optimal strategy that alternatively estimates each of these two quantities. Also, similar to most other STFT based speech enhancement methods, the desired speech coefficients are assumed to be temporally independent, while this assumption is inaccurate. Taking these into account, first, we employ a suitable estimator for the speech spectral variance and integrate it into the estimation of the reverberation prediction weights. In addition to the performance advantage with respect to the previous versions of the WPE method, the presented approach provides a good reduction in implementation complexity. Next, we take into account the temporal correlation present in the STFT of the desired speech, namely the inter-frame correlation (IFC), and consider an approximate model where only the frames within each segment of speech are considered as correlated. Furthermore, an efficient method for the estimation of the underlying IFC matrix is developed based on the extension of the speech variance estimator proposed previously. The performance results reveal lower residual reverberation and higher overall quality provided by the proposed method. Finally, we focus on the problem of late reverberation suppression using the classic speech spectral enhancement method originally developed for additive noise reduction. As our main contribution, we propose a novel late reverberant spectral variance (LRSV) estimator which replaces the noise spectral variance in order to modify the gain function for reverberation suppression. The suggested approach employs a modified version of the WPE method in a model based smoothing scheme used for the estimation of the LRSV. According to the experiments, the proposed LRSV estimator outperforms the previous major methods considerably and scores the closest results to the theoretically true LRSV estimator. Particularly, in case of changing room impulse responses (RIRs) where other methods cannot follow the true LRSV estimator accurately, the suggested estimator is able to track true LRSV values and results in a smaller tracking error. We also target a few other aspects of the spectral enhancement method for reverberation suppression, which were explored before only for the purpose of noise reduction. These contributions include the estimation of signal to reverberant ratio (SRR) and the development of new schemes for the speech presence probability (SPP) and spectral gain flooring in the context of late reverberation suppression.

Speech Enhancement

Speech Enhancement PDF Author: Jacob Benesty
Publisher: Springer Science & Business Media
ISBN: 3540274898
Category : Technology & Engineering
Languages : en
Pages : 416

Get Book Here

Book Description
A strong reference on the problem of signal and speech enhancement, describing the newest developments in this exciting field. The general emphasis is on noise reduction, because of the large number of applications that can benefit from this technology.

Canonical Correlation Analysis in Speech Enhancement

Canonical Correlation Analysis in Speech Enhancement PDF Author: Jacob Benesty
Publisher: Springer
ISBN: 3319670204
Category : Technology & Engineering
Languages : en
Pages : 124

Get Book Here

Book Description
This book focuses on the application of canonical correlation analysis (CCA) to speech enhancement using the filtering approach. The authors explain how to derive different classes of time-domain and time-frequency-domain noise reduction filters, which are optimal from the CCA perspective for both single-channel and multichannel speech enhancement. Enhancement of noisy speech has been a challenging problem for many researchers over the past few decades and remains an active research area. Typically, speech enhancement algorithms operate in the short-time Fourier transform (STFT) domain, where the clean speech spectral coefficients are estimated using a multiplicative gain function. A filtering approach, which can be performed in the time domain or in the subband domain, obtains an estimate of the clean speech sample at every time instant or time-frequency bin by applying a filtering vector to the noisy speech vector. Compared to the multiplicative gain approach, the filtering approach more naturally takes into account the correlation of the speech signal in adjacent time frames. In this study, the authors pursue the filtering approach and show how to apply CCA to the speech enhancement problem. They also address the problem of adaptive beamforming from the CCA perspective, and show that the well-known Wiener and minimum variance distortionless response (MVDR) beamformers are particular cases of a general class of CCA-based adaptive beamformers.

Noise Reduction in Speech Processing

Noise Reduction in Speech Processing PDF Author: Jacob Benesty
Publisher: Springer Science & Business Media
ISBN: 364200296X
Category : Technology & Engineering
Languages : en
Pages : 236

Get Book Here

Book Description
Noise is everywhere and in most applications that are related to audio and speech, such as human-machine interfaces, hands-free communications, voice over IP (VoIP), hearing aids, teleconferencing/telepresence/telecollaboration systems, and so many others, the signal of interest (usually speech) that is picked up by a microphone is generally contaminated by noise. As a result, the microphone signal has to be cleaned up with digital signal processing tools before it is stored, analyzed, transmitted, or played out. This cleaning process is often called noise reduction and this topic has attracted a considerable amount of research and engineering attention for several decades. One of the objectives of this book is to present in a common framework an overview of the state of the art of noise reduction algorithms in the single-channel (one microphone) case. The focus is on the most useful approaches, i.e., filtering techniques (in different domains) and spectral enhancement methods. The other objective of Noise Reduction in Speech Processing is to derive all these well-known techniques in a rigorous way and prove many fundamental and intuitive results often taken for granted. This book is especially written for graduate students and research engineers who work on noise reduction for speech and audio applications and want to understand the subtle mechanisms behind each approach. Many new and interesting concepts are presented in this text that we hope the readers will find useful and inspiring.

Speech Enhancement

Speech Enhancement PDF Author: Shoji Makino
Publisher: Springer Science & Business Media
ISBN: 9783540240396
Category : Computers
Languages : en
Pages : 432

Get Book Here

Book Description
We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise reduction but also dereverberation and separation of independent signals. These topics are also covered in this book. However, the general emphasis is on noise reduction because of the large number of applications that can benefit from this technology. The goal of this book is to provide a strong reference for researchers, engineers, and graduate students who are interested in the problem of signal and speech enhancement. To do so, we invited well-known experts to contribute chapters covering the state of the art in this focused field.

Speech Processing in Modern Communication

Speech Processing in Modern Communication PDF Author: Israel Cohen
Publisher: Springer Science & Business Media
ISBN: 3642111300
Category : Technology & Engineering
Languages : en
Pages : 342

Get Book Here

Book Description
Modern communication devices, such as mobile phones, teleconferencing systems, VoIP, etc., are often used in noisy and reverberant environments. Therefore, signals picked up by the microphones from telecommunication devices contain not only the desired near-end speech signal, but also interferences such as the background noise, far-end echoes produced by the loudspeaker, and reverberations of the desired source. These interferences degrade the fidelity and intelligibility of the near-end speech in human-to-human telecommunications and decrease the performance of human-to-machine interfaces (i.e., automatic speech recognition systems). The proposed book deals with the fundamental challenges of speech processing in modern communication, including speech enhancement, interference suppression, acoustic echo cancellation, relative transfer function identification, source localization, dereverberation, and beamforming in reverberant environments. Enhancement of speech signals is necessary whenever the source signal is corrupted by noise. In highly non-stationary noise environments, noise transients, and interferences may be extremely annoying. Acoustic echo cancellation is used to eliminate the acoustic coupling between the loudspeaker and the microphone of a communication device. Identification of the relative transfer function between sensors in response to a desired speech signal enables to derive a reference noise signal for suppressing directional or coherent noise sources. Source localization, dereverberation, and beamforming in reverberant environments further enable to increase the intelligibility of the near-end speech signal.

Phase-based Speech Processing

Phase-based Speech Processing PDF Author: Parham Aarabi
Publisher: World Scientific
ISBN: 9812566120
Category : Computers
Languages : en
Pages : 153

Get Book Here

Book Description
This is the first book that takes a detailed look at the importance of phase in the design of speech processing systems. Phase, in comparison with amplitude, is often ignored for speech recognition applications. Thus, this book highlights some of the important ways in which the phase of speech signals can be utilized for sound localization, enhancement, and recognition.This book also discusses the state-of-the-art research in phase-based speech processing, starting from the basics of signal processing and recording, to single microphone speech recognition, the recognition of speech and the processing of speech by humans, as well as the importance of phase in human speech recognition and multi-microphone phase-based speech processing.