Implementation and Evaluation of Gated Recurrent Unit for Speech Separation and Speech Enhancement

Author: Sagar Shah
Publisher:
ISBN: 9781088327920
Category : Biomedical engineering
Languages : en
Pages : 91

Book Description
Hearing aids, automatic speech recognition (ASR) and many other communication systems work well when there is just one sound source with almost no echo, but their performance degrades when several speakers talk simultaneously or the reverberation is high. Speech separation and speech enhancement are core problems in the field of audio signal processing. Humans are remarkably capable of focusing their auditory attention on a single sound source within a noisy environment by de-emphasizing all other voices and interference in the surroundings. This capability comes naturally to humans, but speech separation remains a significant challenge for computers. It is challenging for the following reasons: the wide variety of sound types, the different mixing environments, and the lack of a clear procedure for distinguishing sources, especially for similar sounds. Also, perceiving speech in low signal-to-noise ratio (SNR) conditions is hard for hearing-impaired listeners. Therefore, the motivation is to advance speech separation algorithms to improve the intelligibility of noisy speech. The latest technologies aim to give machines similar abilities. Recently, deep neural network methods have achieved impressive successes in various problems, including speech enhancement, which is the task of separating clean speech from a noisy mixture. Due to the advances in deep learning, speech separation can be viewed as a classification problem and treated as a supervised learning problem. The three main components of speech separation or speech enhancement using deep learning methods are acoustic features, learning machines, and training targets. This work aims to implement a single-channel speech separation and enhancement algorithm using machine learning, specifically deep neural networks (DNNs). An extensive set of speech from different speakers and noise data is collected to train a neural network model that predicts time-frequency masks from noisy and mixture speech signals.
The algorithm is tested using various noises and combinations of different speakers. Its performance is evaluated in terms of speech quality and intelligibility. In this thesis, I propose a variant of the recurrent neural network, the GRU (gated recurrent unit), for the speech separation and speech enhancement task. It is a simpler model than the LSTM (long short-term memory) network currently used for these tasks: it has fewer parameters while matching the performance of LSTM networks on speech separation and speech enhancement.
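
The claim that a GRU has fewer parameters than an LSTM can be checked with a short calculation. The sketch below is not taken from the thesis: it assumes the textbook formulation with one bias vector per gate block, and the layer sizes (257 input features, 256 hidden units) are hypothetical values chosen only for illustration.

```python
def recurrent_layer_params(input_size, hidden_size, num_blocks):
    """Parameters of one recurrent layer with `num_blocks` gate blocks.

    Each block has an input weight matrix (hidden x input), a recurrent
    weight matrix (hidden x hidden) and one bias vector (hidden).
    Assumption: a single bias per block; some libraries keep two.
    """
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return num_blocks * per_block


def lstm_params(input_size, hidden_size):
    # LSTM: input, forget and output gates plus the cell candidate = 4 blocks
    return recurrent_layer_params(input_size, hidden_size, 4)


def gru_params(input_size, hidden_size):
    # GRU: update and reset gates plus the hidden candidate = 3 blocks
    return recurrent_layer_params(input_size, hidden_size, 3)


if __name__ == "__main__":
    n_in, n_hidden = 257, 256  # hypothetical sizes, not from the thesis
    lstm = lstm_params(n_in, n_hidden)
    gru = gru_params(n_in, n_hidden)
    print(f"LSTM: {lstm}  GRU: {gru}  ratio: {gru / lstm:.2f}")
```

Under these assumptions a GRU layer always has three quarters of the parameters of an LSTM layer of the same size, whatever the input and hidden dimensions.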

Speech Enhancement

Author: Shoji Makino
Publisher: Springer Science & Business Media
ISBN: 9783540240396
Category : Computers
Languages : en
Pages : 432

Book Description
We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise reduction but also dereverberation and separation of independent signals. These topics are also covered in this book. However, the general emphasis is on noise reduction because of the large number of applications that can benefit from this technology. The goal of this book is to provide a strong reference for researchers, engineers, and graduate students who are interested in the problem of signal and speech enhancement. To do so, we invited well-known experts to contribute chapters covering the state of the art in this focused field.

Speech Dereverberation

Author: Patrick A. Naylor
Publisher: Springer Science & Business Media
ISBN: 1849960569
Category : Technology & Engineering
Languages : en
Pages : 388

Book Description
Speech Dereverberation gathers together an overview, a mathematical formulation of the problem and the state-of-the-art solutions for dereverberation. Speech Dereverberation presents current approaches to the problem of reverberation. It provides a review of topics in room acoustics and also describes performance measures for dereverberation. The algorithms are then explained with mathematical analysis and examples that enable the reader to see the strengths and weaknesses of the various techniques, as well as giving an understanding of the questions still to be addressed. Techniques rooted in speech enhancement are included, in addition to a treatment of multichannel blind acoustic system identification and inversion. The TRINICON framework is shown in the context of dereverberation to be a generalization of the signal processing for a range of analysis and enhancement techniques. Speech Dereverberation is suitable for students at masters and doctoral level, as well as established researchers.

Robust Automatic Speech Recognition

Author: Jinyu Li
Publisher: Academic Press
ISBN: 0128026162
Category : Technology & Engineering
Languages : en
Pages : 308

Book Description
Robust Automatic Speech Recognition: A Bridge to Practical Applications establishes a solid foundation for automatic speech recognition that is robust against acoustic environmental distortion. It provides a thorough overview of classical and modern noise- and reverberation-robust techniques that have been developed over the past thirty years, with an emphasis on practical methods that have been proven to be successful and which are likely to be further developed for future applications. The strengths and weaknesses of robustness-enhancing speech recognition techniques are carefully analyzed. The book covers noise-robust techniques designed for acoustic models which are based on both Gaussian mixture models and deep neural networks. In addition, a guide to selecting the best methods for practical applications is provided. The reader will: gain a unified, deep and systematic understanding of the state-of-the-art technologies for robust speech recognition; learn the links and relationship between alternative technologies for robust speech recognition; be able to use the technology analysis and categorization detailed in the book to guide future technology development; and be able to develop new noise-robust methods in the current era of deep learning for acoustic modeling in speech recognition. It is the first book that provides a comprehensive review on noise and reverberation robust speech recognition methods in the era of deep neural networks. It connects robust speech recognition techniques to machine learning paradigms with rigorous mathematical treatment, and provides elegant and structural ways to categorize and analyze noise-robust speech recognition techniques. It is written by leading researchers who have been actively working on the subject matter in both industrial and academic organizations for many years.

Deep Learning Applications for Cyber Security

Author: Mamoun Alazab
Publisher: Springer
ISBN: 3030130576
Category : Computers
Languages : en
Pages : 246

Book Description
Cybercrime remains a growing challenge in terms of security and privacy practices. Working together, deep learning and cyber security experts have recently made significant advances in the fields of intrusion detection, malicious code analysis and forensic identification. This book addresses questions of how deep learning methods can be used to advance cyber security objectives, including detection, modeling, monitoring and analysis of as well as defense against various threats to sensitive data and security systems. Filling an important gap between deep learning and cyber security communities, it discusses topics covering a wide range of modern and practical deep learning techniques, frameworks and development tools to enable readers to engage with the cutting-edge research across various aspects of cyber security. The book focuses on mature and proven techniques, and provides ample examples to help readers grasp the key points.

Speech Enhancement

Author: Philipos C. Loizou
Publisher: CRC Press
ISBN: 1466599227
Category : Technology & Engineering
Languages : en
Pages : 715

Book Description
With the proliferation of mobile devices and hearing devices, including hearing aids and cochlear implants, there is a growing and pressing need to design algorithms that can improve speech intelligibility without sacrificing quality. Responding to this need, Speech Enhancement: Theory and Practice, Second Edition introduces readers to the basic pr

New Era for Robust Speech Recognition

Author: Shinji Watanabe
Publisher: Springer
ISBN: 331964680X
Category : Computers
Languages : en
Pages : 433

Book Description
This book covers the state-of-the-art in deep neural-network-based methods for noise robustness in distant speech recognition applications. It provides insights and detailed descriptions of some of the new concepts and key technologies in the field, including novel architectures for speech enhancement, microphone arrays, robust features, acoustic model adaptation, training data augmentation, and training criteria. The contributed chapters also include descriptions of real-world applications, benchmark tools and datasets widely used in the field. This book is intended for researchers and practitioners working in the field of speech processing and recognition who are interested in the latest deep learning techniques for noise robustness. It will also be of interest to graduate students in electrical engineering or computer science, who will find it a useful guide to this field of research.

Handbook on Array Processing and Sensor Networks

Author: Simon Haykin
Publisher: John Wiley & Sons
ISBN: 9780470487051
Category : Science
Languages : en
Pages : 924

Book Description
A handbook on recent advancements and the state of the art in array processing and sensor networks. Handbook on Array Processing and Sensor Networks provides readers with a collection of tutorial articles contributed by world-renowned experts on recent advancements and the state of the art in array processing and sensor networks. Focusing on fundamental principles as well as applications, the handbook provides exhaustive coverage of: wavelets; spatial spectrum estimation; MIMO radio propagation; robustness issues in sensor array processing; wireless communications and sensing in multi-path environments using multi-antenna transceivers; implicit training and array processing for digital communications systems; unitary design of radar waveform diversity sets; acoustic array processing for speech enhancement; acoustic beamforming for hearing aid applications; underdetermined blind source separation using acoustic arrays; array processing in astronomy; digital 3D/4D ultrasound imaging technology; self-localization of sensor networks; multi-target tracking and classification in collaborative sensor networks via sequential Monte Carlo; energy-efficient decentralized estimation; sensor data fusion with application to multi-target tracking; distributed algorithms in sensor networks; cooperative communications; distributed source coding; network coding for sensor networks; information-theoretic studies of wireless networks; distributed adaptive learning mechanisms; routing for statistical inference in sensor networks; spectrum estimation in cognitive radios; nonparametric techniques for pedestrian tracking in wireless local area networks; signal processing and networking via the theory of global games; biochemical transport modeling, estimation, and detection in realistic environments; and security and privacy for sensor networks.
Handbook on Array Processing and Sensor Networks is the first book of its kind and will appeal to researchers, professors, and graduate students in array processing, sensor networks, advanced signal processing, and networking.

Speech & Language Processing

Author: Dan Jurafsky
Publisher: Pearson Education India
ISBN: 9788131716724
Category :
Languages : en
Pages : 912

Book Description


Speech Enhancement in the STFT Domain

Author: Jacob Benesty
Publisher: Springer Science & Business Media
ISBN: 3642232507
Category : Technology & Engineering
Languages : en
Pages : 112

Book Description
This work addresses the problem of noise reduction in the short-time Fourier transform (STFT) domain. We divide the general problem into five basic categories depending on the number of microphones being used and whether the interframe or interband correlation is considered. The first category deals with the single-channel problem where STFT coefficients at different frames and frequency bands are assumed to be independent. In this case, the noise reduction filter in each frequency band is basically a real gain. Since a gain does not improve the signal-to-noise ratio (SNR) for any given subband and frame, the noise reduction is basically achieved by lifting the subbands and frames that are less noisy while weighing down those that are more noisy. The second category also concerns the single-channel problem. The difference is that now the interframe correlation is taken into account and a filter is applied in each subband instead of just a gain. The advantage of using the interframe correlation is that we can improve not only the long-time fullband SNR, but the frame-wise subband SNR as well. The third and fourth categories discuss the problem of multichannel noise reduction in the STFT domain with and without interframe correlation, respectively. In the last category, we consider the interband correlation in the design of the noise reduction filters. We illustrate the basic principle for the single-channel case as an example, while this concept can be generalized to other scenarios. In all categories, we propose different optimization cost functions from which we derive the optimal filters, and we also define the performance measures that help analyze them.
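
The point about the first category, that a real gain per band leaves each subband SNR unchanged yet can still raise the fullband SNR, can be illustrated with a small numeric sketch. The example is not from the book: the Wiener-style gain is one common choice assumed here for illustration, and the per-band speech and noise powers are made-up values for a single frame.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(signal_power / noise_power)


def wiener_gain(signal_power, noise_power):
    """Real gain in [0, 1]; close to 1 for clean bands, small for noisy ones.

    Assumption: a Wiener-style gain, used only as an example of a real gain.
    """
    return signal_power / (signal_power + noise_power)


# Hypothetical speech and noise powers in three subbands of one frame.
speech = [4.0, 1.0, 0.25]
noise = [1.0, 1.0, 1.0]

gains = [wiener_gain(s, n) for s, n in zip(speech, noise)]

# A real gain g scales the speech and noise power in a band by g**2.
speech_out = [g * g * s for g, s in zip(gains, speech)]
noise_out = [g * g * n for g, n in zip(gains, noise)]

# Subband SNR: the same before and after, because g**2 cancels in the ratio.
subband_in = [snr_db(s, n) for s, n in zip(speech, noise)]
subband_out = [snr_db(s, n) for s, n in zip(speech_out, noise_out)]

# Fullband SNR: improves, because noisy bands are weighed down more.
fullband_in = snr_db(sum(speech), sum(noise))
fullband_out = snr_db(sum(speech_out), sum(noise_out))

if __name__ == "__main__":
    print("subband SNR in :", [round(x, 2) for x in subband_in])
    print("subband SNR out:", [round(x, 2) for x in subband_out])
    print("fullband SNR in/out:", round(fullband_in, 2), round(fullband_out, 2))
```

The gain cancels in every per-band ratio, so only the relative weighting of the bands changes; that reweighting is what raises the fullband figure.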