Improving Neural Machine Translation of Languages with Little Data and Rich Morphology

Author: Prajit Dhar
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description


Improving Neural Machine Translation for Low-resource Languages

Author: Toan Q. Nguyen
Publisher:
ISBN:
Category :
Languages : en
Pages : 89

Book Description


Progress in Machine Translation

Author: Sergei Nirenburg
Publisher: IOS Press
ISBN: 9789051990744
Category : Computers
Languages : en
Pages : 338

Book Description


Neural Machine Translation

Author: Philipp Koehn
Publisher: Cambridge University Press
ISBN: 1108497322
Category : Computers
Languages : en
Pages : 409

Book Description
Learn how to build machine translation systems with deep learning from the ground up, from basic concepts to cutting-edge research.

Machine Translation and Transliteration involving Related, Low-resource Languages

Author: Anoop Kunchukuttan
Publisher: CRC Press
ISBN: 100042166X
Category : Computers
Languages : en
Pages : 220

Book Description
Machine Translation and Transliteration involving Related, Low-resource Languages discusses an important aspect of natural language processing that has received less attention: translation and transliteration involving related languages in a low-resource setting. This is a very relevant real-world scenario for people living in neighbouring states, provinces, or countries who speak similar languages and need to communicate with each other, but for whom the training data needed to build supporting MT systems is limited. The book discusses the characteristics of related languages with rich examples and draws connections between two problems: translation for related languages and transliteration. It shows how linguistic similarities can be utilized to learn MT systems for related languages with limited data, and it comprehensively discusses the use of subword-level models and multilinguality to exploit these similarities. The second part of the book explores methods for machine transliteration involving related languages based on multilingual and unsupervised approaches. The efficacy of these methods is established through extensive experiments over a wide variety of languages.

Features:
- Novel methods for machine translation and transliteration between related languages, supported with experiments on a wide variety of languages.
- An overview of past literature on machine translation for related languages.
- A case study of machine translation between 10 major languages of India, one of the most linguistically diverse countries in the world.

The book presents important concepts and methods for machine translation involving related languages and, in general, serves as a good reference on NLP for related languages. It is intended for students, researchers, and professionals interested in machine translation, translation studies, multilingual computing, and natural language processing, and can be used as reference reading for courses in NLP and machine translation. Anoop Kunchukuttan is a Senior Applied Researcher at Microsoft India; his research spans various areas of multilingual and low-resource NLP. Pushpak Bhattacharyya is a Professor in the Department of Computer Science at IIT Bombay; his research areas are natural language processing, machine learning, and AI, and he has published more than 350 research papers in various areas of NLP.
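To make the subword idea concrete: this is a minimal byte-pair-encoding (BPE) sketch, not code from the book, illustrating how a vocabulary learned jointly over two related languages merges the character sequences they share, so cognates are segmented into common subword units. The toy word list below is hypothetical.

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merges from a {word: frequency} dict."""
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

def segment(word, merges):
    """Apply learned merges, in order, to a new word."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols
```

Training one such model on the concatenated corpora of two related languages is the simplest way their shared orthographic material ends up in a single shared subword vocabulary.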

Machine Translation of Morphologically Rich Languages Using Deep Neural Networks

Author: Peyman Passban
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
This thesis addresses some of the challenges of translating morphologically rich languages (MRLs). Words in MRLs have more complex structures than those in other languages, so that a word can be viewed as a hierarchical structure with several internal subunits. Accordingly, word-based models in which words are treated as atomic units are not suitable for this set of languages. As a commonly used and effective solution, morphological decomposition is applied to segment words into atomic and meaning-preserving units, but this raises other types of problems, some of which we study here. We mainly use neural networks (NNs) to perform machine translation (MT) in our research and study their different properties. However, our research is not limited to neural models alone, as we also consider some of the difficulties of conventional MT methods.

First we try to model morphologically complex words (MCWs) and provide better word-level representations. Words are symbolic concepts which are represented numerically in order to be used in NNs; our first goal is to tackle this problem and find the best representation for MCWs. In the next step we focus on language modeling (LM) and work at the sentence level. We propose new morpheme-segmentation models by which we fine-tune existing LMs for MRLs; in this part of our research we try to find the most efficient neural language model for MRLs. After providing word- and sentence-level neural information in the first two steps, we try to use such information to enhance translation quality in the statistical machine translation (SMT) pipeline using several different models. Accordingly, the main goal in this part is to find methods by which deep neural networks (DNNs) can improve SMT.

One of the main interests of the thesis is to study neural machine translation (NMT) engines from different perspectives and fine-tune them to work with MRLs. In the last step we target this problem and perform end-to-end sequence modeling via NN-based models. NMT engines have recently improved significantly and perform as well as state-of-the-art systems, but they still have serious problems with morphologically complex constituents. This shortcoming of NMT is studied in two separate chapters of the thesis: in one chapter we investigate the impact of different non-linguistic morpheme-segmentation models on the NMT pipeline, and in the other we benefit from a linguistically motivated morphological analyzer and propose a novel neural architecture particularly for translating from MRLs. Our overall goal for this part of the research is to find the most suitable neural architecture to translate MRLs. We evaluated our models on different MRLs such as Czech, Farsi, German, Russian, and Turkish, and observed significant improvements. The main goal of this research was to incorporate morphological information into MT and to define architectures able to model the complex nature of MRLs. The results of our experimental studies confirm that this goal was achieved.
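One published way to build word-level representations for MCWs from their internal subunits is to compose a word vector from its character n-grams (the fastText approach; this sketch is an illustration of that general idea, not the thesis's model). The hash-derived "embeddings" below are a stand-in for learned vectors; related word forms share many n-grams, so they share most of the components being composed.

```python
import hashlib

DIM = 8  # toy embedding dimensionality

def ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a word, with boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def gram_vec(gram):
    # Deterministic pseudo-embedding from a hash; in a real model
    # these vectors would be learned parameters.
    digest = hashlib.md5(gram.encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:DIM]]

def word_vector(word):
    """Compose a word representation as the mean of its n-gram vectors."""
    grams = ngrams(word)
    acc = [0.0] * DIM
    for g in grams:
        acc = [a + x for a, x in zip(acc, gram_vec(g))]
    return [a / len(grams) for a in acc]
```

Because inflected forms of the same lemma (e.g. "spielen" / "spielte") overlap heavily in n-grams, their composed vectors share most of their summands, which is exactly the property word-level atomic embeddings lack for MRLs.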

Enhancing Neural Machine Translation of Low-resource Languages

Author: Séamus Lankford
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
In the current machine translation (MT) landscape, the Transformer architecture stands as the gold standard, especially for high-resource language pairs. This research examines its efficacy for low-resource language pairs, specifically English↔Irish and English↔Marathi. Notably, the study identifies the optimal hyperparameters and subword model type to significantly improve the translation quality of Transformer models for low-resource language pairs. The scarcity of parallel datasets for low-resource languages can hinder MT development. To address this, we developed gaHealth, the first bilingual corpus of health data for the Irish language. Focusing on the health domain, models developed using this in-domain dataset exhibited very significant improvements in BLEU score when compared with models from the LoResMT2021 Shared Task. A subsequent human evaluation using the Multidimensional Quality Metrics (MQM) error taxonomy showcased the superior performance of the Transformer system in reducing both accuracy and fluency errors compared to an RNN-based counterpart. Furthermore, this thesis introduces adaptNMT and adaptMLLM, two open-source applications streamlined for the development, fine-tuning, and deployment of neural machine translation models. These tools considerably simplify the setup and evaluation process, making MT more accessible to both developers and translators. Notably, adaptNMT, grounded in the OpenNMT ecosystem, promotes eco-friendly natural language processing research by highlighting the environmental footprint of model development. Fine-tuning of MLLMs by adaptMLLM demonstrated advancements in translation performance for two low-resource language pairs, English↔Irish and English↔Marathi, compared to baselines from the LoResMT2021 Shared Task.
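The BLEU scores reported above are corpus-level n-gram precision scores with a brevity penalty. As a rough, self-contained illustration of the metric (real evaluations would use a standard tool such as sacrebleu, which also fixes tokenization), a minimal implementation looks like this:

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps, refs, max_n=4):
    """Corpus-level BLEU: uniform-weight n-gram precision + brevity penalty.

    hyps, refs: parallel lists of token lists (one reference per hypothesis).
    """
    clipped = [0] * max_n   # clipped n-gram matches per order
    total = [0] * max_n     # candidate n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
            clipped[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            total[n - 1] += max(len(hyp) - n + 1, 0)
    if min(clipped) == 0:
        return 0.0  # no smoothing: any empty order zeroes the score
    log_prec = sum(math.log(c / t) for c, t in zip(clipped, total)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)
```

A perfect hypothesis scores 1.0 (often reported as 100); shortened or divergent output is penalised by the precision terms and the brevity penalty.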

Locative Alternation

Author: Seizi Iwata
Publisher: John Benjamins Publishing
ISBN: 9027291047
Category : Language Arts & Disciplines
Languages : en
Pages : 258

Book Description
The aim of the present volume is two-fold: to give a coherent account of the locative alternation in English, and to develop a constructional theory that overcomes a number of problems in earlier constructional accounts. The lexical-constructional account proposed here is characterized by two main features. On the one hand, it emphasizes the need for a detailed examination of verb meanings. On the other, it introduces lower-level constructions such as verb-class-specific constructions and verb-specific constructions, and makes full use of these lower-level constructions in accounting for alternation phenomena. Rather than being a completely new version of construction grammar, the proposed lexical-constructional account is an automatic consequence of the basic tenet of constructional approaches as being usage-based.

Improving Neural Machine Translation Models with Monolingual Data

Author: Rico Sennrich
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description


Pivot-based Statistical Machine Translation for Morphologically Rich Languages

Author:
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
One method is based on hand-crafted rules relying on our knowledge of the source and target languages, while in the other method the morphology constraints are induced from available parallel data between the source and target languages, which we also use to build a direct translation model. We then combine both the pivot and direct models to achieve better coverage and overall translation quality. Using induced morphology constraints outperformed the hand-crafted rules and improved over our best model from all previous approaches by 0.6 BLEU points (7.2 and 6.7 BLEU points over the direct and pivot baselines, respectively). Finally, we introduce smart techniques for combining pivot and direct models, and show that smart selective combination can lead to a large reduction in the size of the pivot model without affecting performance, in some cases even improving it.
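The core of pivot-based phrase translation is composing a source→pivot phrase table with a pivot→target one by marginalizing over the pivot phrase, p(t|s) ≈ Σ_p p(t|p)·p(p|s). This is a generic sketch of that standard construction with hypothetical toy tables, not the thesis's actual system:

```python
from collections import defaultdict

def compose_pivot(src2piv, piv2tgt):
    """Compose two phrase tables through the pivot language.

    src2piv: {(src_phrase, piv_phrase): p(piv|src)}
    piv2tgt: {(piv_phrase, tgt_phrase): p(tgt|piv)}
    Returns {(src_phrase, tgt_phrase): sum over shared pivot phrases}.
    """
    src2tgt = defaultdict(float)
    for (s, p), p_piv_given_src in src2piv.items():
        for (p2, t), p_tgt_given_piv in piv2tgt.items():
            if p == p2:  # marginalize over matching pivot phrases
                src2tgt[(s, t)] += p_piv_given_src * p_tgt_given_piv
    return dict(src2tgt)
```

Selective combination, as described above, would then prune most of this composed table, keeping only entries the direct model lacks or scores poorly.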