Building and Using Comparable Corpora
Author: Serge Sharoff
Publisher: Springer Science & Business Media
ISBN: 3642201288
Category : Computers
Languages : en
Pages : 333
Book Description
The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than are translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.
Using Comparable Corpora for Under-resourced Areas of Machine Translation
Author: Inguna Skadina
Publisher:
ISBN: 9783319990057
Category : Corpora (Linguistics)
Languages : en
Pages : 323
Book Description
This book provides an overview of how comparable corpora can be used to overcome the lack of parallel resources when building machine translation systems for under-resourced languages and domains. It presents a wealth of methods and open tools for building comparable corpora from the Web, evaluating comparability and extracting parallel data that can be used for the machine translation task. It is divided into several sections, each covering a specific task such as building, processing, and using comparable corpora, focusing particularly on under-resourced language pairs and domains. The book is intended for anyone interested in data-driven machine translation for under-resourced languages and domains, especially for developers of machine translation systems, computational linguists and language workers. It offers a valuable resource for specialists and students in natural language processing, machine translation, corpus linguistics and computer-assisted translation, and promotes the broader use of comparable corpora in natural language processing and computational linguistics.
Parallel Corpora for Contrastive and Translation Studies
Author: Irene Doval
Publisher: John Benjamins Publishing Company
ISBN: 9027262845
Category : Language Arts & Disciplines
Languages : en
Pages : 313
Book Description
This volume assesses the state of the art of parallel corpus research as a whole, reporting on advances both in recent developments of parallel corpora – with some particular references to comparable corpora as well – and in ways of exploiting them for a variety of purposes. The first part of the book is devoted to new roles that parallel corpora can and should assume in translation studies and in contrastive linguistics, to the usefulness and usability of parallel corpora, and to advances in parallel corpus alignment, annotation and retrieval. There follows an up-to-date presentation of a number of parallel corpus projects currently being carried out in Europe, some of them multimodal, with certain chapters illustrating case studies developed on the basis of the corpora at hand. In most of these chapters, attention is paid to specific technical issues of corpus building. The third part of the book reflects on specific applications and on the creation of bilingual resources from parallel corpora. This volume will be welcomed by scholars, postgraduate and PhD students in the fields of contrastive linguistics, translation studies, lexicography, language teaching and learning, machine translation, and natural language processing.
Translation-Driven Corpora
Author: Federico Zanettin
Publisher: Routledge
ISBN: 1317639847
Category : Language Arts & Disciplines
Languages : en
Pages : 209
Book Description
Electronic texts and text analysis tools have opened up a wealth of opportunities to higher education and language service providers, but learning to use these resources continues to pose challenges to scholars and professionals alike. Translation-Driven Corpora aims to introduce readers to corpus tools and methods which may be used in translation research and practice. Each chapter focuses on specific aspects of corpus creation and use. An introduction to corpora and an overview of applications of corpus linguistics methodologies to translation studies are followed by a discussion of corpus design and acquisition. Different stages and tools involved in corpus compilation and use are outlined, from corpus encoding and annotation to indexing and data retrieval, and the various methods and techniques that allow end users to make sense of corpus data are described. The volume also offers detailed guidelines for the construction and analysis of multilingual corpora. Corpus creation and use are illustrated through practical examples and case studies, with each chapter outlining a set of tasks aimed at guiding researchers, students and translators to practice some of the methods and use some of the resources discussed. These tasks are meant as hands-on activities to be carried out using the materials and links available in an accompanying DVD. Suggested further readings at the end of each chapter are complemented by an extensive bibliography at the end of the volume. Translation-Driven Corpora is designed for use by teachers and students in the classroom or by researchers and professionals for self-learning. It is an invaluable resource for anyone interested in this fast-growing area of scholarly and professional activity.
Comparable Corpora and Computer-assisted Translation
Author: Estelle Maryline Delpech
Publisher: John Wiley & Sons
ISBN: 1119002702
Category : Computers
Languages : en
Pages : 221
Book Description
Computer-assisted translation (CAT) has always used translation memories, which require the translator to have a corpus of previous translations that the CAT software can use to generate bilingual lexicons. This can be problematic when the translator does not have such a corpus, for instance, when the text belongs to an emerging field. To solve this issue, CAT research has looked into the leveraging of comparable corpora, i.e. a set of texts, in two or more languages, which deal with the same topic but are not translations of one another. This work has two primary objectives. The first is to assess the input of lexicons extracted from comparable corpora in the context of a specialized human translation task. The second objective is to identify bilingual-lexicon-extraction methods which best match the translators' needs, determining the current limits of these techniques and suggesting improvements. The author focuses, in particular, on the identification of fertile translations, the management of multiple morphological structures, and the ranking of candidate translations. The experiments are carried out on two language pairs (English–French and English–German) and on specialized texts dealing with breast cancer. This research puts significant emphasis on applicability – methodological choices are guided by the needs of the final users. This book is organized in two parts: the first part presents the applicative and scientific context of the research, and the second part is given over to efforts to improve compositional translation. The research work presented in this book received the PhD Thesis award 2014 from the French association for natural language processing (ATALA).
Computational Phraseology
Author: Gloria Corpas Pastor
Publisher: John Benjamins Publishing Company
ISBN: 9027261393
Category : Language Arts & Disciplines
Languages : en
Pages : 341
Book Description
Whether you wish to deliver on a promise, take a walk down memory lane or even on the wild side, phraseological units (also often referred to as phrasemes or multiword expressions) are present in most communicative situations and in all the world’s languages. Phraseology, the study of phraseological units, has therefore become a rare unifying theme across linguistic theories. In recent years, an increasing number of studies have been concerned with the computational treatment of multiword expressions: these pertain among others to their automatic identification, extraction or translation, and to the role they play in various Natural Language Processing applications. Computational Phraseology is a comparatively new field where better understanding and more advances are urgently needed. This book aims to address this pressing need, by bringing together contributions focusing on different perspectives of this promising interdisciplinary field.
Data Analytics and Management in Data Intensive Domains
Author: Alexei Pozanenko
Publisher: Springer Nature
ISBN: 3031122852
Category : Computers
Languages : en
Pages : 272
Book Description
This book constitutes the post-conference proceedings of the 23rd International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2021, held in Moscow, Russia, in October 2021*. The 16 revised full papers were carefully reviewed and selected from 61 submissions. The papers are organized in the following topical sections: problem solving infrastructures, experiment organization, and machine learning applications; data analysis in astronomy; data analysis in material and earth sciences; information extraction from text.
* The conference was held virtually due to the COVID-19 pandemic.
Recent Developments in Intelligent Information and Database Systems
Author: Dariusz Król
Publisher: Springer
ISBN: 3319312774
Category : Technology & Engineering
Languages : en
Pages : 451
Book Description
The objective of this book is to contribute to the development of intelligent information and database systems with the essentials of current knowledge, experience and know-how. The book contains a selection of 40 chapters based on original research presented as posters during the 8th Asian Conference on Intelligent Information and Database Systems (ACIIDS 2016) held on 14–16 March 2016 in Da Nang, Vietnam. The papers to some extent reflect the achievements of scientific teams from 17 countries on five continents. The volume is divided into six parts: (a) Computational Intelligence in Data Mining and Machine Learning, (b) Ontologies, Social Networks and Recommendation Systems, (c) Web Services, Cloud Computing, Security and Intelligent Internet Systems, (d) Knowledge Management and Language Processing, (e) Image, Video, Motion Analysis and Recognition, and (f) Advanced Computing Applications and Technologies. The book is an excellent resource for researchers, those working in artificial intelligence, multimedia, networks and big data technologies, as well as for students interested in computer science and other related fields.
Computational Linguistics
Author: Adam Przepiórkowski
Publisher: Springer
ISBN: 3642343996
Category : Technology & Engineering
Languages : en
Pages : 290
Book Description
The ever-growing popularity of Google over the recent decade has required a specific method of man-machine communication: the human query should be short, whereas the machine answer may take the form of a wide range of documents. This type of communication has triggered rapid development in the domain of Information Extraction, aimed at providing the asker with more precise information. The recent success of intelligent personal assistants supporting users in searching or even extracting information and answers from large collections of electronic documents signals the onset of a new era in man-machine communication – we shall soon explain to our small devices what we need to know and expect valuable answers quickly and automatically delivered. The progress of man-machine communication is accompanied by growth in the significance of applied Computational Linguistics – we need machines to understand much more of the language we speak naturally than is the case with current search systems. Moreover, we need machine support in crossing language barriers, which is necessary more and more often given the global character of the Web. This book reports on the latest developments in the field. It contains 15 chapters written by researchers who aim at making linguistic theories work – for the better understanding between the man and the machine.
Web As Corpus
Author: Maristella Gatto
Publisher: A&C Black
ISBN: 1441134131
Category : Language Arts & Disciplines
Languages : en
Pages : 255
Book Description
Is the internet a suitable linguistic corpus? How can we use it in corpus techniques? What are the special properties that we need to be aware of? This book answers those questions. The Web is an exponentially increasing source of language and corpus linguistics data. From gigantic static information resources to user-generated Web 2.0 content, the breadth and depth of information available is breathtaking – and bewildering. This book explores the theory and practice of the “web as corpus”. It looks at the most common tools and methods used and features a plethora of examples based on the author's own teaching experience. This book also bridges the gap between studies in computational linguistics, which emphasize technical aspects, and studies in corpus linguistics, which focus on the implications for language theory and use.