Using Large Corpora
Author: Armstrong-Warwick Armstrong
Publisher: MIT Press
ISBN: 9780262510820
Category : Business & Economics
Languages : en
Pages : 364
Book Description
Using Large Corpora identifies new data-oriented methods for organizing and analyzing large corpora and describes the potential results that the use of large corpora offers. Today, large corpora consisting of hundreds of millions or even billions of words, along with new empirical and statistical methods for organizing and analyzing these data, promise new insights into the use of language. Already, the data extracted from these large corpora reveal that language use is more flexible and complex than most rule-based systems have tried to account for, providing a basis for progress in the performance of Natural Language Processing systems. Using Large Corpora identifies these new data-oriented methods and describes the potential results that the use of large corpora offers. The research described shows that the new methods may offer solutions to key issues of acquisition (automatically identifying and coding information), coverage (accounting for all of the phenomena in a given domain), robustness (accommodating real data that may be corrupt or not accounted for in the model), and extensibility (applying the model and data to a new domain, text, or problem). There are chapters on lexical issues, issues in syntax, and translation topics, as well as discussions of the statistics-based vs. rule-based debate. ACL-MIT Series in Natural Language Processing.
The Handbook of Historical Linguistics, Volume II
Author: Richard D. Janda
Publisher: John Wiley & Sons
ISBN: 111873226X
Category : Language Arts & Disciplines
Languages : en
Pages : 705
Book Description
An entirely new follow-up volume providing a detailed account of numerous additional issues, methods, and results that characterize current work in historical linguistics. This brand-new, second volume of The Handbook of Historical Linguistics is a complement to the well-established first volume, published in 2003. It includes extended content allowing uniquely comprehensive coverage of the study of language(s) over time. Though it adds fresh perspectives on several topics previously treated in the first volume, this Handbook focuses on extensions of diachronic linguistics beyond those key issues. This Handbook provides readers with studies of language change whose perspectives range from comparisons of large open vs. small closed corpora, via creolistics and linguistic contact in general, to obsolescence and endangerment of languages. Written by leading scholars in their respective fields, new chapters are offered on matters such as the origin of language, evidence from language for reconstructing human prehistory, invocations of language present in studies of language past, benefits of linguistic fieldwork for historical investigation, ways in which not only biological evolution but also field biology can serve as heuristics for research into the rise and spread of linguistic innovations, and more. Moreover, it: offers novel and broadened content complementing the earlier volume so as to provide the fullest available overview of a wholly engrossing field; includes 23 all-new contributed chapters, treating some familiar themes from fresh perspectives but mostly covering entirely new topics; features expanded discussion of material from language families other than Indo-European; and provides a multiplicity of views from numerous specialists in linguistic diachrony.
The Handbook of Historical Linguistics, Volume II is an ideal book for undergraduate and graduate students in linguistics, researchers and professional linguists, as well as all those interested in the history of particular languages and the history of language more generally.
Using Corpora in Discourse Analysis
Author: Paul Baker
Publisher: Bloomsbury Publishing
ISBN: 1350083771
Category : Language Arts & Disciplines
Languages : en
Pages : 281
Book Description
How can you carry out discourse analysis using corpus linguistics? What research questions should you ask? Which methods should you use and when? What is a collocational network or a key cluster? Introducing the major techniques, methods and tools for corpus-assisted analysis of discourse, this book answers these questions and more, showing readers how to best use corpora in their analyses of discourse. Using carefully tailored case studies, each chapter is devoted to a central technique, including frequency, concordancing and keywords, going step by step through the process of applying different analytical procedures. Introducing a wide range of different corpora, from holiday brochures to political debates, the book considers the key debates and latest advances in the field. Fully revised and updated, this new edition includes:
- A new chapter on how to conduct research projects in corpus-based discourse analysis
- Completely rewritten chapters on collocation and advanced techniques, using a corpus of jihadist propaganda texts and covering topics such as social media and visual analysis
- Coverage of major tools, including CQPweb, AntConc, Sketch Engine and #LancsBox
- Discussion of newer techniques including the derivation of lockwords and the comparison of multiple data sets for diachronic analysis
With exercises, discussion questions and suggested further readings in each chapter, this book is an excellent guide to using corpus linguistics techniques to carry out discourse analysis.
Advances in Empirical Translation Studies
Author: Meng Ji
Publisher: Cambridge University Press
ISBN: 1108423272
Category : Computers
Languages : en
Pages : 285
Book Description
Introduces the integration of theoretical and applied translation studies for socially-oriented and data-driven empirical translation research.
Web Corpus Construction
Author: Roland Schäfer
Publisher: Morgan & Claypool Publishers
ISBN: 1627053123
Category : Computers
Languages : en
Pages : 197
Book Description
The World Wide Web constitutes the largest existing source of texts written in a great variety of languages. A feasible and sound way of exploiting this data for linguistic research is to compile a static corpus for a given language. There are several advantages of this approach: (i) Working with such corpora obviates the problems encountered when using Internet search engines in quantitative linguistic research (such as non-transparent ranking algorithms). (ii) Creating a corpus from web data is virtually free. (iii) The size of corpora compiled from the WWW may exceed by several orders of magnitude the size of language resources offered elsewhere. (iv) The data is locally available to the user, and it can be linguistically post-processed and queried with the tools preferred by her/him. This book addresses the main practical tasks in the creation of web corpora up to giga-token size. Among these tasks are the sampling process (i.e., web crawling) and the usual cleanups including boilerplate removal and removal of duplicated content. Linguistic processing and problems with linguistic processing coming from the different kinds of noise in web corpora are also covered. Finally, the authors show how web corpora can be evaluated and compared to other corpora (such as traditionally compiled corpora).
Developing Linguistic Corpora
Author: Martin Wynne
Publisher: Oxbow Books Limited
ISBN:
Category : Language Arts & Disciplines
Languages : en
Pages : 100
Book Description
A linguistic corpus is a collection of texts which have been selected and brought together so that language can be studied on the computer. Today, corpus linguistics offers some of the most powerful new procedures for the analysis of language, and the impact of this dynamic and expanding sub-discipline is making itself felt in many areas of language study. In this volume, a selection of leading experts in various key areas of corpus construction offer advice in a readable and largely non-technical style to help the reader to ensure that their corpus is well designed and fit for the intended purpose. This guide is aimed at those who are at some stage of building a linguistic corpus. Little or no knowledge of corpus linguistics or computational procedures is assumed, although it is hoped that more advanced users will find the guidelines here useful. It is also aimed at those who are not building a corpus, but who need to know something about the issues involved in the design of corpora in order to choose between available resources and to help draw conclusions from their studies.
Exploring Linguistic Science
Author: Allison Burkette
Publisher:
ISBN: 1108424805
Category : Language Arts & Disciplines
Languages : en
Pages : 253
Book Description
Introduces students to the scientific study of language, using the basic principles of complexity theory.
Explanation and Interaction
Author: Alison Cawsey
Publisher: Bradford Books
ISBN: 9780262517058
Category : Computers
Languages : en
Pages : 240
Book Description
Describes the problems and issues involved in generating interactive user-sensitive explanations.
Corpus Linguistics
Author: Douglas Biber
Publisher: Cambridge University Press
ISBN: 9780521499576
Category : Computers
Languages : en
Pages : 324
Book Description
An investigation into the way people use language in speech and writing, this volume introduces the corpus-based approach, which is based on analysis of large databases of real language examples stored on computer.
How to Use Corpora in Language Teaching
Author: John McHardy Sinclair
Publisher: John Benjamins Publishing
ISBN: 9027222835
Category : Language Arts & Disciplines
Languages : en
Pages : 316
Book Description
After decades of being overlooked, corpus evidence is becoming an important component of the teaching and learning of languages. Above all, the profession needs guidance in the practicalities of using corpora, interpreting the results and applying them to the problems and opportunities of the classroom. This book is intensely practical, written mainly by a new generation of language teachers who are acknowledged experts in central aspects of the discipline. It offers advice on what to do in the classroom, how to cope with teachers' queries about language, what corpora to use including learner corpora and spoken corpora and how to handle the variability of language; it reports on some current research and explains how the access software is constructed, including an opportunity for the practitioner to write small but useful programs; and it takes a look into the future of corpora in language teaching.