Cluster Analysis for Corpus Linguistics

Author: Hermann Moisl
Publisher: Walter de Gruyter GmbH & Co KG
ISBN: 3110393174
Category : Language Arts & Disciplines
Languages : en
Pages : 319

Book Description
The standard scientific methodology in linguistics is empirical testing of falsifiable hypotheses. As such the process of hypothesis generation is central, and involves formulation of a research question about a domain of interest and statement of a hypothesis relative to it. In corpus linguistics the domain is text, and generation involves abstraction of data from text, data analysis, and formulation of a hypothesis based on inference from the results. Traditionally this process has been paper-based, but the advent of electronic text has increasingly rendered it obsolete both because the size of digital corpora is now at or beyond the limit of what can efficiently be used in the traditional way, and because the complexity of data abstracted from them can be impenetrable to understanding. Linguists are increasingly turning to mathematical and statistical computational methods for help, and cluster analysis is such a method. It is used across the sciences for hypothesis generation by identification of structure in data which are too large or complex, or both, to be interpretable by direct inspection. This book aims to show how cluster analysis can be used for hypothesis generation in corpus linguistics, thereby contributing to a quantitative empirical methodology for the discipline.

Cluster Analysis for Corpus Linguistics

Author: Hermann Moisl
Publisher: Walter de Gruyter GmbH & Co KG
ISBN: 311036381X
Category : Language Arts & Disciplines
Languages : en
Pages : 398

Book Description
The standard scientific methodology in linguistics is empirical testing of falsifiable hypotheses. As such the process of hypothesis generation is central, and involves formulation of a research question about a domain of interest and statement of a hypothesis relative to it. In corpus linguistics the domain is text, and generation involves abstraction of data from text, data analysis, and formulation of a hypothesis based on inference from the results. Traditionally this process has been paper-based, but the advent of electronic text has increasingly rendered it obsolete both because the size of digital corpora is now at or beyond the limit of what can efficiently be used in the traditional way, and because the complexity of data abstracted from them can be impenetrable to understanding. Linguists are increasingly turning to mathematical and statistical computational methods for help, and cluster analysis is such a method. It is used across the sciences for hypothesis generation by identification of structure in data which are too large or complex, or both, to be interpretable by direct inspection. This book aims to show how cluster analysis can be used for hypothesis generation in corpus linguistics, thereby contributing to a quantitative empirical methodology for the discipline.

Corpus Linguistics and Statistics with R

Author: Guillaume Desagulier
Publisher: Springer
ISBN: 3319645722
Category : Computers
Languages : en
Pages : 359

Book Description
This textbook examines empirical linguistics from a theoretical linguist’s perspective. It provides both a theoretical discussion of what quantitative corpus linguistics entails and detailed, hands-on, step-by-step instructions to implement the techniques in the field. The statistical methodology and R-based coding from this book teach readers the basic and then more advanced skills to work with large data sets in their linguistics research and studies. Massive data sets are now more than ever the basis for work that ranges from usage-based linguistics to the far reaches of applied linguistics. This book presents much of the methodology in a corpus-based approach. However, the corpus-based methods in this book are also essential components of recent developments in sociolinguistics, historical linguistics, computational linguistics, and psycholinguistics. Material from the book will also be appealing to researchers in digital humanities and the many non-linguistic fields that use textual data analysis and text-based sensorimetrics. Chapters cover topics including corpus processing, frequency data, and clustering methods. Case studies illustrate each chapter with accompanying data sets, R code, and exercises for use by readers. This book may be used in advanced undergraduate courses, graduate courses, and self-study.

Statistics in Corpus Linguistics

Author: Vaclav Brezina
Publisher: Cambridge University Press
ISBN: 1107125707
Category : Foreign Language Study
Languages : en
Pages : 317

Book Description
A comprehensive and accessible introduction to statistics in corpus linguistics, covering multiple techniques of quantitative language analysis and data visualisation.

Corpus Linguistics and the Web

Author: Marianne Hundt
Publisher: Rodopi
ISBN: 9042021284
Category : Computers
Languages : en
Pages : 313

Book Description
Using the Web as Corpus is one of the recent challenges for corpus linguistics. This volume presents a current state-of-the-art discussion of the topic. The articles address practical problems such as suitable linguistic search tools for accessing the www and the question of register variation, and they probe into methods for culling data from the web. The book also offers a wide range of case studies, covering morphology, syntax, lexis, as well as synchronic and diachronic variation in English. These case studies make use of the two approaches to the www in corpus linguistics: web-as-corpus and web-for-corpus-building. The case studies demonstrate that web data can provide useful additional evidence for a broad range of research questions.

Cluster Analysis for Corpus Linguistics

Author: Hermann Moisl
Publisher: Walter de Gruyter
ISBN: 9783110363821
Category :
Languages : en
Pages : 381

Book Description
The rapidly growing volume of digital natural language text and the complexity of data abstracted from it have increasingly rendered traditional corpus linguistic analytical methodology obsolete. This book describes a cluster analytic methodology for generating linguistic hypotheses on the basis of data abstracted from language corpora.

The Routledge Handbook of Corpus Linguistics

Author: Anne O'Keeffe
Publisher: Routledge
ISBN: 0429632649
Category : Language Arts & Disciplines
Languages : en
Pages : 684

Book Description
The Routledge Handbook of Corpus Linguistics 2e provides an updated overview of a dynamic and rapidly growing area with a widely applied methodology. Over a decade on from the first edition of the Handbook, this collection of 47 chapters from experts in key areas offers a comprehensive introduction to both the development and use of corpora as well as their ever-evolving applications to other areas, such as digital humanities, sociolinguistics, stylistics, translation studies, materials design, language teaching and teacher development, media discourse, discourse analysis, forensic linguistics, second language acquisition and testing. The new edition updates all core chapters and includes new chapters on corpus linguistics and statistics, digital humanities, translation, phonetics and phonology, second language acquisition, social media and theoretical perspectives. Chapters provide annotated further reading lists and step-by-step guides as well as detailed overviews across a wide range of themes. The Handbook also includes a wealth of case studies that draw on some of the many new corpora and corpus tools that have emerged in the last decade. Organised across four themes, moving from the basic start-up topics such as corpus building and design to analysis, application and reflection, this second edition remains a crucial point of reference for advanced undergraduates, postgraduates and scholars in applied linguistics.

The Cambridge Handbook of English Corpus Linguistics

Author: Douglas Biber
Publisher: Cambridge University Press
ISBN: 1316298701
Category : Language Arts & Disciplines
Languages : en
Pages : 757

Book Description
The Cambridge Handbook of English Corpus Linguistics (CHECL) surveys the breadth of corpus-based linguistic research on English, including chapters on collocations, phraseology, grammatical variation, historical change, and the description of registers and dialects. The most innovative aspects of the CHECL are its emphasis on critical discussion, its explicit evaluation of the state of the art in each sub-discipline, and the inclusion of empirical case studies. While each chapter includes a broad survey of previous research, the primary focus is on a detailed description of the most important corpus-based studies in this area, with discussion of what those studies found, and why they are important. Each chapter also includes a critical discussion of the corpus-based methods employed for research in this area, as well as an explicit summary of new findings and discoveries.

Ancient Texts and Modern Readers

Author:
Publisher: BRILL
ISBN: 9004402918
Category : Language Arts & Disciplines
Languages : en
Pages : 393

Book Description
The chapters of this volume address a variety of topics that pertain to modern readers’ understanding of ancient texts, as well as tools or resources that can facilitate contemporary audiences’ interpretation of these ancient writings and their language. In this regard, they cover subjects related to the fields of ancient Hebrew linguistics and Bible translation. The chapters apply linguistic insights and theories to elucidate elements of ancient texts for modern readers, investigate how ancient texts help modern readers to interpret features in other ancient texts, and suggest ways in which translations can make the language and conceptual worlds of ancient texts more accessible to modern readers. In so doing, they present the results of original research, identify new lines and topics of inquiry, and make novel contributions to modern readers’ understanding of ancient texts. Contributors are Alexander Andrason, Barry L. Bandstra, Reinier de Blois, Lénart J. de Regt, Gideon R. Kotzé, Geoffrey Khan, Christian S. Locatell, Kristopher Lyle, John A. Messarra, Cynthia L. Miller-Naudé, Jacobus A. Naudé, Daniel Rodriguez, Eep Talstra, Jeremy Thompson, Cornelius M. van den Heever, Herrie F. van Rooy, Gerrit J. van Steenbergen, Ernst Wendland, Tamar Zewi.

Statistics for Corpus Linguistics

Author: Michael Oakes
Publisher: Edinburgh University Press
ISBN: 1474471382
Category : Language Arts & Disciplines
Languages : en
Pages : 304

Book Description
This book in the Edinburgh Textbooks in Empirical Linguistics series is a comprehensive introduction to the statistics currently used in corpus linguistics. Statistical techniques and corpus applications - whether oriented towards linguistics or language engineering - often go hand in glove, and corpus linguists have used an increasingly wide variety of statistics, drawing on techniques developed in a great many fields. This is the first one-volume introduction to the subject.