Building and Exploring Web Corpora (WAC3 - 2007)
Author: Cédrick Fairon
Publisher: Presses univ. de Louvain
ISBN: 9782874630828
Category : Language Arts & Disciplines
Languages : en
Pages : 186
Book Description
WAC: More and more people are using Web data for linguistic and NLP research. The Web as Corpus workshop (WAC) provides a venue for exploring how we can use it effectively and the advancements to which this could lead. This book is a collection of the talks presented at the 3rd WAC in Louvain-la-Neuve (Belgium). The focus is on the description of Web corpus collection projects, the exploration of Web data characteristics from a linguistics/NLP perspective, and on the use of crawled Web data for NLP purposes.
CLEANEVAL: Any use of Web data requires that it be cleaned in order to get rid of unwanted material including, for example, HTML markup, navigation bars, advertisements. To date there has been no sharing of resources or expertise in this particular domain and the cleaning has often been done minimally. Cleaneval was an exercise aimed at promoting collaboration and improving our understanding of the issues. Results and perspectives are presented in this book.
Web As Corpus
Author: Maristella Gatto
Publisher: A&C Black
ISBN: 1472571533
Category : Language Arts & Disciplines
Languages : en
Pages : 258
Book Description
Is the internet a suitable linguistic corpus? How can we use it in corpus techniques? What are the special properties that we need to be aware of? This book answers those questions. The Web is an exponentially increasing source of language and corpus linguistics data. From gigantic static information resources to user-generated Web 2.0 content, the breadth and depth of information available is breathtaking – and bewildering. This book explores the theory and practice of the “web as corpus”. It looks at the most common tools and methods used and features a plethora of examples based on the author's own teaching experience. This book also bridges the gap between studies in computational linguistics, which emphasize technical aspects, and studies in corpus linguistics, which focus on the implications for language theory and use.
Information Science and Applications
Author: Kuinam J. Kim
Publisher: Springer
ISBN: 3662465787
Category : Technology & Engineering
Languages : en
Pages : 1087
Book Description
This proceedings volume provides a snapshot of the latest issues encountered in technical convergence and convergences of security technology. It explores how information science is core to most current research, industrial and commercial activities and consists of contributions covering topics including Ubiquitous Computing, Networks and Information Systems, Multimedia and Visualization, Middleware and Operating Systems, Security and Privacy, Data Mining and Artificial Intelligence, Software Engineering, and Web Technology. The proceedings introduce the most recent information technology and ideas, applications and problems related to technology convergence, illustrated through case studies, and review converging existing security techniques. Through this volume, readers will gain an understanding of the current state of the art in information strategies and technologies of convergence security. The intended readership comprises researchers in academia, industry, and other research institutes focusing on information science and technology.
The Routledge Handbook of Vocabulary Studies
Author: Stuart Webb
Publisher: Routledge
ISBN: 1000012387
Category : Language Arts & Disciplines
Languages : en
Pages : 624
Book Description
The Routledge Handbook of Vocabulary Studies provides a cutting-edge survey of current scholarship in this area. Divided into four sections, which cover understanding vocabulary; approaches to teaching and learning vocabulary; measuring knowledge of vocabulary; and key issues in teaching, researching, and measuring vocabulary, this Handbook:
• brings together a wide range of approaches to learning words to provide clarity on how best vocabulary might be taught and learned;
• provides a comprehensive discussion of the key issues and challenges in vocabulary studies, with research taken from the past 40 years;
• includes chapters on both formulaic language as well as single-word items;
• features original contributions from a range of internationally renowned scholars as well as academics at the forefront of innovative research.
The Routledge Handbook of Vocabulary Studies is an essential text for those interested in teaching, learning, and researching vocabulary.
Using Corpora in Contrastive and Translation Studies
Author: Richard Xiao
Publisher: Cambridge Scholars Publishing
ISBN: 1527554848
Category : Language Arts & Disciplines
Languages : en
Pages : 550
Book Description
The corpus-based approach has developed into a well-established paradigm in translation studies and has been recognised as a principal reason for the revival of contrastive linguistics since the 1990s, while corpus-based contrastive and translation studies have in turn significantly expanded the scope of corpus linguistics. This book features a selection of twenty-three papers from the 2008 meeting of Using Corpora in Contrastive and Translation Studies (UCCTS), an international conference series launched to provide an international forum for the exploration of theoretical and practical issues pertaining to the creation and use of corpora in contrastive and translation studies. The papers in this collection represent the latest developments in corpus-based translation studies, corpus-based contrastive studies, parallel corpus development and bilingual lexicography. They are useful resources for researchers as well as postgraduates and their supervisors in translation studies, comparative and contrastive linguistics, corpus linguistics, and computational linguistics.
Web Corpus Construction
Author: Roland Schäfer
Publisher: Morgan & Claypool Publishers
ISBN: 1627053123
Category : Computers
Languages : en
Pages : 197
Book Description
The World Wide Web constitutes the largest existing source of texts written in a great variety of languages. A feasible and sound way of exploiting this data for linguistic research is to compile a static corpus for a given language. There are several advantages of this approach: (i) Working with such corpora obviates the problems encountered when using Internet search engines in quantitative linguistic research (such as non-transparent ranking algorithms). (ii) Creating a corpus from web data is virtually free. (iii) The size of corpora compiled from the WWW may exceed by several orders of magnitude the size of language resources offered elsewhere. (iv) The data is locally available to the user, and it can be linguistically post-processed and queried with the tools preferred by her/him. This book addresses the main practical tasks in the creation of web corpora up to giga-token size. Among these tasks are the sampling process (i.e., web crawling) and the usual cleanups including boilerplate removal and removal of duplicated content. Linguistic processing and problems with linguistic processing coming from the different kinds of noise in web corpora are also covered. Finally, the authors show how web corpora can be evaluated and compared to other corpora (such as traditionally compiled corpora).
Forms of Migration, Migrations of Forms: Language studies
Author: Associazione italiana di anglistica. Congresso
Publisher:
ISBN:
Category : Language Arts & Disciplines
Languages : en
Pages : 574
Book Description
The Irish Language in the Digital Age
Author: Georg Rehm
Publisher: Springer Science & Business Media
ISBN: 364230558X
Category : Computers
Languages : en
Pages : 90
Book Description
This white paper is part of a series that promotes knowledge about language technology and its potential. It addresses educators, journalists, politicians, language communities and others. The availability and use of language technology in Europe varies between languages. Consequently, the actions that are required to further support research and development of language technologies also differ for each language. The required actions depend on many factors, such as the complexity of a given language and the size of its community. META-NET, a Network of Excellence funded by the European Commission, has conducted an analysis of current language resources and technologies. This analysis focused on the 23 official European languages as well as other important national and regional languages in Europe. The results of this analysis suggest that there are many significant research gaps for each language. A more detailed expert analysis and assessment of the current situation will help maximise the impact of additional research and minimize any risks. META-NET consists of 54 research centres from 33 countries that are working with stakeholders from commercial businesses, government agencies, industry, research organisations, software companies, technology providers and European universities. Together, they are creating a common technology vision while developing a strategic research agenda that shows how language technology applications can address any research gaps by 2020.
Language Processing and Knowledge in the Web
Author: Iryna Gurevych
Publisher: Springer
ISBN: 3642407226
Category : Computers
Languages : en
Pages : 227
Book Description
This book constitutes the refereed conference proceedings of the 25th International Conference on Language Processing and Knowledge in the Web, GSCL 2013, held in Darmstadt, Germany, in September 2013. The 20 revised full papers were carefully selected from numerous submissions and cover topics on language processing and knowledge in the Web on several important dimensions, such as computational linguistics, language technology, and processing of unstructured textual content in the Web.
The Oxford Handbook of Lexicography
Author: Philip Durkin
Publisher: Oxford University Press
ISBN: 0199691630
Category : Language Arts & Disciplines
Languages : en
Pages : 737
Book Description
This volume provides concise, authoritative accounts of the approaches and methodologies of modern lexicography and of the aims and qualities of its end products. Leading scholars and professional lexicographers, from all over the world and representing all the main traditions and perspectives, assess the state of the art in every aspect of research and practice. The book is divided into four parts, reflecting the main types of lexicography. Part I looks at synchronic dictionaries - those for the general public, monolingual dictionaries for second-language learners, and bilingual dictionaries. Parts II and III are devoted to the distinctive methodologies and concerns of the historical dictionaries and specialist dictionaries respectively, while chapters in Part IV examine specific topics such as description and prescription; the representation of pronunciation; and the practicalities of dictionary production. The book ends with a chronology of the major events in the history of lexicography. It will be a valuable resource for students, scholars, and practitioners in the field.