Language Modeling for Information Retrieval
Author: W. Bruce Croft
Publisher: Springer Science & Business Media
ISBN: 9401701717
Category : Computers
Languages : en
Pages : 253
Book Description
A statistical language model, or more simply a language model, is a probabilistic mechanism for generating text. Such a definition is general enough to include an endless variety of schemes. However, a distinction should be made between generative models, which can in principle be used to synthesize artificial text, and discriminative techniques to classify text into predefined categories. The first statistical language modeler was Claude Shannon. In exploring the application of his newly founded theory of information to human language, Shannon considered language as a statistical source, and measured how well simple n-gram models predicted or, equivalently, compressed natural text. To do this, he estimated the entropy of English through experiments with human subjects, and also estimated the cross-entropy of the n-gram models on natural text. The ability of language models to be quantitatively evaluated in this way is one of their important virtues. Of course, estimating the true entropy of language is an elusive goal, aiming at many moving targets, since language is so varied and evolves so quickly. Yet fifty years after Shannon's study, language models remain, by all measures, far from the Shannon entropy limit in terms of their predictive power. However, this has not kept them from being useful for a variety of text processing tasks, and moreover can be viewed as encouragement that there is still great room for improvement in statistical language modeling.
Introduction to Information Retrieval
Author: Christopher D. Manning
Publisher: Cambridge University Press
ISBN: 1139472100
Category : Computers
Languages : en
Pages :
Book Description
Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.
Natural Language Processing and Information Retrieval
Author: Tanveer Siddiqui
Publisher: Oxford University Press, USA
ISBN:
Category : Computers
Languages : en
Pages : 426
Book Description
Natural Language Processing and Information Retrieval is a textbook designed to meet the requirements of engineering students pursuing undergraduate and postgraduate programs in computer science and information technology. The book attempts to bridge the gap between theory and practice and would also serve as a useful reference for professionals and researchers working on language-related projects.
Information Retrieval
Author: Stefan Buttcher
Publisher: MIT Press
ISBN: 0262528878
Category : Computers
Languages : en
Pages : 633
Book Description
An introduction to information retrieval, the foundation for modern search engines, that emphasizes implementation and experimentation. This textbook offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation. The emphasis is on implementation and experimentation; each chapter includes exercises and suggestions for student projects. Wumpus, a multiuser open-source information retrieval system developed by one of the authors and available online, provides model implementations and a basis for student work. The modular structure of the book allows instructors to use it in a variety of graduate-level courses, including courses taught from a database systems perspective, traditional information retrieval courses with a focus on IR theory, and courses covering the basics of Web retrieval. In addition to its classroom use, Information Retrieval will be a valuable reference for professionals in computer science, computer engineering, and software engineering.
An Introduction to Neural Information Retrieval
Author: Bhaskar Mitra
Publisher: Foundations and Trends® in Information Retrieval
ISBN: 9781680835328
Category :
Languages : en
Pages : 142
Book Description
This tutorial provides an accessible, yet comprehensive, overview of the state of the art in Neural Information Retrieval.
Dynamic Information Retrieval Modeling
Author: Grace Hui Yang
Publisher: Morgan & Claypool Publishers
ISBN: 1627055266
Category : Computers
Languages : en
Pages : 146
Book Description
Big data and human-computer information retrieval (HCIR) are changing IR. They capture the dynamic changes in the data and the dynamic interactions of users with IR systems. A dynamic system is one that changes or adapts over time or over a sequence of events. Many modern IR systems and datasets exhibit these characteristics, which are largely ignored by conventional techniques. What is missing is the ability of a model to change over time and respond to stimuli. Documents, relevance, users and tasks all exhibit dynamic behavior that is captured in datasets typically collected over long time spans, and models need to respond to these changes. Additionally, the size of modern datasets limits the amount of learning a system can achieve. Furthermore, advances in IR interfaces, personalization and ad display demand models that can react to users in real time and in an intelligent, contextual way. In this book we provide a comprehensive and up-to-date introduction to Dynamic Information Retrieval Modeling, the statistical modeling of IR systems that can adapt to change. We define dynamics and what it means in the context of IR, and highlight examples of problems where dynamics play an important role. We cover techniques ranging from classic relevance feedback to the latest applications of partially observable Markov decision processes (POMDPs), along with a handful of useful algorithms and tools for solving IR problems that incorporate dynamics. The theoretical component is built around the Markov Decision Process (MDP), a mathematical framework from the field of Artificial Intelligence (AI) that enables us to construct models that change according to sequential inputs. We define the framework and the algorithms commonly used to optimize over it, and generalize it to the case where the inputs are not reliable.
We explore the topic of reinforcement learning more broadly and introduce another tool, the Multi-Armed Bandit, which is useful in cases where exploring model parameters is beneficial. Following this we introduce theories and algorithms that can be used to incorporate dynamics into an IR model, before presenting an array of state-of-the-art research that already does so, in areas such as session search and online advertising. Change is at the heart of modern Information Retrieval systems, and this book will help equip the reader with the tools and knowledge needed to understand Dynamic Information Retrieval Modeling.
Multilingual Information Retrieval
Author: Carol Peters
Publisher: Springer Science & Business Media
ISBN: 3642230083
Category : Computers
Languages : en
Pages : 232
Book Description
We are living in a multilingual world, and the diversity of languages used to interact with information access systems has generated a wide variety of challenges to be addressed by computer and information scientists. The growing amount of non-English information accessible globally and the increased worldwide exposure of enterprises also necessitate the adaptation of Information Retrieval (IR) methods to new, multilingual settings. Peters, Braschler and Clough present a comprehensive description of the technologies involved in designing and developing systems for Multilingual Information Retrieval (MLIR). They provide readers with broad coverage of the various issues involved in creating systems that make digitally stored materials accessible regardless of the language(s) they are written in. Details on Cross-Language Information Retrieval (CLIR) are also covered that help readers to understand how to develop retrieval systems that cross language boundaries. Their work is divided into six chapters and accompanies the reader step-by-step through the various stages involved in building, using and evaluating MLIR systems. The book concludes with some examples of recent applications that utilise MLIR technologies. Some of the techniques described have recently started to appear in commercial search systems, while others have the potential to be part of future incarnations. The book is intended for graduate students, scholars, and practitioners with a basic understanding of classical text retrieval methods. It offers guidelines and information on all aspects that need to be taken into consideration when building MLIR systems, while avoiding too many ‘hands-on details’ that could rapidly become obsolete. Thus it bridges the gap between the material covered by most of the classical IR textbooks and the novel requirements related to the acquisition and dissemination of information in whatever language it is stored.
Graph-based Natural Language Processing and Information Retrieval
Author: Rada Mihalcea
Publisher: Cambridge University Press
ISBN: 1139498827
Category : Computers
Languages : en
Pages : 201
Book Description
Graph theory and the fields of natural language processing and information retrieval are well-studied disciplines. Traditionally, these areas have been perceived as distinct, with different algorithms, different applications and different potential end-users. However, recent research has shown that these disciplines are intimately connected, with a large variety of natural language processing and information retrieval applications finding efficient solutions within graph-theoretical frameworks. This book extensively covers the use of graph-based algorithms for natural language processing and information retrieval. It brings together topics as diverse as lexical semantics, text summarization, text mining, ontology construction, text classification and information retrieval, which are connected by the common underlying theme of the use of graph-theoretical methods for text and information processing tasks. Readers will come away with a firm understanding of the major methods and applications in natural language processing and information retrieval that rely on graph-based representations and algorithms.
Machine Learning and Statistical Modeling Approaches to Image Retrieval
Author: Yixin Chen
Publisher: Springer Science & Business Media
ISBN: 1402080344
Category : Technology & Engineering
Languages : en
Pages : 194
Book Description
In the early 1990s, the establishment of the Internet brought forth a revolutionary viewpoint of information storage, distribution, and processing: the World Wide Web was becoming an enormous and expanding distributed digital library. Along with the development of the Web, image indexing and retrieval have grown into research areas sharing a vision of intelligent agents. Far beyond Web searching, image indexing and retrieval can potentially be applied to many other areas, including biomedicine, space science, biometric identification, digital libraries, the military, education, commerce, culture and entertainment. Machine Learning and Statistical Modeling Approaches to Image Retrieval describes several approaches to integrating machine learning and statistical modeling into an image retrieval and indexing system that demonstrates promising results. The topics of this book reflect the authors' experiences with image indexing and retrieval based on machine learning and statistical modeling. The book also contains detailed references for further reading and research in this field.
Cross-Language Information Retrieval
Author: Jian-Yun Nie
Publisher: Springer Nature
ISBN: 303102138X
Category : Computers
Languages : en
Pages : 125
Book Description
Search for information is no longer limited to the user's native language, but increasingly extends to other languages. This gives rise to the problem of cross-language information retrieval (CLIR), whose goal is to find relevant information written in a language different from that of the query. In addition to the problems of monolingual information retrieval (IR), translation is the key problem in CLIR: either the query or the documents must be translated from one language to another. However, this translation problem is not identical to full-text machine translation (MT): the goal is not to produce a human-readable translation, but a translation suitable for finding relevant documents. Specific translation methods are thus required. The goal of this book is to provide a comprehensive description of the specific problems arising in CLIR, the solutions proposed in this area, and the remaining problems. The book starts with a general description of the monolingual IR and CLIR problems. Different classes of approaches to translation are then presented: approaches using an MT system, dictionary-based translation, and approaches based on parallel and comparable corpora. In addition, the typical retrieval effectiveness of the different approaches is compared. It will be shown that translation approaches specifically designed for CLIR can rival and outperform high-quality MT systems. Finally, the book offers a look into the future that draws a strong parallel between query expansion in monolingual IR and query translation in CLIR, suggesting that many approaches developed in monolingual IR can be adapted to CLIR. The book can be used as an introduction to CLIR. Advanced readers will also find more technical details and discussion of the research challenges that remain. It is suitable for new researchers who intend to carry out research on CLIR.
Table of Contents: Preface / Introduction / Using Manually Constructed Translation Systems and Resources for CLIR / Translation Based on Parallel and Comparable Corpora / Other Methods to Improve CLIR / A Look into the Future: Toward a Unified View of Monolingual IR and CLIR? / References / Author Biography