Scalable Phrase Mining for Ad-hoc Text Analytics

Author: Srikanta Bedathur
Category : Data mining
Languages : en
Pages : 41

Book Description
Abstract: "Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. The ad-hoc subset may be derived by means of a keyword query against the corpus, or by focusing on a particular time period. We investigate alternative definitions of phrase interestingness, based on the probability of phrase occurrences. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases on ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles."
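The abstract describes an interesting phrase as one that is frequent in an ad-hoc subset of the corpus but relatively infrequent in the corpus as a whole. Below is a minimal Python sketch of that idea under several assumptions. It uses whitespace tokenization and a single-keyword query to select the subset. It scores phrases by lift, P(phrase | subset) / P(phrase | corpus), computed over document frequencies. Lift is only one possible interestingness measure, not necessarily any of the definitions the authors evaluate. All function names are hypothetical. The search is brute force and omits the phrase indexing and top-k search techniques the abstract refers to.

```python
from collections import Counter


def phrases(tokens, max_len):
    """Return the set of contiguous multi-word n-grams (length 2..max_len) in a token list."""
    return {
        tuple(tokens[i:i + n])
        for n in range(2, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def doc_freq(docs, max_len):
    """Count the number of documents in which each phrase occurs at least once."""
    counts = Counter()
    for doc in docs:
        counts.update(phrases(doc.lower().split(), max_len))
    return counts


def top_k_interesting(corpus, keyword, k=3, max_len=3, min_support=2):
    # Ad-hoc subset (assumption): documents containing the query keyword as a whole token.
    subset = [d for d in corpus if keyword.lower() in d.lower().split()]
    corpus_df = doc_freq(corpus, max_len)
    subset_df = doc_freq(subset, max_len)
    scores = {}
    for phrase, sub_count in subset_df.items():
        if sub_count < min_support:
            continue  # too rare in the subset to report
        # Hypothetical score: lift = P(phrase | subset) / P(phrase | corpus),
        # rearranged so that only one division is performed.
        lift = (sub_count * len(corpus)) / (corpus_df[phrase] * len(subset))
        scores[" ".join(phrase)] = lift
    # Highest lift first; ties broken alphabetically so the ranking is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


headlines = [
    "central bank raises interest rates",
    "central bank cuts interest rates after press conference",
    "central bank governor gives press conference",
    "coach gives press conference",
    "mayor gives press conference",
    "team wins cup final",
]

print(top_k_interesting(headlines, "bank"))
# [('central bank', 2.0), ('interest rates', 2.0), ('press conference', 1.0)]
```

Here "press conference" appears in two of the three headlines that match "bank". It is also common elsewhere in the corpus, so it scores lower than "central bank" and "interest rates". Ratios like this are unstable for phrases seen only once or twice. The min_support threshold is included for that reason.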

Scalable Phrase Mining for Ad-hoc Text Analytics

Author: Maya Ramanath
Languages : en
Pages : 30

Text Analysis Pipelines

Author: Henning Wachsmuth
Publisher: Springer
ISBN: 3319257412
Category : Computers
Languages : en
Pages : 317

Book Description
This monograph proposes a comprehensive and fully automatic approach to designing text analysis pipelines for arbitrary information needs that are optimal in terms of run-time efficiency and that robustly mine relevant information from text of any kind. Based on state-of-the-art techniques from machine learning and other areas of artificial intelligence, novel pipeline construction and execution algorithms are developed and implemented in prototypical software. Formal analyses of the algorithms and extensive empirical experiments underline that the proposed approach represents an essential step towards the ad-hoc use of text mining in web search and big data analytics. Both web search and big data analytics aim to fulfill people’s needs for information in an ad-hoc manner. The information sought is often hidden in large amounts of natural language text. Instead of simply returning links to potentially relevant texts, leading search and analytics engines have started to directly mine relevant information from the texts. To this end, they execute text analysis pipelines that may consist of several complex information-extraction and text-classification stages. Due to practical requirements of efficiency and robustness, however, the use of text mining has so far been limited to anticipated information needs that can be fulfilled with rather simple, manually constructed pipelines.

Phrase Mining from Massive Text and Its Applications

Author: Jialu Liu
Publisher: Morgan & Claypool Publishers
ISBN: 1627059180
Category : Computers
Languages : en
Pages : 89

Book Description
A lot of digital ink has been spilled on "big data" over the past few years. Most of this surge owes its origin to the various types of unstructured data in the wild, among which the proliferation of text-heavy data is particularly overwhelming, attributed to the daily use of web documents, business reviews, news, social posts, etc., by so many people worldwide. A core challenge presents itself: How can one efficiently and effectively turn massive, unstructured text into structured representation so as to further lay the foundation for many other downstream text mining applications? In this book, we investigate one promising paradigm for representing unstructured text, that is, through automatically identifying high-quality phrases from innumerable documents. In contrast to a list of frequent n-grams without proper filtering, users are often more interested in results based on variable-length phrases with certain semantics such as scientific concepts, organizations, slogans, and so on. We propose new principles and powerful methodologies to achieve this goal, from the scenario where a user can provide meaningful guidance to a fully automated setting through distant learning. This book also introduces applications enabled by the mined phrases and points out some promising research directions.
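The contrast with "a list of frequent n-grams without proper filtering" can be made concrete. The Python sketch below assumes whitespace tokenization and a small, hypothetical stopword list. It counts raw contiguous n-grams of one fixed length and then applies a crude filter that drops n-grams beginning or ending with a stopword. This is only a naive baseline for illustration, not the phrase-quality methods the book proposes. The function names are hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list used only for this illustration.
STOPWORDS = {"a", "an", "is", "of", "the"}


def frequent_ngrams(docs, n, min_count):
    """Count raw contiguous n-grams and keep those seen at least min_count times."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: c for gram, c in counts.items() if c >= min_count}


def crude_filter(ngrams):
    """Drop n-grams that start or end with a stopword (a heuristic, not a quality model)."""
    return {
        gram: c
        for gram, c in ngrams.items()
        if gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS
    }


docs = [
    "the support vector machine is a classifier",
    "a support vector machine uses a kernel",
    "the kernel of the support vector machine",
]

bigrams = frequent_ngrams(docs, n=2, min_count=2)
print(bigrams)                # {('the', 'support'): 2, ('support', 'vector'): 3, ('vector', 'machine'): 3}
print(crude_filter(bigrams))  # {('support', 'vector'): 3, ('vector', 'machine'): 3}
print(crude_filter(frequent_ngrams(docs, n=3, min_count=2)))  # {('support', 'vector', 'machine'): 3}
```

The bigram pass returns the fragment "the support" alongside "support vector" and "vector machine". The filter removes the first, but the other two are still pieces of the full concept "support vector machine", which only the trigram pass recovers. Committing to a single fixed n, as this baseline does, is what variable-length phrase mining avoids.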

Mining Text Data

Author: Charu C. Aggarwal
Publisher: Springer Science & Business Media
ISBN: 1461432235
Category : Computers
Languages : en
Pages : 527

Book Description
Text mining applications have experienced tremendous advances because of web 2.0 and social networking applications. Recent advances in hardware and software technology have led to a number of unique scenarios where text mining algorithms are learned. Mining Text Data introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining. This book contains a wide swath of topics across social networks & data mining. Each chapter contains a comprehensive survey including the key research content on the topic, and the future directions of research in the field. There is a special focus on Text Embedded with Heterogeneous and Multimedia Data which makes the mining process much more challenging. A number of methods have been designed such as transfer learning and cross-lingual mining for such cases. Mining Text Data simplifies the content, so that advanced-level students, practitioners and researchers in computer science can benefit from this book. Academic and corporate libraries, as well as ACM, IEEE, and Management Science focused on information security, electronic commerce, databases, data mining, machine learning, and statistics are the primary buyers for this reference book.

Text Data Mining

Author: Chengqing Zong
Publisher: Springer Nature
ISBN: 9811601003
Category : Computers
Languages : en
Pages : 363

Book Description
This book discusses various aspects of text data mining. Unlike other books that focus on machine learning or databases, it approaches text data mining from a natural language processing (NLP) perspective. The book offers a detailed introduction to the fundamental theories and methods of text data mining, ranging from pre-processing (for both Chinese and English texts), text representation and feature selection, to text classification and text clustering. It also presents the predominant applications of text data mining, for example, topic modeling, sentiment analysis and opinion mining, topic detection and tracking, information extraction, and automatic text summarization. Bringing all the related concepts and algorithms together, it offers a comprehensive, authoritative and coherent overview. Written by three leading experts, it is valuable both as a textbook and as a reference resource for students, researchers and practitioners interested in text data mining. It can also be used for classes on text data mining or NLP.

Text Analytics with Python

Author: Dipanjan Sarkar
Publisher: Apress
ISBN: 1484223888
Category : Computers
Languages : en
Pages : 397

Book Description
Derive useful insights from your data using Python. You will learn both basic and advanced concepts, including text and language syntax, structure, and semantics. You will focus on algorithms and techniques such as text classification, clustering, topic modeling, and text summarization. Text Analytics with Python teaches you the techniques related to natural language processing and text analytics, and you will gain the skills to know which technique is best suited to solve a particular problem. You will look at each technique and algorithm both from a bird's-eye view, to understand how it can be used, and from a microscopic view, to understand the mathematical concepts and implement them to solve your own problems.

What You Will Learn: understand the major concepts and techniques of natural language processing (NLP) and text analytics, including syntax and structure; build a text classification system to categorize news articles; analyze app or game reviews using topic modeling and text summarization; cluster popular movie synopses and analyze the sentiment of movie reviews; and use Python and popular open-source libraries for NLP and text analytics, such as the Natural Language Toolkit (nltk), gensim, scikit-learn, spaCy, and Pattern.

Who This Book Is For: IT professionals, analysts, developers, linguistic experts, data scientists, and anyone with a keen interest in linguistics, analytics, and generating insights from textual data.

The Text Mining Handbook

Author: Ronen Feldman
Publisher: Cambridge University Press
ISBN: 0521836573
Category : Computers
Languages : en
Pages : 423

Encyclopedia of Information Science and Technology, Third Edition

Author: Mehdi Khosrow-Pour
Publisher: IGI Global
ISBN: 1466658894
Category : Computers
Languages : en
Pages : 7972

Book Description
"This 10-volume compilation of authoritative, research-based articles contributed by thousands of researchers and experts from all over the world emphasized modern issues and the presentation of potential opportunities, prospective solutions, and future directions in the field of information science and technology"--Provided by publisher.

Opinion Mining and Sentiment Analysis

Author: Bo Pang
Publisher: Now Publishers Inc
ISBN: 1601981503
Category : Data mining
Languages : en
Pages : 149

Book Description
This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems.