On the Efficient Determination of Most Near Neighbors

Author: Mark S. Manasse
Publisher: Springer Nature
ISBN: 3031022963
Category : Computers
Languages : en
Pages : 80

Book Description
The time-worn aphorism "close only counts in horseshoes and hand grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This book is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages—and a few other situations in which we have found that inexact matching is good enough — where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors. We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested. In thinking of when approximation is sufficient, remember the oft-told joke about two campers sitting around after dinner. They hear noises coming towards them. One of them reaches for a pair of running shoes, and starts to don them. The second then notes that even with running shoes, they cannot hope to outrun a bear, to which the first notes that most likely the bear will be satiated after catching the slower of them. We seek problems in which we don't need to be faster than the bear, just faster than the others fleeing the bear.

On the Efficient Determination of Most Near Neighbors

Author: Mark S. Manasse
Publisher: Morgan & Claypool
ISBN: 9781608450886
Category : Computers
Languages : en
Pages : 72

Book Description
The material in this book grew from a simple question: "We know how to easily determine whether two files are identical, but what do we know about determining whether two files are similar?" The answer was "Not much," but when a theorist gives this answer, good things often happen. Such was the case here. This book will be important to practitioners interested in this and similar questions. It contains two intertwined threads; a mathematical treatment of the problem and an engineering thread that provides extremely efficient code for obtaining the solution at scale. I recommend it highly.
---Charles P. (Chuck) Thacker, Microsoft Research

From de-duplication to search, billion dollar industries rely on the ability to search for keys that are "close" to a specified key. The book by Mark Manasse provides a beautiful exposition of the field. Manasse is a well-known expert who has written some of the fundamental theoretical papers in the field; better still, he has worked on real products such as AltaVista and Windows file de-duplication. Mark has the rare ability to take theoretical ideas and convert them to sound engineering. The book will appeal to developers working in the web milieu because it illuminates the details that are often missing using code snippets. It will also appeal to researchers and students because of the uniform and insightful exposition of an important area.
---George Varghese, Professor, University of California, San Diego

Mark Manasse, the father of micropayments, provides insight, techniques, and theory behind search---on getting not too large, not too small, but just right results. This horseshoes mini-treatise comes right from the horse's mouth as an Alta Vistan---he shows how the game was constructed by high dimensionality mapping into tractable space and time to find ringers and good outliers.
---Gordon Bell, Microsoft Research

On The Efficient Determination of Most Near Neighbors

Author: Mark Manasse
Publisher: Springer Nature
ISBN: 3031022815
Category : Computers
Languages : en
Pages : 80

Book Description
The time-worn aphorism "close only counts in horseshoes and hand-grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This lecture is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages -- and a few other situations in which we have found that inexact matching is good enough; where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors. We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested. In thinking of when approximation is sufficient, remember the oft-told joke about two campers sitting around after dinner. They hear noises coming towards them. One of them reaches for a pair of running shoes, and starts to don them. The second then notes that even with running shoes, they cannot hope to outrun a bear, to which the first notes that most likely the bear will be satiated after catching the slower of them. We seek problems in which we don't need to be faster than the bear, just faster than the others fleeing the bear.

On the Efficient Determination of Most Near Neighbors, 2nd Edition

Author: Mark Manasse
Publisher:
ISBN:
Category :
Languages : en
Pages : 100

Book Description
The time-worn aphorism "close only counts in horseshoes and hand grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This book is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages, and a few other situations in which we have found that inexact matching is good enough, where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors. We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested. In thinking of when approximation is sufficient, remember the oft-told joke about two campers sitting around after dinner. They hear noises coming towards them. One of them reaches for a pair of running shoes, and starts to don them. The second then notes that even with running shoes, they cannot hope to outrun a bear, to which the first notes that most likely the bear will be satiated after catching the slower of them. We seek problems in which we don't need to be faster than the bear, just faster than the others fleeing the bear.

Explaining the Success of Nearest Neighbor Methods in Prediction

Author: George H. Chen
Publisher: Foundations and Trends (R) in Machine Learning
ISBN: 9781680834543
Category :
Languages : en
Pages : 264

Book Description
Explains the success of Nearest Neighbor Methods in Prediction, both in theory and in practice.

Transforming Technologies to Manage Our Information

Author: William Jones
Publisher: Springer Nature
ISBN: 3031023293
Category : Computers
Languages : en
Pages : 155

Book Description
With its theme, "Our Information, Always and Forever," Part 1 of this book covers the basics of personal information management (PIM) including six essential activities of PIM and six (different) ways in which information can be personal to us. Part 1 then goes on to explore key issues that arise in the "great migration" of our information onto the Web and into a myriad of mobile devices. Part 2 provides a more focused look at technologies for managing information that promise to profoundly alter our practices of PIM and, through these practices, the way we lead our lives. Part 2 is in five chapters:

- Chapter 5. Technologies of Input and Output. Technologies in support of gesture, touch, voice, and even eye movements combine to support a more natural user interface (NUI). Technologies of output include glasses and "watch" watches. Output will also increasingly be animated with options to "zoom".
- Chapter 6. Technologies to Save Our Information. We can opt for "life logs" to record our experiences with increasing fidelity. What will we use these logs for? And what isn’t recorded that should be?
- Chapter 7. Technologies to Search Our Information. The potential for personalized search is enormous and mostly yet to be realized. Persistent searches, situated in our information landscape, will allow us to maintain a diversity of projects and areas of interest without a need to continually switch from one to another to handle incoming information.
- Chapter 8. Technologies to Structure Our Information. Structure is key if we are to keep, find, and make effective use of our information. But how best to structure? And how best to share structured information between the applications we use, with other people, and also with ourselves over time? What lessons can we draw from the failures and successes in web-based efforts to share structure?
- Chapter 9. PIM Transformed and Transforming: Stories from the Past, Present and Future.

Part 2 concludes with a comparison between Licklider’s world of information in 1957 and our own world of information today. And then we consider what the world of information is likely to look like in 2057. Licklider estimated that he spent 85% of his "thinking time" in activities that were clerical and mechanical and might (someday) be delegated to the computer. What percentage of our own time is spent with the clerical and mechanical? What about in 2057?

Social Monitoring for Public Health

Author: Michael J. Paul
Publisher: Morgan & Claypool Publishers
ISBN: 1681736101
Category : Computers
Languages : en
Pages : 188

Book Description
Public health thrives on high-quality evidence, yet acquiring meaningful data on a population remains a central challenge of public health research and practice. Social monitoring, the analysis of social media and other user-generated web data, has brought advances in the way we leverage population data to understand health. Social media offers advantages over traditional data sources, including real-time data availability, ease of access, and reduced cost. Social media allows us to ask, and answer, questions we never thought possible. This book presents an overview of the progress on uses of social monitoring to study public health over the past decade. We explain available data sources, common methods, and survey research on social monitoring in a wide range of public health areas. Our examples come from topics such as disease surveillance, behavioral medicine, and mental health, among others. We explore the limitations and concerns of these methods. Our survey of this exciting new field of data-driven research lays out future research directions.

Task Intelligence for Search and Recommendation

Author: Chirag Shah
Publisher: Springer Nature
ISBN: 3031023269
Category : Computers
Languages : en
Pages : 140

Book Description
While great strides have been made in the field of search and recommendation, there are still challenges and opportunities to address information access issues that involve solving tasks and accomplishing goals for a wide variety of users. Specifically, we lack intelligent systems that can detect not only the request an individual is making (what), but also understand and utilize the intention (why) and strategies (how) while providing information and enabling task completion. Many scholars in the fields of information retrieval, recommender systems, productivity (especially in task management and time management), and artificial intelligence have recognized the importance of extracting and understanding people's tasks and the intentions behind performing those tasks in order to serve them better. However, we are still struggling to support them in task completion, e.g., in search and assistance, and it has been challenging to move beyond single-query or single-turn interactions. The proliferation of intelligent agents has unlocked new modalities for interacting with information, but these agents will need to be able to work understanding current and future contexts and assist users at task level. This book will focus on task intelligence in the context of search and recommendation. Chapter 1 introduces readers to the issues of detecting, understanding, and using task and task-related information in an information episode (with or without active searching). This is followed by presenting several prominent ideas and frameworks about how tasks are conceptualized and represented in Chapter 2. In Chapter 3, the narrative moves to showing how task type relates to user behaviors and search intentions. A task can be explicitly expressed in some cases, such as in a to-do application, but often it is unexpressed. Chapter 4 covers these two scenarios with several related works and case studies. 
Chapter 5 shows how task knowledge and task models can contribute to addressing emerging retrieval and recommendation problems. Chapter 6 covers evaluation methodologies and metrics for task-based systems, with relevant case studies to demonstrate their uses. Finally, the book concludes in Chapter 7, with ideas for future directions in this important research area.

Images in Social Media

Author: Susanne Ørnager
Publisher: Springer Nature
ISBN: 3031023145
Category : Computers
Languages : en
Pages : 101

Book Description
This book focuses on the methodologies, organization, and communication of digital image collection research that utilizes social media content. ("Image" is here understood as a cultural, conventional, and commercial—stock photo—representation.) The lecture offers expert views that provide different interpretations of images and their potential implementations. Linguistic and semiotic methodologies as well as eye-tracking research are employed to both analyze images and comprehend how humans consider them, including which salient features generally attract viewers' attention. This literature review covers image—specifically photographic—research since 2005, when major social media platforms emerged. A citation analysis includes an overview of co-citation maps that demonstrate the nexus of image research literature and the journals in which they appear. Eye tracking tests whether scholarly templates focus on the proper features of an image, such as people, objects, time, etc., and if a prescribed theme affects the eye movements of the observer. The results may point to renewed requirements for building image search engines. As it stands, image management already requires new algorithms and a new understanding that involves text recognition and very large database processing. The aim of this book is to present different image research areas and demonstrate the challenges image research faces. The book's scope is, by necessity, far from comprehensive, since the field of digital image research does not cover fake news, image manipulation, mobile photos, etc.; these issues are very complex and need a publication of their own. This book should primarily be useful for students in library and information science, psychology, and computer science.

Automatic Disambiguation of Author Names in Bibliographic Repositories

Author: Anderson A. Ferreira
Publisher: Springer Nature
ISBN: 3031023226
Category : Computers
Languages : en
Pages : 126

Book Description
This book deals with a hard problem that is inherent to human language: ambiguity. In particular, we focus on author name ambiguity, a type of ambiguity that exists in digital bibliographic repositories, which occurs when an author publishes works under distinct names or distinct authors publish works under similar names. This problem may be caused by a number of reasons, including the lack of standards and common practices, and the decentralized generation of bibliographic content. As a consequence, the quality of the main services of digital bibliographic repositories such as search, browsing, and recommendation may be severely affected by author name ambiguity. The focal point of the book is on automatic methods, since manual solutions do not scale to the size of the current repositories or the speed in which they are updated. Accordingly, we provide an ample view on the problem of automatic disambiguation of author names, summarizing the results of more than a decade of research on this topic conducted by our group, which were reported in more than a dozen publications that received over 900 citations so far, according to Google Scholar. We start by discussing its motivational issues (Chapter 1). Next, we formally define the author name disambiguation task (Chapter 2) and use this formalization to provide a brief, taxonomically organized, overview of the literature on the topic (Chapter 3). We then organize, summarize and integrate the efforts of our own group on developing solutions for the problem that have historically produced state-of-the-art (by the time of their proposals) results in terms of the quality of the disambiguation results. Thus, Chapter 4 covers HHC - Heuristic-based Clustering, an author name disambiguation method that is based on two specific real-world assumptions regarding scientific authorship. 
Then, Chapter 5 describes SAND - Self-training Author Name Disambiguator, and Chapter 6 presents two incremental author name disambiguation methods, namely INDi - Incremental Unsupervised Name Disambiguation and INC - Incremental Nearest Cluster. Finally, Chapter 7 provides an overview of recent author name disambiguation methods that address new specific approaches such as graph-based representations, alternative predefined similarity functions, visualization facilities, and approaches based on artificial neural networks. The chapters are followed by three appendices that cover, respectively: (i) a pattern matching function for comparing proper names, used by some of the methods addressed in this book; (ii) a tool for generating synthetic collections of citation records for distinct experimental tasks; and (iii) a number of datasets commonly used to evaluate author name disambiguation methods. In summary, the book organizes a large body of knowledge and work in the area of author name disambiguation in the last decade, hoping to consolidate a solid basis for future developments in the field.