On the Efficient Determination of Most Near Neighbors

Author: Mark S. Manasse
Publisher: Springer Nature
ISBN: 3031022963
Category: Computers
Languages: en
Pages: 80

Book Description
The time-worn aphorism "close only counts in horseshoes and hand grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This book is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages—and a few other situations in which we have found that inexact matching is good enough — where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors. We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested. In thinking of when approximation is sufficient, remember the oft-told joke about two campers sitting around after dinner. They hear noises coming towards them. One of them reaches for a pair of running shoes, and starts to don them. The second then notes that even with running shoes, they cannot hope to outrun a bear, to which the first notes that most likely the bear will be satiated after catching the slower of them. We seek problems in which we don't need to be faster than the bear, just faster than the others fleeing the bear.

On The Efficient Determination of Most Near Neighbors

Author: Mark Manasse
Publisher: Springer Nature
ISBN: 3031022815
Category: Computers
Languages: en
Pages: 80

Book Description
The time-worn aphorism "close only counts in horseshoes and hand-grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This lecture is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages -- and a few other situations in which we have found that inexact matching is good enough; where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors. We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested. In thinking of when approximation is sufficient, remember the oft-told joke about two campers sitting around after dinner. They hear noises coming towards them. One of them reaches for a pair of running shoes, and starts to don them. The second then notes that even with running shoes, they cannot hope to outrun a bear, to which the first notes that most likely the bear will be satiated after catching the slower of them. We seek problems in which we don't need to be faster than the bear, just faster than the others fleeing the bear.

On the Efficient Determination of Most Near Neighbors

Author: Mark S. Manasse
Publisher: Morgan & Claypool
ISBN: 9781608450886
Category: Computers
Languages: en
Pages: 72

Book Description
The material in this book grew from a simple question: "We know how to easily determine whether two files are identical, but what do we know about determining whether two files are similar?" The answer was "Not much," but when a theorist gives this answer, good things often happen. Such was the case here. This book will be important to practitioners interested in this and similar questions. It contains two intertwined threads: a mathematical treatment of the problem and an engineering thread that provides extremely efficient code for obtaining the solution at scale. I recommend it highly.
-- Charles P. (Chuck) Thacker, Microsoft Research
From de-duplication to search, billion-dollar industries rely on the ability to search for keys that are "close" to a specified key. The book by Mark Manasse provides a beautiful exposition of the field. Manasse is a well-known expert who has written some of the fundamental theoretical papers in the field; better still, he has worked on real products such as AltaVista and Windows file de-duplication. Mark has the rare ability to take theoretical ideas and convert them to sound engineering. The book will appeal to developers working in the web milieu because it uses code snippets to illuminate the details that are often missing. It will also appeal to researchers and students because of the uniform and insightful exposition of an important area.
-- George Varghese, Professor, University of California, San Diego
Mark Manasse, the father of micropayments, provides insight, techniques, and theory behind search -- on getting not too large, not too small, but just right results. This horseshoes mini-treatise comes right from the horse's mouth as an Alta Vistan -- he shows how the game was constructed by high dimensionality mapping into tractable space and time to find ringers and good outliers.
-- Gordon Bell, Microsoft Research

Social Monitoring for Public Health

Author: Michael J. Paul
Publisher: Morgan & Claypool Publishers
ISBN: 1681736101
Category: Computers
Languages: en
Pages: 158

Book Description
Public health thrives on high-quality evidence, yet acquiring meaningful data on a population remains a central challenge of public health research and practice. Social monitoring, the analysis of social media and other user-generated web data, has brought advances in the way we leverage population data to understand health. Social media offers advantages over traditional data sources, including real-time data availability, ease of access, and reduced cost. Social media allows us to ask, and answer, questions we never thought possible. This book presents an overview of the progress on uses of social monitoring to study public health over the past decade. We explain available data sources, common methods, and survey research on social monitoring in a wide range of public health areas. Our examples come from topics such as disease surveillance, behavioral medicine, and mental health, among others. We explore the limitations and concerns of these methods. Our survey of this exciting new field of data-driven research lays out future research directions.

Quantifying Research Integrity

Author: Michael Seadle
Publisher: Springer Nature
ISBN: 3031023064
Category: Computers
Languages: en
Pages: 121

Book Description
Institutions typically treat research integrity violations as black and white, right or wrong. The result is that the wide range of grayscale nuances that separate accident, carelessness, and bad practice from deliberate fraud and malpractice often get lost. This lecture looks at how to quantify the grayscale range in three kinds of research integrity violations: plagiarism, data falsification, and image manipulation. Quantification works best with plagiarism, because the essential one-to-one matching algorithms are well known and established tools for detecting when matches exist. Questions remain, however, of how many matching words of what kind in what location in which discipline constitute reasonable suspicion of fraudulent intent. Different disciplines take different perspectives on quantity and location. Quantification is harder with data falsification, because the original data are often not available, and because experimental replication remains surprisingly difficult. The same is true with image manipulation, where tools exist for detecting certain kinds of manipulations, but where the tools are also easily defeated. This lecture looks at how to prevent violations of research integrity from a pragmatic viewpoint, and at what steps institutions and publishers can take to discourage problems beyond the usual ethical admonitions. There are no simple answers, but two measures can help: the systematic use of detection tools and requiring original data and images. These alone do not suffice, but they represent a start. The scholarly community needs a better awareness of the complexity of research integrity decisions. Only an open and widespread international discussion can bring about a consensus on where the boundary lines are and when grayscale problems shade into black. One goal of this work is to move that discussion forward.

Researching Serendipity in Digital Information Environments

Author: Lori McCay-Peet
Publisher: Springer Nature
ISBN: 3031023129
Category: Computers
Languages: en
Pages: 91

Book Description
Chance, luck, and good fortune are the usual go-to descriptors of serendipity, a phenomenon often aptly coupled with famous anecdotes of accidental discoveries in modern engineering and science, such as penicillin, Teflon, and Post-it notes. Serendipity, however, is evident in many fields of research, in organizations, and in everyday life, and there is more to it than luck implies. While the phenomenon is strongly associated with in-person interactions with people, places, and things, most attention of late has focused on its preservation and facilitation within digital information environments. Serendipity's association with unexpected, positive user experiences and outcomes has spurred an interest in understanding both how current digital information environments support serendipity and how novel approaches may be developed to facilitate it. Research has sought to understand serendipity, how it is manifested in people's personality traits and behaviors, how it may be facilitated in digital information environments such as mobile applications, and its impacts at the individual, organizational, and wider levels. Because serendipity is expressed and understood in different ways in different contexts, multiple methods have been used to study the phenomenon and evaluate digital information environments that may support it. This volume brings together different disciplinary perspectives and examines the motivations for studying serendipity, the various ways in which serendipity has been approached in the research, methodological approaches to build theory, and how it may be facilitated. Finally, a roadmap for serendipity research is drawn by integrating key points from this volume to produce a framework for the examination of serendipity in digital information environments.

Digital Libraries for Cultural Heritage

Author: Tatjana Aparac-Jelušić
Publisher: Springer Nature
ISBN: 3031023102
Category: Computers
Languages: en
Pages: 175

Book Description
European digital libraries have existed in diverse forms and with quite different functions, priorities, and aims. However, there are some common features of European-based initiatives that are relevant to non-European communities. There are now many more challenges and changes than ever before, and the development rate of new digital libraries is ever accelerating. Delivering educational, cultural, and research resources, especially from major scientific and cultural organizations, has become a core mission of these organizations. Using these resources they will be able to investigate, educate, and elucidate, in order to promote and disseminate and to preserve civilization. Extremely important in conceptualizing the digital environment priorities in Europe was its cultural heritage and the feeling that these rich resources should be open to Europe and the global community. In this book we focus on European digitized heritage and digital culture, and its potential in the digital age. We specifically look at the EU and its approaches to digitization and digital culture, problems detected, and achievements reached, all with an emphasis on digital cultural heritage. We seek to report on important documents that were prepared on digitization; copyright and related documents; research and education in the digital libraries field under the auspices of the EU; some other European and national initiatives; and funded projects. The aim of this book is to discuss the development of digital libraries in the European context by presenting, primarily to non-European communities interested in digital libraries, the phenomena, initiatives, and developments that dominated in Europe. We describe the main projects and their outcomes, and shine a light on a number of challenges that have been inspiring new approaches, cooperative efforts, and the use of research methodology at different stages of digital library development.
The specific goals are reflected in the structure of the book, which can be conceived as a guide to several main topics and sub-topics. However, the author's scope is far from comprehensive, since the field of digital libraries is very complex and digital libraries for cultural heritage are even more so.

Framing Privacy in Digital Collections with Ethical Decision Making

Author: Virginia Dressler
Publisher: Morgan & Claypool Publishers
ISBN: 1681734028
Category: Computers
Languages: en
Pages: 109

Book Description
As digital collections continue to grow, the underlying technologies to serve up content also continue to expand and develop. As such, new challenges are presented which continue to test ethical ideologies in everyday environs of the practitioner. There are currently no solid guidelines or overarching codes of ethics to address such issues. The digitization of modern archival collections, in particular, presents interesting conundrums when factors of privacy are weighed and reviewed in both small and mass digitization initiatives. Ethical decision making needs to be present at the onset of project planning in digital projects of all sizes, and we also need to identify the role and responsibility of the practitioner to make more virtuous decisions on behalf of those with no voice or awareness of potential privacy breaches. In this book, notions of what constitutes private information are discussed, as is the potential presence of such information in both analog and digital collections. This book lays groundwork to introduce the topic of privacy within digital collections by providing some examples from documented real-world scenarios and making recommendations for future research. A discussion of the notion of privacy as a concept will be included, as well as some historical perspective (with perhaps one of the most cited works on this topic, for example, Warren and Brandeis' "Right to Privacy," 1890). Concepts from the Right to Be Forgotten case in 2014 (Google Spain SL, Google Inc. v Agencia Española de Protección de Datos, Mario Costeja González) are discussed as to how some lessons may be drawn from the response in Europe and also how European data privacy laws have been applied. The European ideologies are contrasted with the Right to Free Speech in the First Amendment in the U.S., highlighting the complexities in setting guidelines and practices revolving around privacy issues when applied to real-life scenarios.
Two ethical theories are explored: consequentialism and deontology. Finally, ethical decision-making models will also be applied to our framework of digital collections. Three case studies are presented to illustrate how privacy can be defined within digital collections in some real-world examples.

Information and Human Values

Author: Kenneth Fleischmann
Publisher: Morgan & Claypool Publishers
ISBN: 1627052461
Category: Computers
Languages: en
Pages: 101

Book Description
This book seeks to advance our understanding of the relationship between information and human values by synthesizing the complementary but typically disconnected threads in the literature, reflecting on my 15 years of research on the relationship between information and human values, advancing our intellectual understanding of the key facets of this topic, and encouraging further research to continue exploring this important and timely research topic. The book begins with an explanation of what human values are and why they are important. Next, three distinct literatures on values, information, and technology are analyzed and synthesized, including the social psychology literature on human values, the information studies literature on the core values of librarianship, and the human-computer interaction literature on value-sensitive design. After that, three detailed case studies are presented based on reflections on a wide range of research studies. The first case study focuses on the role of human values in the design and use of educational simulations. The second case study focuses on the role of human values in the design and use of computational models. The final case study explores human values in communication via, about, or using information technology. The book concludes by laying out a values and design cycle for studying values in information and presenting an agenda for further research.

The Taxobook

Author: Marjorie M.K. Hlava
Publisher: Springer Nature
ISBN: 3031022904
Category: Computers
Languages: en
Pages: 130

Book Description
This book is the third of a three-part series on taxonomies, and covers putting your taxonomy into use in as many ways as possible to maximize retrieval for your users. Chapter 1 suggests several items to research and consider before you start your implementation and integration process. It explores the different pieces of software that you will need for your system and what features to look for in each. Chapter 2 launches with a discussion of how taxonomy terms can be used within a workflow, connecting two—or more—taxonomies, and intelligent coordination of platforms and taxonomies. Microsoft SharePoint is a widely used and popular program, and I consider their use of taxonomies in this chapter. Following that is a discussion of taxonomies and semantic integration and then the relationship between indexing and the hierarchy of a taxonomy. Chapter 3 (“How is a Taxonomy Connected to Search?”) provides discussions and examples of putting taxonomies into use in practical applications. It discusses displaying content based on search, how taxonomy is connected to search, using a taxonomy to guide a searcher, tools for search, including search engines, crawlers and spiders, and search software, the parts of a search-capable system, and then how to assemble that search-capable system. This chapter also examines how to measure quality in search, the different kinds of search, and theories on search from several famous theoreticians—two from the 18th and 19th centuries, and two contemporary. Following that is a section on inverted files, parsing, discovery, and clustering. While you probably don’t need a comprehensive understanding of these concepts to build a solid, workable system, enough information is provided for the reader to see how they fit into the overall scheme. This chapter concludes with a look at faceted search and some possibilities for search interfaces. 
Chapter 4, “Implementing a Taxonomy in a Database or on a Website,” starts where many content systems really should—with the authors, or at least the people who create the content. This chapter discusses matching up various groups of related data to form connections, data visualization and text analytics, and mobile and e-commerce applications for taxonomies. Finally, Chapter 5 presents some educated guesses about the future of knowledge organization. Table of Contents: List of Figures / Preface / Acknowledgments / On Your Mark, Get Ready .... WAIT! Things to Know Before You Start the Implementation Step / Taxonomy and Thesaurus Implementation / How is a Taxonomy Connected to Search? / Implementing a Taxonomy in a Database or on a Website / What Lies Ahead for Knowledge Organization? / Glossary / End Notes / Author Biography