Automatic Disambiguation of Author Names in Bibliographic Repositories

Automatic Disambiguation of Author Names in Bibliographic Repositories PDF Author: Anderson A. Ferreira
Publisher: Springer Nature
ISBN: 3031023226
Category : Computers
Languages : en
Pages : 126

Get Book Here

Book Description
This book deals with a hard problem that is inherent to human language: ambiguity. In particular, we focus on author name ambiguity, a type of ambiguity that exists in digital bibliographic repositories, which occurs when an author publishes works under distinct names or distinct authors publish works under similar names. This problem may be caused by a number of reasons, including the lack of standards and common practices, and the decentralized generation of bibliographic content. As a consequence, the quality of the main services of digital bibliographic repositories such as search, browsing, and recommendation may be severely affected by author name ambiguity. The focal point of the book is on automatic methods, since manual solutions do not scale to the size of the current repositories or the speed in which they are updated. Accordingly, we provide an ample view on the problem of automatic disambiguation of author names, summarizing the results of more than a decade of research on this topic conducted by our group, which were reported in more than a dozen publications that received over 900 citations so far, according to Google Scholar. We start by discussing its motivational issues (Chapter 1). Next, we formally define the author name disambiguation task (Chapter 2) and use this formalization to provide a brief, taxonomically organized, overview of the literature on the topic (Chapter 3). We then organize, summarize and integrate the efforts of our own group on developing solutions for the problem that have historically produced state-of-the-art (by the time of their proposals) results in terms of the quality of the disambiguation results. Thus, Chapter 4 covers HHC - Heuristic-based Clustering, an author name disambiguation method that is based on two specific real-world assumptions regarding scientific authorship. Then, Chapter 5 describes SAND - Self-training Author Name Disambiguator and Chapter 6 presents two incremental author name disambiguation methods, namely INDi - Incremental Unsupervised Name Disambiguation and INC- Incremental Nearest Cluster. Finally, Chapter 7 provides an overview of recent author name disambiguation methods that address new specific approaches such as graph-based representations, alternative predefined similarity functions, visualization facilities and approaches based on artificial neural networks. The chapters are followed by three appendices that cover, respectively: (i) a pattern matching function for comparing proper names and used by some of the methods addressed in this book; (ii) a tool for generating synthetic collections of citation records for distinct experimental tasks; and (iii) a number of datasets commonly used to evaluate author name disambiguation methods. In summary, the book organizes a large body of knowledge and work in the area of author name disambiguation in the last decade, hoping to consolidate a solid basis for future developments in the field.

Automatic Disambiguation of Author Names in Bibliographic Repositories

Automatic Disambiguation of Author Names in Bibliographic Repositories PDF Author: Anderson A. Ferreira
Publisher: Springer Nature
ISBN: 3031023226
Category : Computers
Languages : en
Pages : 126

Get Book Here

Book Description
This book deals with a hard problem that is inherent to human language: ambiguity. In particular, we focus on author name ambiguity, a type of ambiguity that exists in digital bibliographic repositories, which occurs when an author publishes works under distinct names or distinct authors publish works under similar names. This problem may be caused by a number of reasons, including the lack of standards and common practices, and the decentralized generation of bibliographic content. As a consequence, the quality of the main services of digital bibliographic repositories such as search, browsing, and recommendation may be severely affected by author name ambiguity. The focal point of the book is on automatic methods, since manual solutions do not scale to the size of the current repositories or the speed in which they are updated. Accordingly, we provide an ample view on the problem of automatic disambiguation of author names, summarizing the results of more than a decade of research on this topic conducted by our group, which were reported in more than a dozen publications that received over 900 citations so far, according to Google Scholar. We start by discussing its motivational issues (Chapter 1). Next, we formally define the author name disambiguation task (Chapter 2) and use this formalization to provide a brief, taxonomically organized, overview of the literature on the topic (Chapter 3). We then organize, summarize and integrate the efforts of our own group on developing solutions for the problem that have historically produced state-of-the-art (by the time of their proposals) results in terms of the quality of the disambiguation results. Thus, Chapter 4 covers HHC - Heuristic-based Clustering, an author name disambiguation method that is based on two specific real-world assumptions regarding scientific authorship. Then, Chapter 5 describes SAND - Self-training Author Name Disambiguator and Chapter 6 presents two incremental author name disambiguation methods, namely INDi - Incremental Unsupervised Name Disambiguation and INC- Incremental Nearest Cluster. Finally, Chapter 7 provides an overview of recent author name disambiguation methods that address new specific approaches such as graph-based representations, alternative predefined similarity functions, visualization facilities and approaches based on artificial neural networks. The chapters are followed by three appendices that cover, respectively: (i) a pattern matching function for comparing proper names and used by some of the methods addressed in this book; (ii) a tool for generating synthetic collections of citation records for distinct experimental tasks; and (iii) a number of datasets commonly used to evaluate author name disambiguation methods. In summary, the book organizes a large body of knowledge and work in the area of author name disambiguation in the last decade, hoping to consolidate a solid basis for future developments in the field.

Knowledge Graphs and Semantic Web

Knowledge Graphs and Semantic Web PDF Author: Boris Villazón-Terrazas
Publisher: Springer Nature
ISBN: 3031214226
Category : Computers
Languages : en
Pages : 355

Get Book Here

Book Description
This book constitutes the proceedings of the 4th Iberoamerican Conference and third Indo-American Conference on Knowledge Graphs and Semantic Web, KGSWC 2022, which took place in Madrid, Spain, in November 2022. The 22 full and 3 short research papers presented in this volume were carefully reviewed and selected from 63 submissions. The papers cover topics related to software and its engineering, software creation and management, Emerging technologies, Analysis and design of emerging devices and systems, Emerging tools and methodologies and others.

Understanding and Evaluating Search Experience

Understanding and Evaluating Search Experience PDF Author: Stone Maria
Publisher: Springer Nature
ISBN: 3031792165
Category : Computers
Languages : en
Pages : 87

Get Book Here

Book Description
This book is intended for anyone interested in learning more about how search works and how it is evaluated. We all use search—it's a familiar utility. Yet, few of us stop and think about how search works, what makes search results good, and who, if anyone, decides what good looks like. Search has a long and glorious history, yet it continues to evolve, and with it, the measurement and our understanding of the kinds of experiences search can deliver continues to evolve, as well. We will discuss the basics of how search engines work, how humans use search engines, and how measurement works. Equipped with these general topics, we will then dive into the established ways of measuring search user experience, and their pros and cons. We will talk about collecting labels from human judges, analyzing usage logs, surveying end users, and even touch upon automated evaluation methods. After introducing different ways of collecting metrics, we will cover experimentation as it applies to search evaluation. The book will cover evaluating different aspects of search—from search user interface (UI), to results presentation, to the quality of search algorithms. In covering these topics, we will touch upon many issues in evaluation that became sources of controversy—from user privacy, to ethical considerations, to transparency, to potential for bias. We will conclude by contrasting measuring with understanding, and pondering the future of search evaluation.

Word Association Thematic Analysis

Word Association Thematic Analysis PDF Author: Mike Thelwall
Publisher: Morgan & Claypool Publishers
ISBN: 1636390668
Category : Computers
Languages : en
Pages : 131

Get Book Here

Book Description
This book explains the word association thematic analysis method, with examples, and gives practical advice for using it. It is primarily intended for social media researchers and students, although the method is applicable to any collection of short texts. Many research projects involve analyzing sets of texts from the social web or elsewhere to get insights into issues, opinions, interests, news discussions, or communication styles. For example, many studies have investigated reactions to Covid-19 social distancing restrictions, conspiracy theories, and anti-vaccine sentiment on social media. This book describes word association thematic analysis, a mixed methods strategy to identify themes within a collection of social web or other texts. It identifies these themes in the differences between subsets of the texts, including female vs. male vs. nonbinary, older vs. newer, country A vs. country B, positive vs. negative sentiment, high scoring vs. low scoring, or subtopic A vs. subtopic B. It can also be used to identify the differences between a topic-focused collection of texts and a reference collection. The method starts by automatically finding words that are statistically significantly more common in one subset than another, then identifies the context of these words and groups them into themes. It is supported by the free Windows-based software Mozdeh for data collection or importing and for the quantitative analysis stages.

Question Answering for the Curated Web

Question Answering for the Curated Web PDF Author: Rishiraj Saha Roy
Publisher: Springer Nature
ISBN: 3031795121
Category : Mathematics
Languages : en
Pages : 172

Get Book Here

Book Description
Question answering (QA) systems on the Web try to provide crisp answers to information needs posed in natural language, replacing the traditional ranked list of documents. QA, posing a multitude of research challenges, has emerged as one of the most actively investigated topics in information retrieval, natural language processing, and the artificial intelligence communities today. The flip side of such diverse and active interest is that publications are highly fragmented across several venues in the above communities, making it very difficult for new entrants to the field to get a good overview of the topic. Through this book, we make an attempt towards mitigating the above problem by providing an overview of the state-of-the-art in question answering. We cover the twin paradigms of curated Web sources used in QA tasks ‒ trusted text collections like Wikipedia, and objective information distilled into large-scale knowledge bases. We discuss distinct methodologies that have been applied to solve the QA problem in both these paradigms, using instantiations of recent systems for illustration. We begin with an overview of the problem setup and evaluation, cover notable sub-topics like open-domain, multi-hop, and conversational QA in depth, and conclude with key insights and emerging topics. We believe that this resource is a valuable contribution towards a unified view on QA, helping graduate students and researchers planning to work on this topic in the near future.

Task Intelligence for Search and Recommendation

Task Intelligence for Search and Recommendation PDF Author: Chirag Shah
Publisher: Springer Nature
ISBN: 3031023269
Category : Computers
Languages : en
Pages : 140

Get Book Here

Book Description
While great strides have been made in the field of search and recommendation, there are still challenges and opportunities to address information access issues that involve solving tasks and accomplishing goals for a wide variety of users. Specifically, we lack intelligent systems that can detect not only the request an individual is making (what), but also understand and utilize the intention (why) and strategies (how) while providing information and enabling task completion. Many scholars in the fields of information retrieval, recommender systems, productivity (especially in task management and time management), and artificial intelligence have recognized the importance of extracting and understanding people's tasks and the intentions behind performing those tasks in order to serve them better. However, we are still struggling to support them in task completion, e.g., in search and assistance, and it has been challenging to move beyond single-query or single-turn interactions. The proliferation of intelligent agents has unlocked new modalities for interacting with information, but these agents will need to be able to work understanding current and future contexts and assist users at task level. This book will focus on task intelligence in the context of search and recommendation. Chapter 1 introduces readers to the issues of detecting, understanding, and using task and task-related information in an information episode (with or without active searching). This is followed by presenting several prominent ideas and frameworks about how tasks are conceptualized and represented in Chapter 2. In Chapter 3, the narrative moves to showing how task type relates to user behaviors and search intentions. A task can be explicitly expressed in some cases, such as in a to-do application, but often it is unexpressed. Chapter 4 covers these two scenarios with several related works and case studies. Chapter 5 shows how task knowledge and task models can contribute to addressing emerging retrieval and recommendation problems. Chapter 6 covers evaluation methodologies and metrics for task-based systems, with relevant case studies to demonstrate their uses. Finally, the book concludes in Chapter 7, with ideas for future directions in this important research area.

Word Association Thematic Analysis

Word Association Thematic Analysis PDF Author: Michael Thelwall
Publisher: Springer Nature
ISBN: 3031023242
Category : Computers
Languages : en
Pages : 111

Get Book Here

Book Description
Many research projects involve analyzing sets of texts from the social web or elsewhere to get insights into issues, opinions, interests, news discussions, or communication styles. For example, many studies have investigated reactions to Covid-19 social distancing restrictions, conspiracy theories, and anti-vaccine sentiment on social media. This book describes word association thematic analysis, a mixed methods strategy to identify themes within a collection of social web or other texts. It identifies these themes in the differences between subsets of the texts, including female vs. male vs. nonbinary, older vs. newer, country A vs. country B, positive vs. negative sentiment, high scoring vs. low scoring, or subtopic A vs. subtopic B. It can also be used to identify the differences between a topic-focused collection of texts and a reference collection. The method starts by automatically finding words that are statistically significantly more common in one subset than another, then identifies the context of these words and groups them into themes. It is supported by the free Windows-based software Mozdeh for data collection or importing and for the quantitative analysis stages. This book explains the word association thematic analysis method, with examples, and gives practical advice for using it. It is primarily intended for social media researchers and students, although the method is applicable to any collection of short texts.

Third Space, Information Sharing, and Participatory Design

Third Space, Information Sharing, and Participatory Design PDF Author: Preben Hansen
Publisher: Springer Nature
ISBN: 3031023277
Category : Computers
Languages : en
Pages : 134

Get Book Here

Book Description
Society faces many challenges in workplaces, everyday life situations, and education contexts. Within information behavior research, there are often calls to bridge inclusiveness and for greater collaboration, with user-centered design approaches and, more specifically, participatory design practices. Collaboration and participation are essential in addressing contemporary societal challenges, designing creative information objects and processes, as well as developing spaces for learning, and information and research interventions. The intention is to improve access to information and the benefits to be gained from that. This also applies to bridging the digital divide and for embracing artificial intelligence. With regard to research and practices within information behavior, it is crucial to consider that all users should be involved. Many information activities (i.e., activities falling under the umbrella terms of information behavior and information practices) manifest through participation, and thus, methods such as participatory design may help unfold both information behavior and practices as well as the creation of information objects, new models, and theories. Information sharing is one of its core activities. For participatory design with its value set of democratic, inclusive, and open participation towards innovative practices in a diversity of contexts, it is essential to understand how information activities such as sharing manifest itself. For information behavior studies it is essential to deepen understanding of how information sharing manifests in order to improve access to information and the use of information. Third Space is a physical, virtual, cognitive, and conceptual space where participants may negotiate, reflect, and form new knowledge and worldviews working toward creative, practical and applicable solutions, finding innovative, appropriate research methods, interpreting findings, proposing new theories, recommending next steps, and even designing solutions such as new information objects or services. Information sharing in participatory design manifests in tandem with many other information interaction activities and especially information and cognitive processing. Although there are practices of individual information sharing and information encountering, information sharing mostly relates to collaborative information behavior practices, creativity, and collective decision-making. Our purpose with this book is to enable students, researchers, and practitioners within a multi-disciplinary research field, including information studies and Human–Computer Interaction approaches, to gain a deeper understanding of how the core activity of information sharing in participatory design, in which Third Space may be a platform for information interaction, is taking place when using methods utilized in participatory design to address contemporary societal challenges. This could also apply for information behavior studies using participatory design as methodology. We elaborate interpretations of core concepts such as participatory design, Third Space, information sharing, and collaborative information behavior, before discussing participatory design methods and processes in more depth. We also touch on information behavior, information practice, and other important concepts. Third Space, information sharing, and information interaction are discussed in some detail. A framework, with Third Space as a core intersecting zone, platform, and adaptive and creative space to study information sharing and other information behavior and interactions are suggested. As a tool to envision information behavior and suggest future practices, participatory design serves as a set of methods and tools in which new interpretations of the design of information behavior studies and eventually new information objects are being initiated involving multiple stakeholders in future information landscapes. For this purpose, we argue that Third Space can be used as an intersection zone to study information sharing and other information activities, but more importantly it can serve as a Third Space Information Behavior (TSIB) study framework where participatory design methodology and processes are applied to information behavior research studies and applications such as information objects, systems, and services with recognition of the importance of situated awareness.

Simulating Information Retrieval Test Collections

Simulating Information Retrieval Test Collections PDF Author: David Hawking
Publisher: Springer Nature
ISBN: 3031023234
Category : Computers
Languages : en
Pages : 162

Get Book Here

Book Description
Simulated test collections may find application in situations where real datasets cannot easily be accessed due to confidentiality concerns or practical inconvenience. They can potentially support Information Retrieval (IR) experimentation, tuning, validation, performance prediction, and hardware sizing. Naturally, the accuracy and usefulness of results obtained from a simulation depend upon the fidelity and generality of the models which underpin it. The fidelity of emulation of a real corpus is likely to be limited by the requirement that confidential information in the real corpus should not be able to be extracted from the emulated version. We present a range of methods exploring trade-offs between emulation fidelity and degree of preservation of privacy. We present three different simple types of text generator which work at a micro level: Markov models, neural net models, and substitution ciphers. We also describe macro level methods where we can engineer macro properties of a corpus, giving a range of models for each of the salient properties: document length distribution, word frequency distribution (for independent and non-independent cases), word length and textual representation, and corpus growth. We present results of emulating existing corpora and for scaling up corpora by two orders of magnitude. We show that simulated collections generated with relatively simple methods are suitable for some purposes and can be generated very quickly. Indeed it may sometimes be feasible to embed a simple lightweight corpus generator into an indexer for the purpose of efficiency studies. Naturally, a corpus of artificial text cannot support IR experimentation in the absence of a set of compatible queries. We discuss and experiment with published methods for query generation and query log emulation. We present a proof-of-the-pudding study in which we observe the predictive accuracy of efficiency and effectiveness results obtained on emulated versions of TREC corpora. The study includes three open-source retrieval systems and several TREC datasets. There is a trade-off between confidentiality and prediction accuracy and there are interesting interactions between retrieval systems and datasets. Our tentative conclusion is that there are emulation methods which achieve useful prediction accuracy while providing a level of confidentiality adequate for many applications. Many of the methods described here have been implemented in the open source project SynthaCorpus, accessible at: https://bitbucket.org/davidhawking/synthacorpus/

Trustworthy Communications and Complete Genealogies

Trustworthy Communications and Complete Genealogies PDF Author: Reagan W. Moore
Publisher: Springer Nature
ISBN: 3031023250
Category : Computers
Languages : en
Pages : 139

Get Book Here

Book Description
Genealogies document relationships between persons involved in historical events. Information about the events is parsed from communications from the past. This book explores a way to organize information from multiple communications into a trustworthy representation of a genealogical history of the modern world. The approach defines metrics for evaluating the consistency, correctness, closure, connectivity, completeness, and coherence of a genealogy. The metrics are evaluated using a 312,000-person research genealogy that explores the common ancestors of the royal families of Europe. A major result is that completeness is defined by a genealogy symmetry property driven by two exponential processes, the doubling of the number of potential ancestors each generation, and the rapid growth of lineage coalescence when the number of potential ancestors exceeds the available population. A genealogy expands from an initial root person to a large number of lineages, which then coalesce into a small number of progenitors. Using the research genealogy, candidate progenitors for persons of Western European descent are identified. A unifying ancestry is defined to which historically notable persons can be linked.