Data Stream Algorithms for Large Graphs and High Dimensional Data

Author: Hoa Vu
Language: English

Book Description
In contrast to the traditional random-access memory model of computation, where the entire input is available in working memory, the data stream model provides only sequential access to the input. The data stream model is a natural framework for handling large and dynamic data. In this model, we focus on designing algorithms that use sublinear memory and a small number of passes over the stream. Other desirable properties include fast update time, query time, and post-processing time. In this dissertation, we consider different problems in graph theory, combinatorial optimization, and high-dimensional data processing. The first part of this dissertation focuses on algorithms for graph theory and combinatorial optimization. We present new results for the problems of finding the densest subgraph, counting the number of triangles, finding max cut with bounded components, and finding the maximum $k$ set coverage. The second part of this dissertation considers problems in high-dimensional data streams. In this setting, each stream item consists of multiple coordinates corresponding to different attributes. We consider the problem of testing or learning about the relationships among the attributes, and the problem of finding heavy hitters in subsets of attributes.
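The abstract sets out the model's requirements (sequential access, one or a few passes, sublinear memory) and lists heavy hitters among its problems. The code below is a hedged illustration of those requirements only. It implements the classic Misra-Gries frequent-items summary for a single attribute. It is not the dissertation's algorithm, which addresses heavy hitters over subsets of attributes. The function name `misra_gries` and its parameters are hypothetical. The summary keeps at most k - 1 counters. Every item that occurs more than n/k times in a stream of length n is guaranteed to remain in it.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One-pass Misra-Gries summary using at most k - 1 counters.

    Every item whose true frequency exceeds n / k (n = stream length)
    is guaranteed to appear in the returned dict, and each stored count
    underestimates the true count by at most n / k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those
            # that reach zero (the arriving item is discarded as well).
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

Consider `misra_gries("abacadaea", 3)`, which returns `{'a': 3}`. Here `'a'` occurs 5 times among 9 items, which is more than 9/3, so it must survive. Its stored count of 3 undercounts the true count by 2, which is within the n/k = 3 bound.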

Data Streams

Author: S. Muthukrishnan
Publisher: Now Publishers Inc
ISBN: 193301914X
Category: Computers
Language: English
Pages: 136

Book Description
In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or a few passes over the data, space less than linear in the input size, or time significantly less than the input size. In the past few years, a new theory has emerged for reasoning about algorithms that work within these constraints on space, time, and number of passes. Some of the methods rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity. The applications for this scenario include IP network traffic analysis, mining text message streams and processing massive data sets in general. Researchers in Theoretical Computer Science, Databases, IP Networking and Computer Systems are working on the data stream challenges.
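The description mentions pseudo-random computations and space less than linear in the input size. The Count-Min sketch is one well-known technique that combines both: it approximates item frequencies in a single pass. The following is a minimal sketch under several assumptions:

- The stream is insert-only.
- Items are non-negative integers smaller than 2^61 - 1.
- Each row uses a hash of the form ((a * x + b) mod p) mod width.

The class and method names are hypothetical and are not taken from the book.

```python
import random
from typing import List


class CountMinSketch:
    """Count-Min sketch for an insert-only stream of non-negative integers.

    Keeps `depth` rows of `width` counters; row r hashes item x with
    h_r(x) = ((a_r * x + b_r) mod P) mod width for random a_r, b_r.
    Estimates never undercount. With width = ceil(e / eps) and
    depth = ceil(ln(1 / delta)), the overestimate is at most eps * n
    with probability at least 1 - delta, where n is the total count added.
    """

    P = (1 << 61) - 1  # Mersenne prime; items are assumed to be smaller than P

    def __init__(self, width: int, depth: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded so runs are reproducible
        self.width = width
        self.depth = depth
        self.params = [(rng.randrange(1, self.P), rng.randrange(0, self.P))
                       for _ in range(depth)]
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _bucket(self, row: int, item: int) -> int:
        a, b = self.params[row]
        return ((a * item + b) % self.P) % self.width

    def add(self, item: int, count: int = 1) -> None:
        # Increment one counter per row.
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += count

    def estimate(self, item: int) -> int:
        # Collisions only inflate counters, so the row minimum is the tightest bound.
        return min(self.table[row][self._bucket(row, item)]
                   for row in range(self.depth))
```

For example, `CountMinSketch(width=272, depth=5)` corresponds roughly to eps = 0.01 and delta = 0.01, since ceil(e / 0.01) = 272 and ceil(ln 100) = 5. Memory depends only on these parameters, not on the stream length.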

Mining of Massive Datasets

Author: Jure Leskovec
Publisher: Cambridge University Press
ISBN: 1107077230
Category: Computers
Language: English
Pages: 480

Book Description
Now in its second edition, this book focuses on practical algorithms for mining data from even the largest datasets.

Data Stream Management

Author: Minos Garofalakis
Publisher: Springer
ISBN: 354028608X
Category: Computers
Language: English
Pages: 528

Book Description
This volume focuses on the theory and practice of data stream management, and the novel challenges this emerging domain poses for data-management algorithms, systems, and applications. The collection of chapters, contributed by authorities in the field, offers a comprehensive introduction to both the algorithmic/theoretical foundations of data streams and the streaming systems and applications built in different domains. A short introductory chapter provides a brief summary of some basic data streaming concepts and models, and discusses the key elements of a generic stream query processing architecture. Subsequently, Part I focuses on basic streaming algorithms for some key analytics functions (e.g., quantiles, norms, join aggregates, heavy hitters) over streaming data. Part II then examines important techniques for basic stream mining tasks (e.g., clustering, classification, frequent itemsets). Part III discusses a number of advanced topics on stream processing algorithms, and Part IV focuses on system and language aspects of data stream processing with surveys of influential system prototypes and language designs. Part V then presents some representative applications of streaming techniques in different domains (e.g., network management, financial analytics). Finally, the volume concludes with an overview of current data streaming products and new application domains (e.g., cloud computing, big data analytics, and complex event processing), and a discussion of future directions in this exciting field. The book provides a comprehensive overview of core concepts and technological foundations, as well as various systems and applications, and is of particular interest to students, lecturers, and researchers in the area of data stream management.

Data Streams

Author: Charu C. Aggarwal
Publisher: Springer Science & Business Media
ISBN: 0387475346
Category: Computers
Language: English
Pages: 365

Book Description
This book primarily discusses issues related to the mining aspects of data streams and it is unique in its primary focus on the subject. This volume covers mining aspects of data streams comprehensively: each contributed chapter contains a survey on the topic, the key ideas in the field for that particular topic, and future research directions. The book is intended for a professional audience composed of researchers and practitioners in industry. This book is also appropriate for advanced-level students in computer science.

Foundations of Data Science

Author: Avrim Blum
Publisher: Cambridge University Press
ISBN: 1108617360
Category: Computers
Language: English
Pages: 433

Book Description
This book provides an introduction to the mathematical and algorithmic foundations of data science, including machine learning, high-dimensional geometry, and analysis of large networks. Topics include the counterintuitive nature of data in high dimensions, important linear algebraic techniques such as singular value decomposition, the theory of random walks and Markov chains, the fundamentals of and important algorithms for machine learning, algorithms and analysis for clustering, probabilistic models for large networks, representation learning including topic modelling and non-negative matrix factorization, wavelets and compressed sensing. Important probabilistic techniques are developed including the law of large numbers, tail inequalities, analysis of random projections, generalization guarantees in machine learning, and moment methods for analysis of phase transitions in large random graphs. Additionally, important structural and complexity measures are discussed such as matrix norms and VC-dimension. This book is suitable for both undergraduate and graduate courses in the design and analysis of algorithms for data.

Algorithms for Big Data

Author: Moran Feldman
ISBN: 9789811204746
Category: Algorithms
Language: English

Massive Graph Analytics

Author: David A. Bader
Publisher: CRC Press
ISBN: 1000538613
Category: Business & Economics
Language: English
Pages: 632

Book Description
"Graphs. Such a simple idea. Map a problem onto a graph then solve it by searching over the graph or by exploring the structure of the graph. What could be easier? Turns out, however, that working with graphs is a vast and complex field. Keeping up is challenging. To help keep up, you just need an editor who knows most people working with graphs, and have that editor gather nearly 70 researchers to summarize their work with graphs. The result is the book Massive Graph Analytics." (Timothy G. Mattson, Senior Principal Engineer, Intel Corp)

Expertise in massive-scale graph analytics is key for solving real-world grand challenges from healthcare to sustainability to detecting insider threats, cyber defense, and more. This book provides a comprehensive introduction to massive graph analytics, featuring contributions from thought leaders across academia, industry, and government. Massive Graph Analytics will be beneficial to students, researchers, and practitioners in academia, national laboratories, and industry who wish to learn about the state-of-the-art algorithms, models, frameworks, and software in massive-scale graph analytics.

Algorithms for Big Data

Author: Moran Feldman
Publisher: World Scientific
ISBN: 9811204756
Category: Computers
Language: English
Pages: 458

Book Description
This unique volume is an introduction for computer scientists, including a formal study of theoretical algorithms for Big Data applications, which allows them to work on such algorithms in the future. It also serves as a useful reference guide for the general computer science population, providing a comprehensive overview of the fascinating world of such algorithms. To achieve these goals, the algorithmic results presented have been carefully chosen so that they demonstrate the important techniques and tools used in Big Data algorithms, and yet do not require tedious calculations or a very deep mathematical background.

Highly Efficient Data Processing and Learning Over Large Graphs

Author: Qilian Yu
ISBN: 9780438930421
Language: English

Book Description
As information technology advances, people stay informed through numerous media channels, from conventional media to modern social media. In online social networks, people are connected regardless of geographic distance, and information spreads faster than ever before. Moreover, with increasingly developed internet and database techniques, society has entered the big data era, in which conventional media, including news articles, scientific literature, and patent applications, are connected to form very large information networks, or graphs. In this dissertation, we focus on highly efficient data processing and learning approaches over graphs for mining the users with the maximum influence ability in an online social network.

Influence ability measures the extent to which a user in a social network causes other people to perform actions similar to his or her own. Finding the people with maximum influence ability is attractive because it helps us understand how information flows and lets us use these people to spread information quickly. In academia, the problem of targeting a fixed number of users with the maximum influence ability across the network is termed influence maximization. In this dissertation, a novel multi-action credit distribution (mCD) model is introduced, which quantifies the influence ability of each user and works on practical datasets in which one type of action is recorded multiple times. Based on this model, the influence maximization problem is formulated as a submodular maximization problem under a general knapsack constraint, which is shown to be NP-hard. Instead of directly solving the influence maximization problem, we study a more general case in which the constraint is a $d$-knapsack constraint. Then, we return to the original problem and develop efficient algorithms.
For the general submodular maximization problem with a $d$-knapsack constraint, a streaming algorithm is developed that achieves a $(\frac{1}{1+2d}-\epsilon)$-approximation of the optimal value while making only a single pass through the dataset, without storing all the data in memory. As a special case of submodular maximization subject to a $d$-knapsack constraint, the influence maximization problem with a budget constraint can be solved by the proposed streaming algorithm with $(\frac{1}{3}-\epsilon)$ optimality. To demonstrate the performance of the mCD model, we conduct experiments on a real Twitter dataset. Compared with the independent cascade model and the standard credit distribution model, the mCD model achieves higher prediction accuracy. Our proposed streaming algorithms also provide a more efficient way to solve related combinatorial optimization problems. We extensively evaluate the effectiveness of the proposed streaming algorithms via two applications: news recommendation and scientific literature recommendation. We observe that the proposed streaming algorithms achieve both execution speedups and memory savings of several orders of magnitude compared with existing approaches.
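The general and special-case guarantees above are consistent if the single budget constraint is read as the case $d = 1$ of the $d$-knapsack constraint. That reading is an interpretation of the description, not a quoted statement. Substituting $d = 1$ into the general bound gives the stated ratio:

```latex
\left.\left(\frac{1}{1+2d} - \epsilon\right)\right|_{d=1}
  = \frac{1}{1+2} - \epsilon
  = \frac{1}{3} - \epsilon
```

The ratio weakens as more knapsack dimensions are added; for example, $d = 2$ gives $\frac{1}{5} - \epsilon$.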