Discretization Based on Clustering

Author: Venkata Dileep Chanda
Publisher:
ISBN:
Category :
Languages : en
Pages : 128

Book Description
A huge amount of data is being collected every day, creating a need for innovative techniques to extract useful information from it. Data mining is the process of analyzing data to find non-trivial relationships among data objects and to discover previously unknown information. In data mining, clustering is one of the most useful techniques for discovering interesting information in the underlying data objects. Cluster analysis has been widely applied in many areas, such as medicine, social studies, bioinformatics, map regions, and GIS. Clustering has been studied extensively, and many clustering algorithms have been developed. In this thesis, we propose a new clustering technique based on a hierarchical agglomerative approach. This new agglomerative discretization strategy, called discretization based on clustering, differs from existing methods particularly in its termination step. Data clustering is used to discretize attributes with continuous numeric values, that is, to group the continuous numeric values of an attribute into numeric ranges (or clusters). The clustering is guided by a predetermined decision attribute. The knowledge about the input dataset is then extracted and represented as if/then decision rules.
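The description does not state the merge criterion or the termination step, so what follows is only a minimal Python sketch of the general idea under an assumed rule: every distinct value of the continuous attribute starts as its own cluster labelled with its majority decision value, adjacent clusters are merged while they predict the same decision, and the resulting ranges are rendered as if/then rules. The names `discretize` and `to_rules` are hypothetical, and this is not the thesis's algorithm.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Range = Tuple[float, float, str]


def discretize(values: Sequence[float], decisions: Sequence[str]) -> List[Range]:
    """Group a continuous attribute into ranges guided by a decision attribute.

    Hypothetical merge rule (not the thesis's): each distinct value starts as its
    own cluster labelled with its majority decision, and adjacent clusters are
    merged until no two neighbours share a label.
    """
    if len(values) != len(decisions):
        raise ValueError("values and decisions must have the same length")
    # Collect the decision values observed at each distinct attribute value.
    observed: Dict[float, List[str]] = {}
    for value, decision in zip(values, decisions):
        observed.setdefault(value, []).append(decision)
    merged: List[list] = []
    for value in sorted(observed):
        # Majority decision; Counter.most_common breaks ties by first occurrence.
        label = Counter(observed[value]).most_common(1)[0][0]
        if merged and merged[-1][2] == label:
            merged[-1][1] = value  # same prediction as the left neighbour: extend it
        else:
            merged.append([value, value, label])  # start a new range
    return [(low, high, label) for low, high, label in merged]


def to_rules(ranges: List[Range], attribute: str = "x") -> List[str]:
    """Render each range as an if/then decision rule."""
    return [f"if {low} <= {attribute} <= {high} then decision = {label}"
            for low, high, label in ranges]
```

For example, `discretize([1.0, 1.5, 2.0, 3.0, 3.5, 5.0], ["no", "no", "no", "yes", "yes", "no"])` yields the ranges (1.0, 2.0, "no"), (3.0, 3.5, "yes") and (5.0, 5.0, "no"). A practical discretizer would also need a policy for values that fall between ranges and a stopping criterion that tolerates noisy labels.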

Clustering Stability

Author: Ulrike Von Luxburg
Publisher: Now Publishers Inc
ISBN: 1601983441
Category : Computers
Languages : en
Pages : 53

Book Description
A popular method for selecting the number of clusters is based on stability arguments: one chooses the number of clusters for which the corresponding clustering results are most stable. In recent years, a series of papers has analyzed the behavior of this method from a theoretical point of view. However, the results are very technical and difficult for non-experts to interpret. In this paper, we give a high-level overview of the existing literature on clustering stability. In addition to presenting the results in a slightly informal but accessible way, we relate them to each other and discuss their different implications.
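As an illustration of the selection rule described above (not of any specific analysis from the paper), the Python sketch below clusters two random subsamples with k-means for each candidate k, measures how often the two clusterings agree on whether pairs of shared points belong together (the Rand index), and picks the k with the highest average agreement. The function names, the 80% subsample size, the farthest-point initialisation, and the candidate range 2 to 6 are all assumptions; k = 1 is excluded because a single cluster is trivially stable.

```python
import random
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, rng: random.Random, iters: int = 50) -> List[int]:
    """Lloyd's k-means with farthest-point initialisation; returns one label per point."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_agreement(a: Dict[int, int], b: Dict[int, int]) -> float:
    """Rand index of two labelings, restricted to the points both of them contain."""
    common = sorted(set(a) & set(b))
    agree = total = 0
    for u, v in combinations(common, 2):
        # The clusterings agree on a pair if both join it or both separate it.
        agree += (a[u] == a[v]) == (b[u] == b[v])
        total += 1
    return agree / total if total else 1.0


def stability(points: List[Point], k: int, runs: int = 10,
              frac: float = 0.8, seed: int = 0) -> float:
    """Average agreement between k-means runs on pairs of random subsamples."""
    rng = random.Random(seed)
    n = len(points)
    m = min(n, max(2, int(frac * n)))
    total = 0.0
    for _ in range(runs):
        views = []
        for _ in range(2):
            idx = rng.sample(range(n), m)
            labels = kmeans([points[i] for i in idx], k, rng)
            views.append(dict(zip(idx, labels)))
        total += rand_agreement(views[0], views[1])
    return total / runs


def choose_k(points: List[Point], candidates=range(2, 7)) -> Tuple[int, Dict[int, float]]:
    scores = {k: stability(points, k) for k in candidates}
    # max() keeps the first (smallest) k among ties, since dicts preserve insertion order.
    return max(scores, key=scores.get), scores
```

Because several values of k can reach the same score, the tie policy matters; here `max` simply returns the smallest tied k, since candidates are scanned in increasing order.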

Data Clustering

Author: Charu C. Aggarwal
Publisher: CRC Press
ISBN: 1466558229
Category : Business & Economics
Languages : en
Pages : 648

Book Description
Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains. The book focuses on three primary aspects of data clustering: Methods, describing key techniques commonly used for clustering, such as feature selection, agglomerative clustering, partitional clustering, density-based clustering, probabilistic clustering, grid-based clustering, spectral clustering, and nonnegative matrix factorization; Domains, covering methods used for different domains of data, such as categorical data, text data, multimedia data, graph data, biological data, stream data, uncertain data, time series clustering, high-dimensional clustering, and big data; and Variations and Insights, discussing important variations of the clustering process, such as semisupervised clustering, interactive clustering, multiview clustering, cluster ensembles, and cluster validation. In this book, top researchers from around the world explore the characteristics of clustering problems in a variety of application areas. They also explain how to glean detailed insight from the clustering process, including how to verify the quality of the underlying clusters, through supervision, human intervention, or the automated generation of alternative clusters.

Clustering: Theoretical And Practical Aspects

Author: Dan A Simovici
Publisher: World Scientific
ISBN: 981124121X
Category : Computers
Languages : en
Pages : 882

Book Description
This unique compendium gives an updated presentation of clustering, one of the most challenging tasks in machine learning. The book provides a unified presentation of classical and contemporary algorithms, ranging from partitional and hierarchical clustering to density-based clustering, clustering of categorical data, and spectral clustering. Most of the mathematical background is provided in appendices, highlighting algebra and complexity theory, in order to make this volume as self-contained as possible. A substantial number of exercises and supplements make this a useful reference textbook for researchers and students.

Discretize and Conquer

Author: Soheil Soltani
Publisher:
ISBN:
Category : Cluster analysis
Languages : en
Pages : 127

Book Description
Clustering is one of the most fundamental tasks in many machine learning and information retrieval applications. Roughly speaking, the goal is to partition data instances such that similar instances end up in the same group while dissimilar instances lie in different groups. Quite surprisingly, though, there is no clear formal and rigorous definition of clustering, mainly because there is no consensus about what constitutes a cluster. That said, across all disciplines, from mathematics and statistics to genetics, people frequently try to get a first intuition about the data by identifying meaningful groups. Finding similar instances and grouping them are the two main steps in clustering, and, not surprisingly, both have been the subject of extensive study over recent decades. It has been shown that using large datasets is key to achieving acceptable levels of performance in data-driven applications. Today, the Internet is a vast resource for such datasets, each of which contains millions or billions of high-dimensional items such as images and text documents. However, for such large-scale datasets, the performance of the employed machine learning algorithm quickly becomes the main bottleneck. Conventional clustering algorithms are no exception, and a great deal of effort has been devoted to developing scalable clustering algorithms. Clustering tasks can vary both in the input they take and in the output they are expected to generate. For instance, the input of a clustering algorithm can hold various types of data, such as continuous numerical and categorical data. This thesis focuses on a particular setting in which the input instances are represented as binary strings. Binary representation has several advantages, such as storage efficiency, simplicity, the lack of a numerical-data-like concept of noise, and being naturally normalized.
The literature abounds with applications of clustering binary data, such as in marketing, document clustering, and image clustering. As a more concrete example, in marketing for an online store, each customer's basket is a binary representation of items. By clustering customers, the store can recommend items to customers with the same interests. In document clustering, documents can be represented as binary codes in which each element indicates whether or not a word exists in the document. Another notable application of binary codes is binary hashing, which has been the topic of significant research in the last decade. The goal of binary hashing is to encode high-dimensional items, such as images, with compact binary strings so as to preserve a given notion of similarity. Such codes enable extremely fast nearest neighbour searches, as the distance between two codes (often the Hamming distance) can be computed quickly using bit-wise operations implemented at the hardware level. As with other types of data, the clustering of binary datasets has seen considerable research recently. Unfortunately, most existing approaches are concerned only with devising density-based and centroid-based clustering algorithms, even though many other types of clustering techniques can be applied to binary data. One of the most popular and intuitive algorithms in connectivity-based clustering is Hierarchical Agglomerative Clustering (HAC), which is based on the core idea that objects are more related to nearby objects than to objects farther away. As the name suggests, HAC is a family of clustering methods that return a dendrogram as their output: that is, a hierarchical tree of subsets of the domain, with singleton instances at its leaves and the whole set of data instances at its root. Such algorithms need no prior knowledge of the number of clusters.
Most of them are deterministic and applicable to different cluster shapes, but these advantages come at the price of high computational and storage costs compared with other popular clustering algorithms such as k-means. In this thesis, a family of HAC algorithms called Discretized Agglomerative Clustering (DAC) is proposed, designed to work with binary data. By leveraging the discretized and bounded nature of binary representation, the proposed algorithms can achieve significant speedups, both in theory and in practice, compared with existing solutions. From a theoretical perspective, DAC algorithms can reduce the computational cost of hierarchical clustering from cubic to quadratic, matching the known lower bounds for HAC. The proposed approach is also empirically compared with other well-known clustering algorithms, such as k-means, DBSCAN, and average-linkage and complete-linkage HAC, on well-known datasets such as TEXMEX, CIFAR-10, and MNIST, which are among the standard benchmarks for large-scale algorithms. Results indicate that by mapping real points to binary vectors using existing binary hashing algorithms and clustering them with DAC, one can achieve speedups of several orders of magnitude without losing much clustering quality, and in some cases even improving it.
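The thesis's DAC algorithms are not reproduced here. The Python sketch below only illustrates the property the description relies on: Hamming distances between `bits`-bit codes are integers from 0 to `bits`, so candidate merges can be grouped into `bits + 1` buckets (a counting sort) instead of being kept in a comparison-sorted priority queue. It implements plain single-linkage HAC with a union-find structure; all names are hypothetical, it still enumerates all n(n-1)/2 pairs, and the XOR-and-count Hamming distance stands in for the hardware popcount mentioned above.

```python
from itertools import combinations
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    # XOR leaves a 1 wherever the codes differ; counting the 1s gives the distance.
    return bin(a ^ b).count("1")


def single_linkage_binary(codes: List[int], bits: int) -> List[Tuple[int, int, int]]:
    """Single-linkage HAC on `bits`-bit codes.

    Returns the merges in order as (distance, i, j), where i and j are the pair of
    points whose link joined two previously separate clusters.
    """
    # Distances are bounded integers, so pairs go into bits + 1 buckets
    # instead of being sorted by comparison.
    buckets: List[List[Tuple[int, int]]] = [[] for _ in range(bits + 1)]
    for i, j in combinations(range(len(codes)), 2):
        buckets[hamming(codes[i], codes[j])].append((i, j))

    parent = list(range(len(codes)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    for dist, pairs in enumerate(buckets):  # visit buckets from closest to farthest
        for i, j in pairs:
            ri, rj = find(i), find(j)
            if ri != rj:  # shortest remaining link between two different clusters
                parent[rj] = ri
                merges.append((dist, i, j))
    return merges
```

For the four 4-bit codes 0b0000, 0b0001, 0b1110 and 0b1111, the merges are (1, 0, 1), (1, 2, 3) and (3, 0, 2): the two near-identical pairs join at distance 1, and the two resulting clusters join at distance 3. On Python 3.10 and later, `(a ^ b).bit_count()` can replace the `bin(...).count("1")` idiom.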

Clustering

Author: Rui Xu
Publisher: Wiley-IEEE Press
ISBN:
Category : Computers
Languages : en
Pages : 374

Book Description
This is the first book to take a truly comprehensive look at clustering. It begins with an introduction to cluster analysis and goes on to explore proximity measures; hierarchical clustering; partition clustering; neural network-based clustering; kernel-based clustering; sequential data clustering; large-scale data clustering; data visualization and high-dimensional data clustering; and cluster validation. The authors assume no previous background in clustering, and their generous inclusion of examples and references helps make the subject matter comprehensible for readers of varying levels and backgrounds.

Classification, Clustering, and Data Mining Applications

Author: International Federation of Classification Societies. Conference
Publisher: Springer Science & Business Media
ISBN: 9783540220145
Category : Computers
Languages : en
Pages : 976

Book Description
Modern data analysis stands at the interface of statistics, computer science, and discrete mathematics. This volume describes new methods in this area, with special emphasis on classification and cluster analysis. Those methods are applied to problems in information retrieval, phylogeny, medical diagnosis, microarrays, and other active research areas.

Projection-Based Clustering Through Self-Organization and Swarm Intelligence

Author: Michael Christoph Thrun
Publisher:
ISBN: 9781013269905
Category : Computers
Languages : en
Pages : 210

Book Description
This book covers aspects of unsupervised machine learning used for knowledge discovery in data science and introduces a data-driven approach to cluster analysis, the Databionic swarm (DBS). DBS combines a 3D landscape visualization of the data with its clustering. The 3D landscape enables 3D printing of high-dimensional data structures. The clustering and the number of clusters, or the absence of a cluster structure, can be verified at a glance from the 3D landscape. DBS is the first swarm-based technique that shows emergent properties while exploiting concepts of swarm intelligence, self-organization, and the Nash equilibrium concept from game theory. As a result, it needs neither a global objective function nor parameter settings. Using the downloadable R package, DBS can be applied to data drawn from diverse research fields, even by non-professionals in the field of data mining. This work was published by Saint Philip Street Press pursuant to a Creative Commons license permitting commercial use. All rights not granted by the work's license are retained by the author or authors.

Efficient Algorithms for Clustering and Classifying High Dimensional Text and Discretized Data Using Interesting Patterns

Author:
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Efficient algorithms for clustering and classifying high dimensional text and discretized data using interesting patterns.

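Kongl. maj:ts befallningshafwande i Westerbottens län till kongl. maj:t afgifne fem års berättelse år 1822. Stockholm, tryckt i Marquardska boktryckeriet 1823. ( [Ins.:] Stockholm, tryckt hos Samuel Rumstedt, 1823.). [English: Five-year report submitted to His Royal Majesty in 1822 by His Royal Majesty's Governor in Westerbotten County. Stockholm, printed at the Marquard printing house, 1823. ([Insert:] Stockholm, printed by Samuel Rumstedt, 1823.)]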

Author:
Publisher:
ISBN:
Category :
Languages : en
Pages : 16

Book Description