Surprising Empirical Phenomena of Deep Learning and Kernel Machines

Surprising Empirical Phenomena of Deep Learning and Kernel Machines PDF Author: Like Hui
Languages : en

Book Description
Over the past decade, the field of machine learning has witnessed significant advances in artificial intelligence, driven primarily by empirical research. Within this context, we present several surprising empirical phenomena observed in deep learning and kernel machines. Among the components of a learning system, the training objective is of central importance. For classification tasks, the cross-entropy loss has become the dominant choice for training modern neural architectures and is widely believed to be empirically superior to the square loss. However, there is little compelling empirical or theoretical evidence establishing a clear-cut advantage for cross-entropy. In fact, our findings demonstrate that training with the square loss achieves comparable or even better results than the cross-entropy loss, even when computational resources are equalized. It remains unclear, however, how the rescaling hyperparameter R used with the square loss needs to vary with the number of classes. We provide an exact analysis for a one-layer ReLU network in the proportional asymptotic regime for isotropic Gaussian data, focusing on the optimal choice of R as a function of (i) the number of classes, (ii) the degree of overparameterization, and (iii) the level of label noise. We then provide empirical results on real data that support our theoretical predictions. To avoid the extra hyperparameter introduced by rescaling the square loss (needed when the number of classes is large), we next propose the "squentropy" loss: the sum of the cross-entropy loss and the average square loss over the incorrect classes (sketched in code below). We show that squentropy outperforms both the pure cross-entropy and rescaled square losses in terms of classification accuracy and model calibration. Moreover, squentropy is a simple "plug-and-play" replacement for cross-entropy, requiring no extra hyperparameters and no additional tuning of optimization parameters. We also apply theoretically well-understood kernel machines to a challenging practical task, speech enhancement, and find that they outperform fully connected networks while requiring fewer computational resources. Finally, we investigate the connection between the Neural Collapse phenomenon proposed by Papyan, Han, & Donoho (2020) and generalization in deep learning. We give precise definitions of the relevant notions and examine what each implies for generalization, clarifying the neural collapse concepts. Our empirical evidence supports the claim that neural collapse is primarily an optimization phenomenon.
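
The description above pins down squentropy: cross-entropy plus the average square loss over the incorrect classes. Below is a minimal NumPy sketch under that reading, taking the square loss on each incorrect class against a zero target (our assumption); the function name, signature, and toy data are illustrative, not code from the dissertation.

```python
import numpy as np

def squentropy_loss(logits, labels):
    """Squentropy sketch: cross-entropy plus the average squared logit
    over the incorrect classes (zero targets assumed for those classes)."""
    n, c = logits.shape
    # Numerically stable log-softmax for the cross-entropy term.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(n), labels]
    # Sum of squared logits over the c - 1 incorrect classes, then average.
    sq = (logits ** 2).sum(axis=1) - logits[np.arange(n), labels] ** 2
    return (ce + sq / (c - 1)).mean()

# Toy usage: 4 samples, 3 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 1])
print(squentropy_loss(logits, labels))
```

Note that, consistent with the description, this introduces no new hyperparameter: the weighting between the two terms is fixed, which is what makes the loss a drop-in replacement for cross-entropy.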

Geometry of Deep Learning

Geometry of Deep Learning PDF Author: Jong Chul Ye
Publisher: Springer Nature
ISBN: 9811660468
Category : Mathematics
Languages : en
Pages : 338

Book Description
The focus of this book is on providing students with insights into geometry that can help them understand deep learning from a unified perspective. Rather than describing deep learning as an implementation technique, as many existing deep learning books do, here deep learning is explained as the ultimate form of signal processing technique that can be imagined. To support this claim, an overview of classical kernel machine learning approaches is presented, and their advantages and limitations are explained. Following a detailed explanation of the basic building blocks of deep neural networks from a biological and algorithmic point of view, the latest tools such as attention, normalization, Transformer, BERT, GPT-3, and others are described. Here, too, the focus is on the fact that behind the intuition of these heuristic approaches lies an important, beautiful geometric structure that enables a systematic understanding. A unified geometric analysis of the working mechanism of deep learning in terms of high-dimensional geometry is offered. Then, different forms of generative models, such as GANs, VAEs, normalizing flows, and optimal transport, are described from a unified geometric perspective, showing that they actually arise from statistical distance-minimization problems. Because this book contains up-to-date information from both a practical and theoretical point of view, it can be used as an advanced deep learning textbook in universities or as a reference source for researchers interested in acquiring the latest deep learning algorithms and their underlying principles. In addition, the book has been prepared for a cross-listed course for both engineering and mathematics students, thus much of the content is interdisciplinary and will appeal to students from both disciplines.

Applied Deep Learning

Applied Deep Learning PDF Author: Umberto Michelucci
Publisher: Apress
ISBN: 1484237900
Category : Computers
Languages : en
Pages : 425

Book Description
Work with advanced topics in deep learning, such as optimization algorithms, hyperparameter tuning, dropout, and error analysis, as well as strategies to address typical problems encountered when training deep neural networks. You'll begin by studying activation functions, mostly in the single-neuron setting (ReLU, sigmoid, and Swish), seeing how to perform linear and logistic regression using TensorFlow, and choosing the right cost function. The next section covers more complicated neural network architectures with several layers and neurons and explores the problem of random weight initialization. An entire chapter is dedicated to a complete overview of neural network error analysis, with examples of solving problems originating from variance, bias, overfitting, and datasets coming from different distributions. Applied Deep Learning also discusses how to implement logistic regression completely from scratch without using any Python library except NumPy (a minimal sketch of this approach follows this description), to let you appreciate how libraries such as TensorFlow enable quick and efficient experiments. Case studies for each method are included to put the theory into practice. You'll discover tips and tricks for writing optimized Python code (for example, vectorizing loops with NumPy).

What You Will Learn:
- Implement advanced techniques in the right way in Python and TensorFlow
- Debug and optimize advanced methods (such as dropout and regularization)
- Carry out error analysis (to determine whether you have a bias problem, a variance problem, a data offset problem, and so on)
- Set up a machine learning project focused on deep learning on a complex dataset

Who This Book Is For:
Readers with a medium understanding of machine learning, linear algebra, calculus, and basic Python programming.
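
As a flavor of the from-scratch exercise mentioned above (logistic regression with no Python library except NumPy, and loops vectorized away), here is a minimal sketch. The function names, learning rate, and toy data are our own illustrative choices, not the book's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the logistic (cross-entropy) loss,
    fully vectorized: no explicit loop over samples."""
    w, b = np.zeros(X.shape[1]), 0.0
    n = X.shape[0]
    for _ in range(epochs):
        p = sigmoid(X @ w + b)       # predictions for all samples at once
        grad_w = X.T @ (p - y) / n   # vectorized gradient w.r.t. weights
        grad_b = (p - y).mean()      # gradient w.r.t. bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy usage: two Gaussian blobs in 2D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.r_[np.zeros(50), np.ones(50)]
w, b = train_logistic_regression(X, y)
print("train accuracy:", ((sigmoid(X @ w + b) > 0.5) == y).mean())
```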

Empirical Asset Pricing

Empirical Asset Pricing PDF Author: Wayne Ferson
Publisher: MIT Press
ISBN: 0262039370
Category : Business & Economics
Languages : en
Pages : 497

Book Description
An introduction to the theory and methods of empirical asset pricing, integrating classical foundations with recent developments. This book offers a comprehensive advanced introduction to asset pricing, the study of models for the prices and returns of various securities. The focus is empirical, emphasizing how the models relate to the data. The book offers a uniquely integrated treatment, combining classical foundations with more recent developments in the literature and relating some of the material to applications in investment management. It covers the theory of empirical asset pricing, the main empirical methods, and a range of applied topics. The book introduces the theory of empirical asset pricing through three main paradigms: mean-variance analysis, stochastic discount factors, and beta pricing models. It describes empirical methods, beginning with the generalized method of moments (GMM) and viewing other methods as special cases of GMM; offers a comprehensive review of fund performance evaluation; and presents selected applied topics, including a substantial chapter on predictability in asset markets that covers predicting the level of returns, volatility, and higher moments, and predicting cross-sectional differences in returns. Other chapters cover production-based asset pricing, long-run risk models, the Campbell-Shiller approximation, the debate on covariance versus characteristics, and the relation of volatility to the cross-section of stock returns. An extensive reference section captures the current state of the field. The book is intended for use by graduate students in finance and economics; it can also serve as a reference for professionals.

Understanding Machine Learning

Understanding Machine Learning PDF Author: Shai Shalev-Shwartz
Publisher: Cambridge University Press
ISBN: 1107057132
Category : Computers
Languages : en
Pages : 415

Book Description
Introduces machine learning and its algorithmic paradigms, explaining the principles behind automated learning approaches and the considerations underlying their usage.

The Principles of Deep Learning Theory

The Principles of Deep Learning Theory PDF Author: Daniel A. Roberts
Publisher: Cambridge University Press
ISBN: 1316519333
Category : Computers
Languages : en
Pages : 473

Book Description
This volume develops an effective theory approach to understanding deep neural networks of practical relevance.

Graph Representation Learning

Graph Representation Learning PDF Author: William L. Hamilton
Publisher: Springer Nature
ISBN: 3031015886
Category : Computers
Languages : en
Pages : 141

Book Description
Graph-structured data is ubiquitous throughout the natural and social sciences, from telecommunication networks to quantum chemistry. Building relational inductive biases into deep learning architectures is crucial for creating systems that can learn, reason, and generalize from this kind of data. Recent years have seen a surge in research on graph representation learning, including techniques for deep graph embeddings, generalizations of convolutional neural networks to graph-structured data, and neural message-passing approaches inspired by belief propagation. These advances in graph representation learning have led to new state-of-the-art results in numerous domains, including chemical synthesis, 3D vision, recommender systems, question answering, and social network analysis. This book provides a synthesis and overview of graph representation learning. It begins with a discussion of the goals of graph representation learning as well as key methodological foundations in graph theory and network analysis. Following this, the book introduces and reviews methods for learning node embeddings, including random-walk-based methods and applications to knowledge graphs. It then provides a technical synthesis and introduction to the highly successful graph neural network (GNN) formalism, which has become a dominant and fast-growing paradigm for deep learning with graph data. The book concludes with a synthesis of recent advancements in deep generative models for graphs—a nascent but quickly growing subset of graph representation learning.
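
To make the message-passing idea concrete, here is a one-layer sketch in NumPy: each node averages its neighbors' features and passes the result through a learned linear map and a nonlinearity. This is a generic illustration with our own names and toy data, not code from the book.

```python
import numpy as np

def message_passing_layer(A, H, W):
    """One round of neural message passing: aggregate (mean over
    neighbors), transform (linear map W), then apply ReLU."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)  # guard isolated nodes
    messages = (A @ H) / deg                        # mean of neighbor features
    return np.maximum(messages @ W, 0.0)            # ReLU nonlinearity

# Toy usage: a 4-node path graph with 2-dimensional node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 2))   # initial node features
W = rng.normal(size=(2, 2))   # weights (random here; learned in practice)
print(message_passing_layer(A, H, W))
```

Stacking several such layers lets information propagate along longer paths in the graph, which is the core of the GNN formalism the book synthesizes.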

Mathematics for Machine Learning

Mathematics for Machine Learning PDF Author: Marc Peter Deisenroth
Publisher: Cambridge University Press
ISBN: 1108569323
Category : Computers
Languages : en
Pages : 392

Book Description
The fundamental mathematical tools needed to understand machine learning include linear algebra, analytic geometry, matrix decompositions, vector calculus, optimization, probability and statistics. These topics are traditionally taught in disparate courses, making it hard for data science or computer science students, or professionals, to efficiently learn the mathematics. This self-contained textbook bridges the gap between mathematical and machine learning texts, introducing the mathematical concepts with a minimum of prerequisites. It uses these concepts to derive four central machine learning methods: linear regression, principal component analysis, Gaussian mixture models and support vector machines. For students and others with a mathematical background, these derivations provide a starting point to machine learning texts. For those learning the mathematics for the first time, the methods help build intuition and practical experience with applying mathematical concepts. Every chapter includes worked examples and exercises to test understanding. Programming tutorials are offered on the book's web site.

Random Matrix Methods for Machine Learning

Random Matrix Methods for Machine Learning PDF Author: Romain Couillet
Publisher: Cambridge University Press
ISBN: 1009301896
Category : Computers
Languages : en
Pages : 412

Book Description
This book presents a unified theory of random matrices for applications in machine learning, offering a large-dimensional data vision that exploits concentration and universality phenomena. This enables a precise understanding, and possible improvements, of the core mechanisms at play in real-world machine learning algorithms. The book opens with a thorough introduction to the theoretical basics of random matrices, which supports a wide range of applications, from SVMs through semi-supervised learning, unsupervised spectral clustering, and graph methods, to neural networks and deep learning. For each application, the authors discuss small- versus large-dimensional intuitions of the problem, followed by a systematic random matrix analysis of the resulting performance and possible improvements. All concepts, applications, and variations are illustrated numerically on synthetic as well as real-world data, with MATLAB and Python code provided on the accompanying website.

Patterns, Predictions, and Actions: Foundations of Machine Learning

Patterns, Predictions, and Actions: Foundations of Machine Learning PDF Author: Moritz Hardt
Publisher: Princeton University Press
ISBN: 0691233721
Category : Computers
Languages : en
Pages : 321

Book Description
An authoritative, up-to-date graduate textbook on machine learning that highlights its historical context and societal impacts. Patterns, Predictions, and Actions introduces graduate students to the essentials of machine learning while offering invaluable perspective on its history and social implications. Beginning with the foundations of decision making, Moritz Hardt and Benjamin Recht explain how representation, optimization, and generalization are the constituents of supervised learning. They go on to provide self-contained discussions of causality, the practice of causal inference, sequential decision making, and reinforcement learning, equipping readers with the concepts and tools they need to assess the consequences that may arise from acting on statistical decisions.

- Provides a modern introduction to machine learning, showing how data patterns support predictions and consequential actions
- Pays special attention to societal impacts and fairness in decision making
- Traces the development of machine learning from its origins to today
- Features a novel chapter on machine learning benchmarks and datasets
- Invites readers from all backgrounds, requiring some experience with probability, calculus, and linear algebra
- An essential textbook for students and a guide for researchers