Evaluating and Understanding Adversarial Robustness in Deep Learning

Evaluating and Understanding Adversarial Robustness in Deep Learning PDF Author: Jinghui Chen
Publisher:
ISBN:
Category :
Languages : en
Pages : 175

Book Description
Deep Neural Networks (DNNs) have made many breakthroughs in different areas of artificial intelligence. However, recent studies show that DNNs are vulnerable to adversarial examples. A tiny perturbation on an image that is almost invisible to human eyes could mislead a well-trained image classifier towards misclassification. This raises serious security and trustworthiness concerns about the robustness of Deep Neural Networks in solving real-world challenges. Researchers have been working on this problem for a while, and it has led to a vigorous arms race between heuristic defenses that propose ways to defend against existing attacks and newly devised attacks that are able to penetrate such defenses. While the arms race continues, it becomes more and more crucial to evaluate model robustness accurately and efficiently under different threat models and to identify those "falsely" robust models that may give us a false sense of robustness. On the other hand, despite the fast development of various kinds of heuristic defenses, their practical robustness is still far from satisfactory, and there has been little algorithmic improvement in defenses in recent years. This suggests that we still lack an understanding of the fundamentals of adversarial robustness in deep learning, which might prevent us from designing more powerful defenses.
The overarching goal of this research is to enable accurate evaluations of model robustness under different practical settings as well as to establish a deeper understanding of other factors in the machine learning training pipeline that might affect model robustness. Specifically, we develop efficient and effective Frank-Wolfe attack algorithms under white-box and black-box settings and a hard-label adversarial attack, RayS, which is capable of detecting "falsely" robust models. In terms of understanding adversarial robustness, we propose to theoretically study the relationship between model robustness and data distributions, the relationship between model robustness and model architectures, as well as the relationship between model robustness and loss smoothness. The techniques proposed in this dissertation form a line of research that deepens our understanding of adversarial robustness and could further guide us in designing better and faster robust training methods.
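
For readers unfamiliar with the Frank-Wolfe idea mentioned in the description, the following is a minimal sketch of a plain Frank-Wolfe iteration used as a white-box attack under an L-infinity constraint. It is not the dissertation's algorithm: the linear logistic model, the step-size rule 2/(t+2), and all names (`loss_gradient`, `frank_wolfe_attack`, the example weights and inputs) are assumptions chosen for illustration.

```python
import math

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def loss_gradient(x, y, w, b):
    """Gradient w.r.t. x of the logistic loss log(1 + exp(-y * (w.x + b)))."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    scale = -y / (1.0 + math.exp(y * score))
    return [scale * wi for wi in w]

def frank_wolfe_attack(x0, y, w, b, eps, steps=10):
    """Maximize the loss over the L-infinity ball of radius eps around x0."""
    x = list(x0)
    for t in range(steps):
        grad = loss_gradient(x, y, w, b)
        # Linear maximization oracle: the ball's vertex that best aligns
        # with the gradient (we are ascending the loss).
        v = [x0i + eps * sign(g) for x0i, g in zip(x0, grad)]
        # Convex combination of x and v, so x never leaves the ball
        # and no projection step is needed.
        gamma = 2.0 / (t + 2.0)
        x = [xi + gamma * (vi - xi) for xi, vi in zip(x, v)]
    return x

# Hypothetical toy model and input, correctly classified as +1 before the attack.
w, b = [2.0, -1.0, 0.5], 0.0
x0, y, eps = [0.3, 0.1, 0.2], 1, 0.25
x_adv = frank_wolfe_attack(x0, y, w, b, eps)
adv_score = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
```

In this toy case the perturbed input stays within `eps` of `x0` in every coordinate while the sign of the model's score flips. Real attacks on deep networks would obtain the gradient by back-propagation rather than in closed form.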

Advances in Reliably Evaluating and Improving Adversarial Robustness

Advances in Reliably Evaluating and Improving Adversarial Robustness PDF Author: Jonas Rauber
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Machine learning has made enormous progress in the last five to ten years. We can now make a computer, a machine, learn complex perceptual tasks from data rather than explicitly programming it. When we compare modern speech or image recognition systems to those from a decade ago, the advances are awe-inspiring. The susceptibility of machine learning systems to small, maliciously crafted adversarial perturbations is less impressive. Almost imperceptible pixel shifts or background noises can completely derail their performance. While humans are often amused by the stupidity of artificial intelligence, engineers worry about the security and safety of their machine learning applications, and scientists wonder how to make machine learning models more robust and more human-like. This dissertation summarizes and discusses advances in three areas of adversarial robustness. First, we introduce a new type of adversarial attack against machine learning models in real-world black-box scenarios. Unlike previous attacks, it does not require any insider knowledge or special access. Our results demonstrate the concrete threat caused by the current lack of robustness in machine learning applications. Second, we present several contributions to deal with the diverse challenges around evaluating adversarial robustness. The most fundamental challenge is that common attacks cannot distinguish robust models from models with misleading gradients. We help uncover and solve this problem through two new types of attacks immune to gradient masking. Misaligned incentives are another reason for insufficient evaluations. We published joint guidelines and organized an interactive competition to mitigate this problem. Finally, our open-source adversarial attacks library Foolbox empowers countless researchers to overcome common technical obstacles. 
Since robustness evaluations are inherently unstandardized, straightforward access to various attacks is more than a technical convenience; it promotes thorough evaluations. Third, we showcase a fundamentally new neural network architecture for robust classification. It uses a generative analysis-by-synthesis approach. We demonstrate its robustness using a digit recognition task and simultaneously reveal the limitations of prior work that uses adversarial training. Moreover, further studies have shown that our model best predicts human judgments on so-called controversial stimuli and that our approach scales to more complex datasets.

Adversarial Robustness for Machine Learning

Adversarial Robustness for Machine Learning PDF Author: Pin-Yu Chen
Publisher: Academic Press
ISBN: 0128242574
Category : Computers
Languages : en
Pages : 300

Book Description
Adversarial Robustness for Machine Learning summarizes the recent progress on this topic and introduces popular algorithms on adversarial attack, defense and verification. Sections cover adversarial attack, verification and defense, mainly focusing on image classification applications which are the standard benchmark considered in the adversarial robustness community. Other sections discuss adversarial examples beyond image classification, other threat models beyond testing time attack, and applications on adversarial robustness. For researchers, this book provides a thorough literature review that summarizes latest progress in the area, which can be a good reference for conducting future research. In addition, the book can also be used as a textbook for graduate courses on adversarial robustness or trustworthy machine learning. While machine learning (ML) algorithms have achieved remarkable performance in many applications, recent studies have demonstrated their lack of robustness against adversarial disturbance. The lack of robustness brings security concerns in ML models for real applications such as self-driving cars, robotics controls and healthcare systems. Summarizes the whole field of adversarial robustness for machine learning models. Provides a clearly explained, self-contained reference. Introduces formulations, algorithms and intuitions. Includes applications based on adversarial robustness.

Evaluating and Certifying the Adversarial Robustness of Neural Language Models

Evaluating and Certifying the Adversarial Robustness of Neural Language Models PDF Author: Muchao Ye
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
Language models (LMs) built on deep neural networks (DNNs) have achieved great success in various areas of artificial intelligence and play an increasingly vital role in applications including chatbots and smart healthcare. Nonetheless, the vulnerability of DNNs to adversarial examples still threatens the application of neural LMs to safety-critical tasks. Specifically, DNNs can change their correct predictions into incorrect ones when small perturbations are added to the original input texts. In this dissertation, we identify key challenges in evaluating and certifying the adversarial robustness of neural LMs and bridge those gaps through efficient hard-label text adversarial attacks and a unified certified robust training framework. The first step of developing neural LMs with high adversarial robustness is evaluating whether they are empirically robust against perturbed texts. The vital technique for this is the text adversarial attack, which aims to construct a text that can fool LMs. Ideally, it should output high-quality adversarial examples in a realistic setting with high efficiency. However, current evaluation pipelines proposed in the realistic hard-label setting adopt heuristic search methods and are consequently inefficient. To tackle this limitation, we introduce a series of hard-label text adversarial attack methods, which address the inefficiency problem by using a pretrained word embedding space as an intermediate. A deep dive into this idea illustrates that utilizing an estimated decision boundary in the introduced word embedding space helps improve the quality of crafted adversarial examples. The ultimate goal of constructing robust neural LMs is obtaining ones for which adversarial examples do not exist, which can be realized through certified robust training.
The research community has proposed different types of certified robust training either in the discrete input space or in the continuous latent feature space. We discover the structural gap within current pipelines and unify them in the word embedding space. By removing unnecessary bound computation modules, i.e., interval bound propagation, and harnessing a new decoupled regularization learning paradigm, our unification can provide a stronger robustness guarantee. Given the aforementioned contributions, we believe our findings will help contribute to the development of robust neural LMs.

Improved Methodology for Evaluating Adversarial Robustness in Deep Neural Networks

Improved Methodology for Evaluating Adversarial Robustness in Deep Neural Networks PDF Author: Kyungmi Lee (S. M.)
Publisher:
ISBN:
Category :
Languages : en
Pages : 93

Book Description
Deep neural networks are known to be vulnerable to adversarial perturbations, which are often imperceptible to humans but can alter predictions of machine learning systems. Since the exact value of adversarial robustness is difficult to obtain for complex deep neural networks, accuracy of the models against perturbed examples generated by attack methods is empirically used as a proxy to adversarial robustness. However, failure of attack methods to find adversarial perturbations cannot be equated with being robust. In this work, we identify three common cases that lead to overestimation of accuracy against perturbed examples generated by bounded first-order attack methods: 1) the value of cross-entropy loss numerically becoming zero when using standard floating point representation, resulting in non-useful gradients; 2) innately non-differentiable functions in deep neural networks, such as Rectified Linear Unit (ReLU) activation and MaxPool operation, incurring “gradient masking” [2]; and 3) certain regularization methods used during training inducing the model to be less amenable to first-order approximation. We show that these phenomena exist in a wide range of deep neural networks, and that these phenomena are not limited to specific defense methods they have been previously investigated for. For each case, we propose compensation methods that either address sources of inaccurate gradient computation, such as numerical saturation for near zero values and non-differentiability, or reduce the total number of back-propagations for iterative attacks by approximating second-order information. These compensation methods can be combined with existing attack methods for a more precise empirical evaluation metric. We illustrate the impact of these three phenomena with examples of practical interest, such as benchmarking model capacity and regularization techniques for robustness. 
Furthermore, we show that the gap between adversarial accuracy and the guaranteed lower bound of robustness can be partially explained by these phenomena. Overall, our work shows that overestimated adversarial accuracy that is not indicative of robustness is prevalent even for conventionally trained deep neural networks, and highlights cautions of using empirical evaluation without guaranteed bounds.
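
The first failure case listed in the description, cross-entropy loss rounding to exactly zero, can be reproduced in a few lines. The sketch below is illustrative only and is not taken from the thesis: it uses Python's 64-bit floats, a hypothetical two-class helper `cross_entropy_and_grad`, and made-up logit values.

```python
import math

def softmax(logits):
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_and_grad(logits, label):
    """Cross-entropy loss and its gradient w.r.t. the logits (softmax - one_hot)."""
    probs = softmax(logits)
    loss = -math.log(probs[label])
    grad = [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
    return loss, grad

# Moderate confidence: the loss is small but nonzero, and the gradient is informative.
moderate_loss, moderate_grad = cross_entropy_and_grad([5.0, 0.0], 0)

# High confidence: the true-class probability rounds to exactly 1.0, so the
# loss is exactly zero and the gradient is far too small to be useful.
saturated_loss, saturated_grad = cross_entropy_and_grad([50.0, 0.0], 0)

# Extreme confidence: exp() underflows and every gradient entry is exactly zero.
extreme_loss, extreme_grad = cross_entropy_and_grad([800.0, 0.0], 0)
```

A first-order attack that follows such a gradient receives almost no signal even though an adversarial perturbation may still exist, which is why a failed attack in this regime should not be read as evidence of robustness. Lower-precision formats saturate at smaller logit margins than the 64-bit floats used here.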

On the Robustness of Neural Network: Attacks and Defenses

On the Robustness of Neural Network: Attacks and Defenses PDF Author: Minhao Cheng
Publisher:
ISBN:
Category :
Languages : en
Pages : 158

Book Description
Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples. That is, a slightly modified example can be easily generated to fool a well-trained image classifier based on deep neural networks (DNNs) with high confidence. This makes it difficult to apply neural networks in security-critical areas. To find such examples, we first introduce and define adversarial examples. In the first part, we then discuss how to build adversarial attacks in both image and discrete domains. For image classification, we introduce how to design an adversarial attacker in three different settings. Among them, we focus on the most practical setup for evaluating the adversarial robustness of a machine learning system with limited access: the hard-label black-box attack setting for generating adversarial examples, where limited model queries are allowed and only the decision is provided to a queried data input. For the discrete domain, we first talk about its difficulty and introduce how to conduct the adversarial attack on two applications. While crafting adversarial examples is an important technique to evaluate the robustness of DNNs, there is a huge need for improving the model robustness as well. Enhancing model robustness under new and even adversarial environments is a crucial milestone toward building trustworthy machine learning systems. In the second part, we talk about the methods to strengthen the model's adversarial robustness. We first discuss attack-dependent defense. Specifically, we first discuss one of the most effective methods for improving the robustness of neural networks, adversarial training, and its limitations. We introduce a variant to overcome these limitations. Then we take a different perspective and introduce attack-independent defense. We summarize the current methods and introduce a framework based on vicinal risk minimization.
Inspired by the framework, we introduce self-progressing robust training. Furthermore, we discuss the robustness trade-off problem and introduce a hypothesis and propose a new method to alleviate it.

Adversarial Machine Learning

Adversarial Machine Learning PDF Author: Aneesh Sreevallabh Chivukula
Publisher: Springer Nature
ISBN: 3030997723
Category : Computers
Languages : en
Pages : 316

Book Description
A critical challenge in deep learning is the vulnerability of deep learning networks to security attacks from intelligent cyber adversaries. Even innocuous perturbations to the training data can be used to manipulate the behaviour of deep networks in unintended ways. In this book, we review the latest developments in adversarial attack technologies in computer vision, natural language processing, and cybersecurity with regard to multidimensional, textual and image data, sequence data, and temporal data. In turn, we assess the robustness properties of deep learning networks to produce a taxonomy of adversarial examples that characterises the security of learning systems using game theoretical adversarial deep learning algorithms. The state-of-the-art in adversarial perturbation-based privacy protection mechanisms is also reviewed. We propose new adversary types for game theoretical objectives in non-stationary computational learning environments. Proper quantification of the hypothesis set in the decision problems of our research leads to various functional problems, oracular problems, sampling tasks, and optimization problems. We also address the defence mechanisms currently available for deep learning models deployed in real-world environments. The learning theories used in these defence mechanisms concern data representations, feature manipulations, misclassification costs, sensitivity landscapes, distributional robustness, and complexity classes of the adversarial deep learning algorithms and their applications. In closing, we propose future research directions in adversarial deep learning applications for resilient learning system design and review formalized learning assumptions concerning the attack surfaces and robustness characteristics of artificial intelligence applications so as to deconstruct the contemporary adversarial deep learning designs.
Given its scope, the book will be of interest to Adversarial Machine Learning practitioners and Adversarial Artificial Intelligence researchers whose work involves the design and application of Adversarial Deep Learning.

Towards Adversarial Robustness of Feed-forward and Recurrent Neural Networks

Towards Adversarial Robustness of Feed-forward and Recurrent Neural Networks PDF Author: Qinglong Wang
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
"Recent years witnessed the successful resurgence of neural networks through the lens of deep learning research. As the spread of deep neural network (DNN) continues to reach multifarious branches of research, including computer vision, natural language processing, and malware detection, it has been found that the vulnerability of these powerful models is equally impressive as their capability in classification tasks. Specifically, research on the adversarial example problem exposes that DNNs, albeit powerful when confronted with legitimate samples, suffer severely from adversarial examples. These synthetic examples can be created by slightly modifying legitimate samples. We speculate that this vulnerability may significantly impede an extensive adoption of DNNs in safety-critical domains. This thesis aims to comprehend some of the mysteries of this vulnerability of DNN, design generic frameworks and deployable algorithms to protect DNNs with different architectures from attacks armed with adversarial examples. We first conduct a thorough exploration of existing research on explaining the pervasiveness of adversarial examples. We unify the hypotheses raised in existing work by extracting three major influencing factors, i.e., data, model, and training. These factors are also helpful in locating different attack and defense methods proposed in the research spectrum and analyzing their effectiveness and limitations. Then we perform two threads of research on neural networks with feed-forward and recurrent architectures, respectively. In the first thread, we focus on the adversarial robustness of feed-forward neural networks, which have been widely applied to process images. Under our proposed generic framework, we design two types of adversary resistant feed-forward networks that weaken the destructive power of adversarial examples and even prevent their creation. 
We theoretically validate the effectiveness of our methods and empirically demonstrate that they significantly boost a DNN's adversarial robustness while maintaining high accuracy in classification. Our second thread of study focuses on the adversarial robustness of the recurrent neural network (RNN), which represents a variety of networks typically used for processing sequential data. We develop an evaluation framework and propose to quantitatively evaluate RNN's adversarial robustness with deterministic finite automata (DFA), which represent rigorous rules and can be extracted from RNNs, and a distance metric suitable for strings. We demonstrate the feasibility of using extracted DFA as rules through conducting careful experimental studies to identify key conditions that affect the extraction performance. Moreover, we theoretically establish the correspondence between different RNNs and different DFA, and empirically validate the correspondence by evaluating and comparing different RNNs for their extraction performance. At last, we develop an algorithm under our framework and conduct a case study to evaluate the adversarial robustness of different RNNs on a set of regular grammars"--

Attacks, Defenses and Testing for Deep Learning

Attacks, Defenses and Testing for Deep Learning PDF Author: Jinyin Chen
Publisher: Springer Nature
ISBN: 9819704251
Category :
Languages : en
Pages : 413

Book Description


Adversarial Robustness of Deep Learning Models

Adversarial Robustness of Deep Learning Models PDF Author: Samarth Gupta (S.M.)
Publisher:
ISBN:
Category :
Languages : en
Pages : 80

Book Description
Efficient operation and control of modern-day urban systems such as transportation networks is now more important than ever due to huge societal benefits. Low-cost network-wide sensors generate large amounts of data, which need to be processed to extract useful information necessary for operational maintenance and to perform real-time control. Modern Machine Learning (ML) systems, particularly Deep Neural Networks (DNNs), provide a scalable solution to the problem of information retrieval from sensor data. Therefore, Deep Learning systems are increasingly playing an important role in day-to-day operations of our urban systems and hence cannot be treated as standalone systems anymore. This naturally raises questions from a security viewpoint. Are modern ML systems robust to adversarial attacks for deployment in critical real-world applications? If not, then how can we make progress in securing these systems against such attacks? In this thesis we first demonstrate the vulnerability of modern ML systems in a real-world scenario relevant to transportation networks by successfully attacking a commercial ML platform using a traffic-camera image. We review different methods of defense and various challenges associated with training an adversarially robust classifier. In terms of contributions, we propose and investigate a new method of defense to build adversarially robust classifiers using Error-Correcting Codes (ECCs). The idea of using Error-Correcting Codes for multi-class classification has been investigated in the past but only under nominal settings. We build upon this idea in the context of adversarial robustness of Deep Neural Networks. Following the guidelines of codebook design from the literature, we formulate a discrete optimization problem to generate codebooks in a systematic manner. This optimization problem maximizes the minimum Hamming distance between codewords of the codebook while maintaining high column separation.
Using the optimal solution of the discrete optimization problem as our codebook, we then build a (robust) multi-class classifier from that codebook. To estimate the adversarial accuracy of ECC-based classifiers resulting from different codebooks, we provide methods to generate gradient-based white-box attacks. We discuss the estimation of class probabilities (or scores), which are in themselves useful for real-world applications, along with their use in generating black-box and white-box attacks. We also discuss differentiable decoding methods, which can also be used to generate white-box attacks. We are able to outperform the standard all-pairs codebook, providing evidence that compact codebooks generated using our discrete optimization approach can indeed provide high performance. Most importantly, we show that ECC-based classifiers can be partially robust even without any adversarial training. We also show that this robustness is not simply a manifestation of the large network capacity of the overall classifier. Our approach can be seen as the first step towards designing classifiers which are robust by design. These contributions suggest that an ECC-based approach can be useful for improving the robustness of modern ML systems, thus making urban systems more resilient to adversarial attacks.
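
To make the codebook criterion in the description concrete, the sketch below computes the minimum Hamming distance between the codewords of a small codebook and decodes a noisy bit vector to the nearest codeword. The codebook, the function names, and the nearest-codeword decoding rule are assumptions for illustration; the thesis's optimization problem and its column-separation constraint are not reproduced here.

```python
from itertools import combinations

def hamming_distance(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def min_row_distance(codebook):
    """Smallest Hamming distance between any two codewords (rows)."""
    return min(hamming_distance(a, b) for a, b in combinations(codebook, 2))

def decode(bits, codebook):
    """Index of the codeword closest to the observed bits in Hamming distance."""
    return min(range(len(codebook)),
               key=lambda k: hamming_distance(bits, codebook[k]))

# Hypothetical 4-class codebook with 7-bit codewords (one row per class).
codebook = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1, 0, 1],
]

d_min = min_row_distance(codebook)

# Class 2's codeword with its first bit flipped, e.g. by a perturbation
# that fools one of the seven binary classifiers.
noisy = [0, 1, 0, 0, 1, 1, 0]
predicted_class = decode(noisy, codebook)
```

With a minimum distance of `d_min`, nearest-codeword decoding tolerates up to `(d_min - 1) // 2` flipped bits, which is the intuition for why a larger minimum Hamming distance can make the overall classifier harder to fool.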