Essays on Tree-based Methods for Prediction and Causal Inference

Essays on Tree-based Methods for Prediction and Causal Inference PDF Author: Eoghan O'Neill
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description

Elements of Causal Inference

Elements of Causal Inference PDF Author: Jonas Peters
Publisher: MIT Press
ISBN: 0262037319
Category : Computers
Languages : en
Pages : 289

Book Description
A concise and self-contained introduction to causal inference, increasingly important in data science and machine learning. The mathematization of causality is a relatively recent development, and has become increasingly important in data science and machine learning. This book offers a self-contained and concise introduction to causal models and how to learn them from data. After explaining the need for causal models and discussing some of the principles underlying causal inference, the book teaches readers how to use causal models: how to compute intervention distributions, how to infer causal models from observational and interventional data, and how causal ideas could be exploited for classical machine learning problems. All of these topics are discussed first in terms of two variables and then in the more general multivariate case. The bivariate case turns out to be a particularly hard problem for causal learning because there are no conditional independences as used by classical methods for solving multivariate cases. The authors consider analyzing statistical asymmetries between cause and effect to be highly instructive, and they report on their decade of intensive research into this problem. The book is accessible to readers with a background in machine learning or statistics, and can be used in graduate courses or as a reference for researchers. The text includes code snippets that can be copied and pasted, exercises, and an appendix with a summary of the most important technical concepts.
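The difference between conditioning and intervening mentioned in the description can be illustrated with a toy example. The sketch below is not taken from the book: it assumes a hypothetical three-variable binary model in which a confounder Z influences both X and Y, and all probabilities are made up for illustration. It compares the observational quantity P(Y=1 | X=x) with the intervention distribution P(Y=1 | do(X=x)), computed by adjusting for Z.

```python
# Hypothetical binary model (assumed for illustration): Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.5, 1: 0.5}            # marginal distribution of the confounder Z
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}   # P(X=1 | Z=z)


def p_y1(x, z):
    """P(Y=1 | X=x, Z=z) under the assumed mechanism for Y."""
    return 0.1 + 0.2 * x + 0.6 * z


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary x."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def observational(x):
    """P(Y=1 | X=x): Z is weighted by its posterior given the observed X."""
    joint = {z: P_Z[z] * p_x_given_z(x, z) for z in P_Z}
    total = sum(joint.values())
    return sum(p_y1(x, z) * joint[z] / total for z in P_Z)


def interventional(x):
    """P(Y=1 | do(X=x)): X is set externally, so Z keeps its marginal."""
    return sum(p_y1(x, z) * P_Z[z] for z in P_Z)


if __name__ == "__main__":
    for x in (0, 1):
        print(x, round(observational(x), 3), round(interventional(x), 3))
```

In this assumed model the observed difference between X=1 and X=0 is larger than the interventional difference, because part of the association runs through Z rather than from X to Y.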

Causal Inference with Random Forests

Causal Inference with Random Forests PDF Author: Stefan Wager
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Random forests, introduced by Breiman [2001], have become one of the most popular machine learning algorithms among practitioners, and reliably achieve good predictive performance across several application areas. This has led to considerable interest in using random forests for doing science, or drawing statistical inferences in problems that do not reduce immediately to prediction. As a step in this direction, this thesis studies how random forests can be used for understanding treatment effect heterogeneity as it may arise in, e.g., personalized medicine. Our main contributions are as follows:
- We develop a causal forest algorithm for heterogeneous treatment effect estimation, and find our method to be substantially more powerful at identifying treatment heterogeneity than traditional methods based on nearest-neighbor matching, especially when the number of considered covariates is large.
- We provide an asymptotic statistical analysis of causal forests, and prove a Gaussian limit result. We then propose a practical method for estimating the noise scale of causal forests, thus allowing for valid statistical inference with causal forests.
- In a high-dimensional regime where the problem complexity and the number of observations jointly approach infinity, we identify the signal strength at which tree-based methods become able to accurately detect treatment heterogeneity. Perhaps strikingly, we find that the required signal strength only scales logarithmically in the dimension of the problem.
Taken together, these results show that random forests, despite often being understood as a mere black box predictive algorithm, provide a powerful toolbox for heterogeneous treatment effect estimation in modern large-scale problems.

Causation, Prediction, and Search

Causation, Prediction, and Search PDF Author: Peter Spirtes
Publisher: Springer Science & Business Media
ISBN: 1461227488
Category : Mathematics
Languages : en
Pages : 551

Book Description
This book is intended for anyone, regardless of discipline, who is interested in the use of statistical methods to help obtain scientific explanations or to predict the outcomes of actions, experiments or policies. Much of G. Udny Yule's work illustrates a vision of statistics whose goal is to investigate when and how causal influences may be reliably inferred, and their comparative strengths estimated, from statistical samples. Yule's enterprise has been largely replaced by Ronald Fisher's conception, in which there is a fundamental cleavage between experimental and non experimental inquiry, and statistics is largely unable to aid in causal inference without randomized experimental trials. Every now and then members of the statistical community express misgivings about this turn of events, and, in our view, rightly so. Our work represents a return to something like Yule's conception of the enterprise of theoretical statistics and its potential practical benefits. If intellectual history in the 20th century had gone otherwise, there might have been a discipline to which our work belongs. As it happens, there is not. We develop material that belongs to statistics, to computer science, and to philosophy; the combination may not be entirely satisfactory for specialists in any of these subjects. We hope it is nonetheless satisfactory for its purpose.

Achieving Reliable Causal Inference with Data-Mined Variables

Achieving Reliable Causal Inference with Data-Mined Variables PDF Author: Mochen Yang
Publisher:
ISBN:
Category :
Languages : en
Pages : 53

Book Description
Combining machine learning with econometric analysis is becoming increasingly prevalent in both research and practice. A common empirical strategy involves the application of predictive modeling techniques to "mine" variables of interest from available data, followed by the inclusion of those variables into an econometric framework, with the objective of estimating causal effects. Recent work highlights that, because the predictions from machine learning models are inevitably imperfect, econometric analyses based on the predicted variables are likely to suffer from bias due to measurement error. We propose a novel approach to mitigate these biases, leveraging the ensemble learning technique known as the random forest. We propose employing random forest not just for prediction, but also for generating instrumental variables to address the measurement error embedded in the prediction. The random forest algorithm performs best when comprised of a set of trees that are individually accurate in their predictions, yet which also make "different" mistakes, i.e., have weakly correlated prediction errors. A key observation is that these properties are closely related to the relevance and exclusion requirements of valid instrumental variables. We design a data-driven procedure to select tuples of individual trees from a random forest, in which one tree serves as the endogenous covariate and the other trees serve as its instruments. Simulation experiments demonstrate the efficacy of the proposed approach in mitigating estimation biases, and its superior performance over an alternative method (simulation-extrapolation), which has been suggested by prior work as a reasonable method of addressing the measurement error problem.

The Forest Or the Trees? Tackling Simpson's Paradox with Classification and Regression Trees

The Forest Or the Trees? Tackling Simpson's Paradox with Classification and Regression Trees PDF Author: Galit Shmueli
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
Prediction and variable selection are major uses of data mining algorithms but they are rarely the focus in social science research, where the main objective is causal explanation. Ideal causal modeling is based on randomized experiments, but because experiments are often impossible, unethical or expensive to perform, social science research often relies on observational data for studying causality. A major challenge is to infer causality from such data. This paper uses the predictive tool of Classification and Regression Trees for detecting Simpson's paradox, which is related to causal inference. We introduce a new tree approach for detecting potential paradoxes in data that have either a few or a large number of potential confounding variables. The approach relies on the tree structure and the location of the cause vs. the confounders in the tree. We discuss theoretical and computational aspects of the approach and illustrate it using several real applications.
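For readers unfamiliar with Simpson's paradox, a small worked example may help. The sketch below is not the tree-based approach described in the paper; it is a plain stratified comparison. The counts follow the frequently cited kidney-stone illustration and are used here only as example data, and the names `COUNTS`, `rate`, and `is_simpson_reversal` are hypothetical. It checks whether one treatment has the higher success rate in every stratum of a confounder while having the lower rate in the pooled data.

```python
from fractions import Fraction

# (treatment, stone size) -> (successes, patients); illustrative counts only.
COUNTS = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}


def rate(treatment, stratum=None):
    """Success rate for a treatment, pooled or within one stratum."""
    successes = patients = 0
    for (t, s), (k, n) in COUNTS.items():
        if t == treatment and (stratum is None or s == stratum):
            successes += k
            patients += n
    return Fraction(successes, patients)


def is_simpson_reversal():
    """True if A wins in every stratum while B wins in the pooled data."""
    strata = sorted({s for _, s in COUNTS})
    a_better_in_all = all(rate("A", s) > rate("B", s) for s in strata)
    b_better_overall = rate("B") > rate("A")
    return a_better_in_all and b_better_overall


if __name__ == "__main__":
    for s in (None, "small", "large"):
        print(s or "pooled", float(rate("A", s)), float(rate("B", s)))
    print("reversal:", is_simpson_reversal())
```

The reversal arises because treatment A is applied mostly to the harder (large-stone) cases, so stone size acts as a confounder in the pooled comparison.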

Causality

Causality PDF Author: Judea Pearl
Publisher: Cambridge University Press
ISBN: 052189560X
Category : Computers
Languages : en
Pages : 487

Book Description
Causality offers the first comprehensive coverage of causal analysis in many sciences, including recent advances using graphical methods. Pearl presents a unified account of the probabilistic, manipulative, counterfactual and structural approaches to causation, and devises simple mathematical tools for analyzing the relationships between causal connections, statistical associations, actions and observations. The book will open the way for including causal analysis in the standard curriculum of statistics, artificial intelligence ...

An Introduction to Causal Inference

An Introduction to Causal Inference PDF Author: Judea Pearl
Publisher: Createspace Independent Publishing Platform
ISBN: 9781507894293
Category : Causation
Languages : en
Pages : 0

Book Description
This paper summarizes recent advances in causal inference and underscores the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: (1) queries about the effects of potential interventions (also called "causal effects" or "policy evaluation"), (2) queries about probabilities of counterfactuals (including assessment of "regret," "attribution" or "causes of effects"), and (3) queries about direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both. The tools are demonstrated in the analyses of mediation, causes of effects, and probabilities of causation. -- p. 1.

Essays on Methods for Causal Inference

Essays on Methods for Causal Inference PDF Author: Patrick F. Burauel
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description


Causal Inference for High-Stakes Decisions

Causal Inference for High-Stakes Decisions PDF Author: Harsh J. Parikh
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
Causal inference methods are commonly used across domains to aid high-stakes decision-making. The validity of causal studies often relies on strong assumptions that might not be realistic in high-stakes scenarios. Inferences based on incorrect assumptions frequently result in sub-optimal decisions with high penalties and long-term consequences. Unlike prediction or machine learning methods, it is particularly challenging to evaluate the performance of causal methods using just the observed data because the ground truth causal effects are missing for all units. My research presents frameworks to enable validation of causal inference methods in one of the following three ways: (i) auditing the estimation procedure by a domain expert, (ii) studying the performance using synthetic data, and (iii) using placebo tests to identify biases. This work enables decision-makers to reason about the validity of the estimation procedure by thinking carefully about the underlying assumptions. Our Learning-to-Match framework is an auditable-and-accurate approach that learns an optimal distance metric for estimating heterogeneous treatment effects. We augment the Learning-to-Match framework with pharmacological mechanistic knowledge to study the long-term effects of untreated seizure-like brain activities in critically ill patients. Here, the auditability of the estimator allowed neurologists to qualitatively validate the analysis via a chart review. We also propose Credence, a synthetic-data-based framework to validate causal inference methods. Credence simulates data that is stochastically indistinguishable from the observed data while allowing for user-designed treatment effects and selection biases. We demonstrate Credence's ability to accurately assess the relative performance of causal estimation techniques in an extensive simulation study and two real-world data applications. We also discuss an approach that combines experimental and observational studies. It provides a principled way to test for violations of key assumptions and to estimate causal effects (Chapter 5).