Heterogeneous Treatment Effect Estimation in Observational Studies Using Tree-based Methods

Author: Yuyang Zhang
Publisher:
ISBN:
Category : Biometry
Languages : en
Pages : 167

Book Description
Observational studies provide a rich source of data for evaluating causal relationships. Appropriate statistical methods for causal inference should be developed to account for the non-randomized nature of observational studies. Matching designs are commonly used to deal with this non-randomization because they are robust to model misspecification. The goal of this work is to use the matching design to perform causal inference at the population and subpopulation levels. The propensity score is a powerful tool for adjusting for observed confounding bias when there are a large number of confounders. Relatively few studies have focused on whether the post-matching analysis should adjust for the matching structure when estimating the population treatment effect. In the first part of the thesis, we compare results under different strategies, with and without accounting for the matching design, for both continuous and binary outcomes, and discuss whether the post-matching analysis should take the matching structure into account when the treatment effect is homogeneous \cite{zhang2020accounting}. However, treatment effects are likely to differ across subpopulations, especially in real-world problems. We then propose a non-parametric matching tree (MT) to tackle both confounding adjustment and subgroup identification at the same time by combining machine learning methods with matching designs. We prove that it produces unbiased subpopulation treatment effect estimators. To evaluate the performance of the proposed method, we run extensive simulation studies to compare it with popular tree-based causal inference methods. We apply the proposed method to examine the impact of Tobramycin on patients' first chronic Pseudomonas aeruginosa infection in cystic fibrosis in the U.S. Finally, we discuss limitations and potential future work.

An Instrumental Variable Tree Approach for Detecting Heterogeneous Treatment Effects in Observational Studies

Author: Guihua Wang
Publisher:
ISBN:
Category :
Languages : en
Pages : 24

Book Description
We develop a technique that incorporates the instrumental variable method into a causal tree to correct for potential endogeneity biases in heterogeneous treatment effect analysis using observational studies. The resulting instrumental variable tree approach partitions subjects into subgroups with similar treatment effects within subgroups and different treatment effects across subgroups. The estimated treatment effects are asymptotically consistent under very general assumptions. Using simulated data, we show that our approach has better coverage rates and smaller mean-squared errors than the conventional causal tree, and that a forest constructed using instrumental variable trees has better accuracy and interpretability than the generalized random forest.
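The core idea of estimating effects by instrument within subgroups can be illustrated with a minimal sketch. It is not the instrumental variable tree described above: the two subgroups are fixed in advance rather than learned by a tree, and the estimator is the plain Wald (IV ratio) estimator. The data-generating process, the names `wald_estimate` and `naive_estimate`, and all numeric settings are assumptions made up for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def wald_estimate(rows):
    """IV (Wald) ratio: the instrument's effect on the outcome y divided by
    its effect on the treatment d. Each row is a (z, d, y) tuple."""
    y1 = mean([y for z, d, y in rows if z == 1])
    y0 = mean([y for z, d, y in rows if z == 0])
    d1 = mean([d for z, d, y in rows if z == 1])
    d0 = mean([d for z, d, y in rows if z == 0])
    return (y1 - y0) / (d1 - d0)


def naive_estimate(rows):
    """Treated-minus-untreated difference in mean outcomes (ignores confounding)."""
    treated = [y for z, d, y in rows if d == 1]
    untreated = [y for z, d, y in rows if d == 0]
    return mean(treated) - mean(untreated)


# Simulated data (assumed setup): u is an unmeasured confounder that raises
# both treatment uptake and the outcome; z is a randomized binary instrument.
rng = random.Random(1)
TRUE_EFFECTS = {0: 1.0, 1: 3.0}  # treatment effect differs by subgroup x
groups = {0: [], 1: []}
for _ in range(40000):
    x = rng.randint(0, 1)
    u = rng.gauss(0.0, 1.0)
    z = rng.randint(0, 1)
    d = 1 if 1.5 * z + u + rng.gauss(0.0, 1.0) > 0.75 else 0
    y = TRUE_EFFECTS[x] * d + 2.0 * u + rng.gauss(0.0, 1.0)
    groups[x].append((z, d, y))

iv_est = {g: wald_estimate(rows) for g, rows in groups.items()}
naive_est = {g: naive_estimate(rows) for g, rows in groups.items()}
for g in (0, 1):
    print(f"subgroup {g}: true={TRUE_EFFECTS[g]:.1f} "
          f"naive={naive_est[g]:.2f} iv={iv_est[g]:.2f}")
```

In this simulation the naive comparison overstates the effect in both subgroups because of the unmeasured confounder, while the subgroup-wise Wald ratio should land close to the true values.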

Statistical Methods for Studying Heterogeneous Treatment Effects with Instrumental Variables

Author: Michael William Johnson
Publisher:
ISBN:
Category :
Languages : en
Pages : 130

Book Description
There is a growing interest in estimating heterogeneous treatment effects in randomized and observational studies. However, most of the work relies on the assumption of ignorability, or no unmeasured confounding on the treatment effect. While instrumental variables (IV) are a popular technique to control for unmeasured confounding, there has been little research conducted to study heterogeneous treatment effects with the use of an IV. This dissertation introduces methods using an IV to discover novel subgroups, estimate their heterogeneous treatment effects, and identify individualized treatment rules (ITR) when ignorability is expected to be violated. In Chapter 2, we present a two-part algorithm to estimate heterogeneous treatment effects and detect novel subgroups using an IV with matching. The first part uses interpretable machine learning techniques, such as classification and regression trees, to discover potential effect modifiers. The second part uses closed testing to test for statistical significance of each effect modifier while strongly controlling the familywise error rate. We apply this method to the Oregon Health Insurance Experiment, estimating the effect of Medicaid on the number of days an individual's health does not impede their usual activities by using a randomized lottery as an instrument. In Chapter 3, we generalize methods to identify ITR using a binary IV to using multiple, discrete-valued instruments, or equivalently, multilevel instruments. Several new problems arise when generalizing to multilevel instruments, requiring novel solutions. In particular, multilevel IVs give rise to many latent subgroups that may experience heterogeneous treatment effects. Additionally, it may be unclear how to combine and compare the different levels of the IV to estimate treatment heterogeneity. We provide methods that use a prediction of the latent subgroup to identify optimal ITR, and methods to dynamically combine levels of the multilevel IV to estimate the heterogeneous treatment effects, effectively individualizing estimation of an ITR. Further, we provide and discuss necessary and sufficient conditions to identify an optimal ITR using a multilevel IV. We apply our methods to identify an ITR for two competing treatments, carotid endarterectomy and carotid artery stenting, on preventing stroke or death within 30 days of their index procedure.

Handbook of Causal Analysis for Social Research

Author: Stephen L. Morgan
Publisher: Springer Science & Business Media
ISBN: 9400760949
Category : Social Science
Languages : en
Pages : 423

Book Description
What constitutes a causal explanation, and must an explanation be causal? What warrants a causal inference, as opposed to a descriptive regularity? What techniques are available to detect when causal effects are present, and when can these techniques be used to identify the relative importance of these effects? What complications do the interactions of individuals create for these techniques? When can mixed methods of analysis be used to deepen causal accounts? Must causal claims include generative mechanisms, and how effective are empirical methods designed to discover them? The Handbook of Causal Analysis for Social Research tackles these questions with nineteen chapters from leading scholars in sociology, statistics, public health, computer science, and human development.

Targeted Learning in Data Science

Author: Mark J. van der Laan
Publisher: Springer
ISBN: 3319653040
Category : Mathematics
Languages : en
Pages : 655

Book Description
This textbook for graduate students in statistics, data science, and public health deals with the practical challenges that come with big, complex, and dynamic data. It presents a scientific roadmap to translate real-world data science applications into formal statistical estimation problems by using the general template of targeted maximum likelihood estimators. These targeted machine learning algorithms estimate quantities of interest while still providing valid inference. Targeted learning methods within data science are a critical component for solving scientific problems in the modern age. The techniques can answer complex questions including optimal rules for assigning treatment based on longitudinal data with time-dependent confounding, as well as other estimands in dependent data structures, such as networks. Included in Targeted Learning in Data Science are demonstrations with software packages and real data sets that present a case that targeted learning is crucial for the next generation of statisticians and data scientists. This book is a sequel to the first textbook on machine learning for causal inference, Targeted Learning, published in 2011. Mark van der Laan, PhD, is Jiann-Ping Hsu/Karl E. Peace Professor of Biostatistics and Statistics at UC Berkeley. His research interests include statistical methods in genomics, survival analysis, censored data, machine learning, semiparametric models, causal inference, and targeted learning. Dr. van der Laan received the 2004 Mortimer Spiegelman Award, the 2005 Van Dantzig Award, the 2005 COPSS Snedecor Award, the 2005 COPSS Presidential Award, and has graduated over 40 PhD students in biostatistics and statistics. Sherri Rose, PhD, is Associate Professor of Health Care Policy (Biostatistics) at Harvard Medical School. Her work is centered on developing and integrating innovative statistical approaches to advance human health. Dr. Rose’s methodological research focuses on nonparametric machine learning for causal inference and prediction. She co-leads the Health Policy Data Science Lab and currently serves as an associate editor for the Journal of the American Statistical Association and Biostatistics.

Causal Inference with Random Forests

Author: Stefan Wager
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Random forests, introduced by Breiman [2001], have become one of the most popular machine learning algorithms among practitioners, and reliably achieve good predictive performance across several application areas. This has led to considerable interest in using random forests for doing science, or drawing statistical inferences in problems that do not reduce immediately to prediction. As a step in this direction, this thesis studies how random forests can be used for understanding treatment effect heterogeneity as it may arise in, e.g., personalized medicine. Our main contributions are as follows:
- We develop a causal forest algorithm for heterogeneous treatment effect estimation, and find our method to be substantially more powerful at identifying treatment heterogeneity than traditional methods based on nearest-neighbor matching, especially when the number of considered covariates is large.
- We provide an asymptotic statistical analysis of causal forests, and prove a Gaussian limit result. We then propose a practical method for estimating the noise scale of causal forests, thus allowing for valid statistical inference with causal forests.
- In a high-dimensional regime where the problem complexity and the number of observations jointly approach infinity, we identify the signal strength at which tree-based methods become able to accurately detect treatment heterogeneity. Perhaps strikingly, we find that the required signal strength only scales logarithmically in the dimension of the problem.
Taken together, these results show that random forests -- despite often being understood as a mere black box predictive algorithm -- provide a powerful toolbox for heterogeneous treatment effect estimation in modern large-scale problems.

Essays on Treatment Effect Estimation and Treatment Choice Learning

Author: Liqiang Shi
Publisher:
ISBN:
Category :
Languages : en
Pages : 119

Book Description
This dissertation consists of three chapters that study treatment effect estimation and treatment choice learning under the potential outcome framework (Neyman, 1923; Rubin, 1974). The first two chapters study how to efficiently combine an experimental sample with an auxiliary observational sample when estimating treatment effects. In chapter 1, I derive a new semiparametric efficiency bound under the two-sample setup for estimating ATE and other functions of the average potential outcomes. The efficiency bound for estimating ATE with an experimental sample alone is derived in Hahn (1998) and has since become an important reference point for studies that aim at improving the ATE estimation. This chapter answers how an auxiliary sample containing only observable characteristics (covariates, or features) can lower this efficiency bound. The newly obtained bound has an intuitive expression and shows that the (maximum possible) amount of variance reduction depends positively on two factors: 1) the size of the auxiliary sample, and 2) how well the covariates predict the individual treatment effect. The latter naturally motivates having high dimensional covariates and the adoption of modern machine learning methods to avoid over-fitting. In chapter 2, under the same setup, I propose a two-stage machine learning (ML) imputation estimator that achieves the efficiency bound derived in chapter 1, so that no other regular estimators for ATE can have lower asymptotic variance in the same setting. This estimator involves two steps. In the first step, conditional average potential outcome functions are estimated nonparametrically via ML, which are then used to impute the unobserved potential outcomes for every unit in both samples. In the second step, the imputed potential outcomes are aggregated together in a robust way to produce the final estimate. Adopting the cross-fitting technique proposed in Chernozhukov et al. (2018), our two-step estimator can use a wide range of supervised ML tools in its first step, while maintaining valid inference to construct confidence intervals and perform hypothesis tests. In fact, any method that estimates the relevant conditional mean functions consistently in square norm, with no rate requirement, will lead to efficiency through the proposed two-step procedure. I also show that cross-fitting is not necessary when the first step is implemented via LASSO or post-LASSO. Furthermore, our estimator is robust in the sense that it remains consistent and root n normal (no longer efficient) even if the first step estimators are inconsistent. Chapter 3 (coauthored with Kirill Ponomarev) studies model selection in treatment choice learning. When treatment effects are heterogeneous, a decision maker, given either experimental or quasi-experimental data, can attempt to find a policy function that maps observable characteristics to treatment choices, aiming at maximizing utilitarian welfare. When doing so, one often has to pick a constrained class of functions as candidates for the policy function. The choice of this function class poses a model selection problem. Following Mbakop and Tabord-Meehan (2021) we propose a policy learning algorithm that incorporates data-driven model selection. Our method also leverages doubly robust estimation (Athey and Wager, 2021) so that it retains the optimal root n rate in expected regret in general setups including quasi-experiments where propensity scores are unknown. We also refined some related results in the literature and derived a new finite sample lower bound on expected regret to show that the root n rate is indeed optimal.
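The two-step, cross-fitted structure described in chapter 2 can be pictured with a small sketch. This is not the dissertation's estimator: it uses a single experimental sample with no auxiliary sample, replaces the ML first step with a one-covariate least-squares fit, and aggregates with the standard augmented inverse-probability-weighted (AIPW) score under a known assignment probability of 0.5. The function names `fit_line` and `cross_fit_ate` and the simulated data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_fit_ate(x, t, y, p=0.5, k=5):
    """Cross-fitted AIPW estimate of the ATE with known assignment probability p.

    Step 1: for each fold, fit outcome regressions on the other folds only.
    Step 2: impute both potential outcomes on the held-out fold and average
            the bias-corrected (AIPW) scores over all units.
    """
    n = len(x)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        a1, b1 = fit_line([x[i] for i in train if t[i] == 1],
                          [y[i] for i in train if t[i] == 1])
        a0, b0 = fit_line([x[i] for i in train if t[i] == 0],
                          [y[i] for i in train if t[i] == 0])
        for i in held_out:
            m1 = a1 + b1 * x[i]  # imputed outcome under treatment
            m0 = a0 + b0 * x[i]  # imputed outcome under control
            total += (m1 - m0
                      + t[i] * (y[i] - m1) / p
                      - (1 - t[i]) * (y[i] - m0) / (1 - p))
    return total / n


# Simulated randomized experiment (assumed setup) with a true ATE of 1.5.
rng = random.Random(2)
TRUE_ATE = 1.5
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
t = [1 if rng.random() < 0.5 else 0 for _ in x]
y = [1.0 + 2.0 * xi + TRUE_ATE * ti + rng.gauss(0.0, 1.0)
     for xi, ti in zip(x, t)]

ate_hat = cross_fit_ate(x, t, y)
print(f"cross-fitted ATE estimate: {ate_hat:.2f} (true value {TRUE_ATE})")
```

Fitting the outcome models only on the folds that exclude a unit is what keeps each unit's imputed outcomes independent of its own observed outcome; any supervised learner could stand in for `fit_line` in this sketch.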

Targeted Learning

Author: Mark J. van der Laan
Publisher: Springer Science & Business Media
ISBN: 1441997822
Category : Mathematics
Languages : en
Pages : 628

Book Description
The statistics profession is at a unique point in history. The need for valid statistical tools is greater than ever; data sets are massive, often comprising hundreds of thousands of measurements for a single subject. The field is ready to move towards clear objective benchmarks under which tools can be evaluated. Targeted learning allows (1) the full generalization and utilization of cross-validation as an estimator selection tool so that the subjective choices made by humans are now made by the machine, and (2) targeting the fitting of the probability distribution of the data toward the target parameter representing the scientific question of interest. This book is aimed at both statisticians and applied researchers interested in causal inference and general effect estimation for observational and experimental data. Part I is an accessible introduction to super learning and the targeted maximum likelihood estimator, including related concepts necessary to understand and apply these methods. Parts II-IX handle complex data structures and topics applied researchers will immediately recognize from their own research, including time-to-event outcomes, direct and indirect effects, positivity violations, case-control studies, censored data, longitudinal data, and genomic studies.

The Economics of Artificial Intelligence

Author: Ajay Agrawal
Publisher: University of Chicago Press
ISBN: 0226833127
Category : Business & Economics
Languages : en
Pages : 172

Book Description
A timely investigation of the potential economic effects, both realized and unrealized, of artificial intelligence within the United States healthcare system. In sweeping conversations about the impact of artificial intelligence on many sectors of the economy, healthcare has received relatively little attention. Yet it seems unlikely that an industry that represents nearly one-fifth of the economy could escape the efficiency and cost-driven disruptions of AI. The Economics of Artificial Intelligence: Health Care Challenges brings together contributions from health economists, physicians, philosophers, and scholars in law, public health, and machine learning to identify the primary barriers to entry of AI in the healthcare sector. Across original papers and in wide-ranging responses, the contributors analyze barriers of four types: incentives, management, data availability, and regulation. They also suggest that AI has the potential to improve outcomes and lower costs. Understanding both the benefits of and barriers to AI adoption is essential for designing policies that will affect the evolution of the healthcare system.

Robust Interval Estimation of a Treatment Effect in Observational Studies Using Propensity Score Matching

Author: Scott F. Kosten
Publisher:
ISBN:
Category : Statistics
Languages : en
Pages : 236

Book Description
Estimating the treatment effect between a treatment group and a control group in an observational study is a challenging problem in statistics. Without random assignment of subjects, there are likely to be differences between the treatment group and control group on a set of baseline covariates. If one of these baseline covariates is correlated with the response variable, then the difference in sample means between the groups is likely to be a biased estimate of the true treatment effect. Propensity score matching has become an increasingly popular strategy for reducing bias in estimates of the treatment effect. This reduction in bias is accomplished by identifying a subset of the original control group that is similar to the treatment group in terms of the measured baseline covariates. Our research focused on the development of a new procedure that combines propensity score matching and a rank-based analysis of the general linear model. Our procedure was compared to several others in a Monte Carlo simulation study. Overall, our procedure produced highly efficient and robust confidence intervals for a treatment effect in an observational study. In addition to the Monte Carlo simulation study, our procedure and several other propensity score matching techniques were used to analyze two real-world datasets for the presence of a treatment effect.
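The bias-reduction mechanism of propensity score matching can be seen in a minimal sketch. It does not reproduce the rank-based procedure described above: it performs greedy 1:1 nearest-neighbor matching without replacement, with a caliper, and for simplicity matches on the true propensity score of a simulated data set rather than an estimated one. The function name `greedy_match`, the caliper of 0.02, and the data-generating process are assumptions chosen for illustration.

```python
import math
import random


def greedy_match(treated_ps, control_ps, caliper=0.02):
    """Greedy 1:1 nearest-neighbor matching on the propensity score,
    without replacement. Returns (treated_index, control_index) pairs."""
    available = set(range(len(control_ps)))
    pairs = []
    for i, p in enumerate(treated_ps):
        if not available:
            break
        j = min(available, key=lambda k: abs(control_ps[k] - p))
        if abs(control_ps[j] - p) <= caliper:  # discard poor matches
            pairs.append((i, j))
            available.remove(j)
    return pairs


def mean(values):
    return sum(values) / len(values)


# Simulated observational data (assumed setup): the covariate x raises both
# the chance of treatment and the outcome, so it confounds the comparison.
rng = random.Random(0)
TRUE_EFFECT = 2.0
treated, control = [], []  # each unit is stored as (propensity score, outcome)
for _ in range(2000):
    x = rng.uniform(-2.0, 2.0)
    ps = 1.0 / (1.0 + math.exp(-x))  # true propensity score
    t = 1 if rng.random() < ps else 0
    y = TRUE_EFFECT * t + 3.0 * x + rng.gauss(0.0, 1.0)
    (treated if t else control).append((ps, y))

naive_est = mean([y for _, y in treated]) - mean([y for _, y in control])
pairs = greedy_match([p for p, _ in treated], [p for p, _ in control])
matched_est = mean([treated[i][1] - control[j][1] for i, j in pairs])
print(f"naive: {naive_est:.2f}  matched: {matched_est:.2f}  pairs: {len(pairs)}")
```

The raw difference in means absorbs the covariate imbalance between the groups, whereas the matched-pair difference compares units with similar propensity scores and should sit much closer to the simulated effect of 2.0. Treated units with no control inside the caliper are dropped, which trades sample size for match quality.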