Estimation and Conditional Inference in High-dimensional Statistical Models

Author: Arend L. Voorman
Publisher:
ISBN:
Category :
Languages : en
Pages : 117

Book Description
In many areas of biology, recent advances in technology have facilitated the measurement of large numbers of features, while the number of observations in a data set may remain relatively modest. In this setting, lasso regression and related procedures have been extensively studied for prediction, while the problem of inference is relatively less studied. Most inference in high dimensions is based on simple marginal associations between variables. However, a richer characterization of the associations between variables can be obtained by examining conditional relationships, which account for the joint behavior of the variables. Inference on conditional relationships is more difficult, because it requires one to specify how features are related to one another, to estimate these relationships, and to characterize the uncertainty in the estimation procedure. In Chapters 2 and 3, we explore a few methods for testing hypotheses about conditional relationships in the high-dimensional setting. In Chapter 4, we note some strong distributional assumptions implicit in many treatments of high-dimensional graphical models, and propose a modification which treats this issue.
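
The distinction between marginal and conditional associations can be illustrated with a toy simulation. The sketch below is not taken from the thesis: the chain x1 -> x2 -> x3, the coefficient 0.8, and all function names are assumptions chosen for illustration. It shows two variables that are clearly correlated marginally but nearly uncorrelated once the intermediate variable is accounted for. With only three variables the partial correlation has a closed form; conditioning on many variables at once with few observations is where the regularized procedures mentioned in the description come in.

```python
import math
import random


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after adjusting both for a single variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical chain: x1 influences x2, and x2 influences x3.
random.seed(0)
n = 5000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + random.gauss(0, 1) for a in x1]
x3 = [0.8 * b + random.gauss(0, 1) for b in x2]

marginal_13 = pearson(x1, x3)
conditional_13 = partial_corr(marginal_13, pearson(x1, x2), pearson(x3, x2))

print(f"marginal corr(x1, x3)     = {marginal_13:.3f}")
print(f"partial corr(x1, x3 | x2) = {conditional_13:.3f}")
```

A purely marginal analysis would report an association between x1 and x3 here, while the conditional analysis attributes it to x2.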

Sparse Graphical Modeling for High Dimensional Data

Author: Faming Liang
Publisher: CRC Press
ISBN: 0429584806
Category : Mathematics
Languages : en
Pages : 151

Book Description
A general framework for learning sparse graphical models with conditional independence tests. Complete treatments for different types of data: Gaussian, Poisson, multinomial, and mixed data. Unified treatments for data integration, network comparison, and covariate adjustment. Unified treatments for missing data and heterogeneous data. Efficient methods for joint estimation of multiple graphical models. Effective methods of high-dimensional variable selection. Effective methods of high-dimensional inference.

Partially Linear Models

Author: Wolfgang Härdle
Publisher: Springer Science & Business Media
ISBN: 3642577008
Category : Mathematics
Languages : en
Pages : 210

Book Description
In the last ten years, there has been increasing interest and activity in the general area of partially linear regression smoothing in statistics. Many methods and techniques have been proposed and studied. This monograph hopes to bring an up-to-date presentation of the state of the art of partially linear regression techniques. The emphasis is on methodologies rather than on the theory, with a particular focus on applications of partially linear regression techniques to various statistical problems. These problems include least squares regression, asymptotically efficient estimation, bootstrap resampling, censored data analysis, linear measurement error models, nonlinear measurement models, nonlinear and nonparametric time series models.
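
For readers unfamiliar with the term, a partially linear model combines a linear effect with an unspecified smooth effect, for example y = x*beta + g(t) + noise. The sketch below is an illustration only and is not claimed to be the monograph's method: it assumes a double-residual (partialling-out) approach with a Gaussian-kernel smoother, and the simulated data, the bandwidth 0.05, and all names are hypothetical.

```python
import math
import random


def kernel_smooth(t, v, bandwidth):
    """Nadaraya-Watson estimate of E[v | t] at each observed t (Gaussian kernel)."""
    fitted = []
    for ti in t:
        w = [math.exp(-0.5 * ((ti - tj) / bandwidth) ** 2) for tj in t]
        fitted.append(sum(wj * vj for wj, vj in zip(w, v)) / sum(w))
    return fitted


# Simulated data from y = beta * x + g(t) + noise, with g(t) = cos(2*pi*t).
random.seed(1)
n = 400
beta_true = 2.0
t = [random.random() for _ in range(n)]
x = [math.sin(2 * math.pi * ti) + random.gauss(0, 1) for ti in t]
y = [beta_true * xi + math.cos(2 * math.pi * ti) + random.gauss(0, 0.5)
     for xi, ti in zip(x, t)]

# Remove the part of x and of y explained by t, then regress residual on residual.
x_res = [xi - fi for xi, fi in zip(x, kernel_smooth(t, x, 0.05))]
y_res = [yi - fi for yi, fi in zip(y, kernel_smooth(t, y, 0.05))]
beta_hat = sum(a * b for a, b in zip(x_res, y_res)) / sum(a * a for a in x_res)

print(f"beta_hat = {beta_hat:.3f} (true value {beta_true})")
```

The linear coefficient is estimated without ever specifying the form of g; only a smoothness assumption, encoded in the bandwidth, is needed.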

Methods for Estimation and Inference for High-dimensional Models

Author: Lina Lin
Publisher:
ISBN:
Category :
Languages : en
Pages : 166

Book Description
This thesis tackles three different problems in high-dimensional statistics. The first two parts of the thesis focus on estimation of sparse high-dimensional undirected graphical models under non-standard conditions, specifically, non-Gaussianity and missingness, when observations are continuous. To address estimation under non-Gaussianity, we propose a general framework involving augmenting the score matching losses introduced in Hyvärinen [2005, 2007] with an l1-regularizing penalty. This method, which we refer to as regularized score matching, allows for computationally efficient treatment of Gaussian and non-Gaussian continuous exponential family models because the considered loss becomes a penalized quadratic and thus yields piecewise linear solution paths. Under suitable irrepresentability conditions and distributional assumptions, we show that regularized score matching generates consistent graph estimates in sparse high-dimensional settings. Through numerical experiments and an application to RNAseq data, we confirm that regularized score matching achieves state-of-the-art performance in the Gaussian case and provides a valuable tool for computationally efficient estimation in non-Gaussian graphical models. To address estimation of sparse high-dimensional undirected graphical models with missing observations, we propose adapting the regularized score matching framework by substituting in surrogates of relevant statistics to accommodate these circumstances, as in Loh and Wainwright [2012] and Kolar and Xing [2012]. For Gaussian and non-Gaussian continuous exponential family models, the use of these surrogates may result in a loss of semi-definiteness, and thus nonconvexity, in the objective. Nevertheless, under suitable distributional assumptions, the global optimum is close to the truth in matrix l1 norm with high probability in sparse high-dimensional settings.
Furthermore, under the same set of assumptions, we show that the composite gradient descent algorithm we propose for minimizing the modified objective converges at a geometric rate to a solution close to the global optimum with high probability. The last part of the thesis moves away from undirected graphical models, and is instead concerned with inference in high-dimensional regression models. Specifically, we investigate how to construct asymptotically valid confidence intervals and p-values for the fixed effects in a high-dimensional linear mixed effect model. The framework we propose, largely founded on a recent work [Bühlmann, 2013], entails de-biasing a ‘naive’ ridge estimator. We show via numerical experiments that the method controls for Type I error in hypothesis testing and generates confidence intervals that achieve target coverage, outperforming competitors that assume observations are homogeneous when observations are, in fact, correlated within group.
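
The remark that an l1-penalized quadratic loss yields piecewise linear solution paths can be seen in the simplest one-dimensional case. The sketch below is not the thesis's regularized score matching estimator; it only minimizes the scalar objective 0.5*a*x^2 - b*x + lam*|x|, with hypothetical values a = 2 and b = 3, to show how the minimizer moves linearly in the penalty level and then stays at zero.

```python
def soft_threshold(b, lam):
    """Shrink b toward zero by lam, returning exactly zero when |b| <= lam."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def penalized_quadratic_minimizer(a, b, lam):
    """Minimizer of 0.5*a*x**2 - b*x + lam*abs(x), assuming a > 0."""
    return soft_threshold(b, lam) / a


# Trace the solution as the penalty level grows (hypothetical a = 2, b = 3).
path = [(lam, penalized_quadratic_minimizer(2.0, 3.0, lam))
        for lam in [0.0, 1.0, 2.0, 3.0, 4.0]]

for lam, x_hat in path:
    print(f"lam = {lam:.1f}  ->  x_hat = {x_hat:.2f}")
```

The solution decreases linearly in lam until lam reaches |b|, where the path has a kink and the estimate becomes exactly zero. In the multivariate case the same mechanism produces sparse estimates, which is what makes the penalty useful for graph selection.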

Inference and Estimation in High-dimensional Data Analysis

Author: Adel Javanmard
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
Modern technologies generate vast amounts of fine-grained data at an unprecedented speed. Nowadays, high-dimensional data, where the number of variables is much larger than the sample size, occur in many applications, such as healthcare, social networks, and recommendation systems, among others. The ubiquitous interest in these applications has spurred remarkable progress in the area of high-dimensional data analysis in terms of point estimation and computation. However, one of the fundamental inference tasks, namely quantifying uncertainty or assessing statistical significance, is still in its infancy for such models. In the first part of this dissertation, we present efficient procedures and corresponding theory for constructing classical uncertainty measures like confidence intervals and p-values for single regression coefficients in high-dimensional settings. In the second part, we study the compressed sensing reconstruction problem, a well-known example of estimation in high-dimensional settings. We propose a new approach to this problem that is drastically different from the classical wisdom in this area. Our construction of the sensing matrix is inspired by the idea of spatial coupling in coding theory and similar ideas in statistical physics. For reconstruction, we use an approximate message passing algorithm. This is an iterative algorithm that takes advantage of the statistical properties of the problem to improve convergence rate. Finally, we prove that our method can effectively solve the reconstruction problem at the (information-theoretically) optimal undersampling rate and show its robustness to measurement noise.

Introduction to High-Dimensional Statistics

Author: Christophe Giraud
Publisher: CRC Press
ISBN: 1000408353
Category : Computers
Languages : en
Pages : 410

Book Description
Praise for the first edition: "[This book] succeeds singularly at providing a structured introduction to this active field of research. ... it is arguably the most accessible overview yet published of the mathematical ideas and principles that one needs to master to enter the field of high-dimensional statistics. ... recommended to anyone interested in the main results of current research in high-dimensional statistics as well as anyone interested in acquiring the core mathematical skills to enter this area of research." —Journal of the American Statistical Association Introduction to High-Dimensional Statistics, Second Edition preserves the philosophy of the first edition: to be a concise guide for students and researchers discovering the area and interested in the mathematics involved. The main concepts and ideas are presented in simple settings, thereby avoiding unessential technicalities. High-dimensional statistics is a fast-evolving field, and much progress has been made on a large variety of topics, providing new insights and methods. Offering a succinct presentation of the mathematical foundations of high-dimensional statistics, this new edition: Offers revised chapters from the previous edition, with the inclusion of many additional materials on some important topics, including compressed sensing, estimation with convex constraints, the slope estimator, simultaneously low-rank and row-sparse linear regression, or aggregation of a continuous set of estimators. Introduces three new chapters on iterative algorithms, clustering, and minimax lower bounds. Provides enhanced appendices, mainly with the addition of the Davis-Kahan perturbation bound and of two simple versions of the Hanson-Wright concentration inequality. Covers cutting-edge statistical methods including model selection, sparsity and the Lasso, iterative hard thresholding, aggregation, support vector machines, and learning theory. Provides detailed exercises at the end of every chapter with collaborative solutions on a wiki site. Illustrates concepts with simple but clear practical examples.

Mathematical Foundations of Infinite-Dimensional Statistical Models

Author: Evarist Giné
Publisher: Cambridge University Press
ISBN: 1009022784
Category : Mathematics
Languages : en
Pages : 706

Book Description
In nonparametric and high-dimensional statistical models, the classical Gauss–Fisher–Le Cam theory of the optimality of maximum likelihood estimators and Bayesian posterior inference does not apply, and new foundations and ideas have been developed in the past several decades. This book gives a coherent account of the statistical theory in infinite-dimensional parameter spaces. The mathematical foundations include self-contained 'mini-courses' on the theory of Gaussian and empirical processes, approximation and wavelet theory, and the basic theory of function spaces. The theory of statistical inference in such models - hypothesis testing, estimation and confidence sets - is presented within the minimax paradigm of decision theory. This includes the basic theory of convolution kernel and projection estimation, but also Bayesian nonparametrics and nonparametric maximum likelihood estimation. In a final chapter the theory of adaptive inference in nonparametric models is developed, including Lepski's method, wavelet thresholding, and adaptive inference for self-similar functions. Winner of the 2017 PROSE Award for Mathematics.

Advances in Complex Data Modeling and Computational Methods in Statistics

Author: Anna Maria Paganoni
Publisher: Springer
ISBN: 3319111493
Category : Mathematics
Languages : en
Pages : 210

Book Description
The book is addressed to statisticians working at the forefront of the statistical analysis of complex and high dimensional data and offers a wide variety of statistical models, computer intensive methods and applications: network inference from the analysis of high dimensional data; new developments for bootstrapping complex data; regression analysis for measuring the downsize reputational risk; statistical methods for research on the human genome dynamics; inference in non-euclidean settings and for shape data; Bayesian methods for reliability and the analysis of complex data; methodological issues in using administrative data for clinical and epidemiological research; regression models with differential regularization; geostatistical methods for mobility analysis through mobile phone data exploration. This volume is the result of a careful selection among the contributions presented at the conference "S.Co.2013: Complex data modeling and computationally intensive methods for estimation and prediction" held at the Politecnico di Milano, 2013. All the papers published here have been rigorously peer-reviewed.

Estimation and Statistical Inference for Network Structures

Author: Thien Minh Le
Publisher:
ISBN:
Category : Electronic dissertations
Languages : en
Pages : 125

Book Description
Understanding the connective nature between variables from high-dimensional data sets is essential to gain insights about the complex interaction mechanisms of these variables. These insights are important in applications such as neuroscience or genetics, with the hope of improving treatments of serious diseases. In statistical modeling, the connectivity between variables is often modeled by the conditional dependence between variables via Gaussian graphical models. The dependence information provided by Gaussian graphical structures (network structures) is paramount for improving statistical estimation precision. This dissertation addresses the problem of testing the goodness-of-fit of a pre-specified network structure in high-dimensional settings where the number of nodes exceeds the sample size. We propose a new test statistic and derive its asymptotic distribution under mild conditions. Besides developing a new testing procedure for assessing a pre-specified network structure, we also introduce a general tuning parameter selection procedure for choosing the regularization parameters in regularized methods for estimating network structures. Finally, we propose an estimator for the precision matrix when its graphical structure is known. We also study the asymptotic properties of the proposed estimator under the high-dimensional framework.

Statistical Foundations of Data Science

Author: Jianqing Fan
Publisher: CRC Press
ISBN: 0429527616
Category : Mathematics
Languages : en
Pages : 942

Book Description
Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models, contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference. It includes ample exercises that involve both theoretical studies and empirical applications. The book begins with an introduction to the stylized features of big data and their impacts on statistical analysis. It then introduces multiple linear regression and expands the techniques of model building via nonparametric regression and kernel tricks. It provides a comprehensive account of sparsity explorations and model selections for multiple regression, generalized linear models, quantile regression, robust regression, and hazards regression, among others. High-dimensional inference is also thoroughly addressed, and so is feature screening. The book also provides a comprehensive account of high-dimensional covariance estimation, learning latent factors and hidden structures, as well as their applications to statistical estimation, inference, prediction and machine learning problems. It also thoroughly introduces statistical machine learning theory and methods for classification, clustering, and prediction. These include CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.