Statistical Methods for Longitudinal Data Analysis and Reproducible Feature Selection in Human Microbiome Studies

Author: Lingjing Jiang
Publisher:
ISBN:
Category :
Languages : en
Pages : 101

Book Description
The microbiome is inherently dynamic, driven by interactions among microbes, with the host, and with the environment. At any point in life, the human microbiome can be dramatically altered, either transiently or long term, by diseases, medical interventions or even daily routines. Since the human microbiome is highly dynamic and personalized, longitudinal microbiome studies that sample human-associated microbial communities repeatedly over time provide valuable information for researchers to observe both inter- and intra-individual variability, or to measure changes in response to an intervention in real time. Despite this growing need for longitudinal data analysis, statistical methods for analyzing sparse longitudinal microbiome data and longitudinal multi-omics data still lag behind. In this dissertation, we describe our efforts in developing two novel statistical methods: Bayesian sparse functional principal components analysis (SFPCA) for sparse longitudinal data analysis, and multivariate sparse functional principal components analysis (mSFPCA) for longitudinal microbiome multi-omics data analysis. Beyond longitudinal data analysis, we are also interested in utilizing statistical techniques for addressing the "reproducibility crisis" in microbiome research, especially in the indispensable task of feature selection. Instead of developing "the best" feature selection method, we focus on discovering a reproducible criterion called Stability for evaluating feature selection methods in order to yield reproducible results in microbiome analysis. To set an appropriate motivation and context for our work, Chapter 1 reviews the importance of longitudinal studies in human microbiome research, and presents the crucial need to develop novel statistical methods that meet the new challenges in longitudinal microbiome data analysis, and to produce reproducible results in microbiome feature selection.
Chapter 2 introduces Bayesian SFPCA, a flexible Bayesian approach to SFPCA that enables efficient model selection and graphical model diagnostics for valid longitudinal microbiome applications. Chapter 3 presents mSFPCA, an extension of Bayesian SFPCA from modeling a univariate temporal outcome to simultaneously characterizing multiple temporal measurements, and inferring their temporal associations based on mutual information estimation. Chapter 4 proposes using a reproducibility criterion such as Stability, instead of a popular model prediction metric such as mean squared error (MSE), to quantify the reproducibility of identified microbial features.
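
The idea of scoring a feature selection method by how consistently it picks the same features, rather than by prediction error, can be illustrated with a small sketch. The description does not give the dissertation's definition of Stability, so the measure below (mean pairwise Jaccard similarity between the feature sets selected on different resamples) is an assumed stand-in, and the function names and taxon labels are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two collections of selected feature names."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections agree trivially
    return len(a & b) / len(a | b)


def selection_stability(selected_sets):
    """Mean pairwise Jaccard similarity across repeated selection runs.

    Assumed illustration only: not necessarily the Stability criterion
    used in the dissertation.
    """
    pairs = list(combinations(selected_sets, 2))
    if not pairs:
        raise ValueError("need at least two selected feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature sets chosen by one method on three resamples.
runs = [
    {"taxon_A", "taxon_B", "taxon_C"},
    {"taxon_A", "taxon_B", "taxon_D"},
    {"taxon_A", "taxon_B", "taxon_C"},
]
stability = selection_stability(runs)
print(round(stability, 3))
```

A value near 1 means the method returns nearly the same features on every resample; a value near 0 means the selections barely overlap, even if each fitted model predicts well.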

Statistical Analysis of Microbiome Data

Author: Somnath Datta
Publisher: Springer Nature
ISBN: 3030733513
Category : Medical
Languages : en
Pages : 349

Book Description
Microbiome research has focused on microorganisms that live within the human body and their effects on health. During the last few years, the quantification of microbiome composition in different environments has been facilitated by the advent of high-throughput sequencing technologies. The statistical challenges include computational difficulties due to the high volume of data; normalization and quantification of metabolic abundances, relative taxa and bacterial genes; high dimensionality; multivariate analysis; the inherently compositional nature of the data; and the proper utilization of complementary phylogenetic information. This has resulted in an explosion of statistical approaches aimed at tackling the unique opportunities and challenges presented by microbiome data. This book provides a comprehensive overview of the state of the art in statistical and informatics technologies for microbiome research. In addition to reviewing demonstrably successful cutting-edge methods, particular emphasis is placed on examples in R that rely on available statistical packages for microbiome data. With its wide-ranging approach, the book benefits not only trained statisticians in academia and industry involved in microbiome research, but also other scientists working in microbiomics and in related fields.

Statistical Methods for the Analysis of Microbiome Data

Author: Anna M. Plantinga
Publisher:
ISBN:
Category :
Languages : en
Pages : 128

Book Description
The human microbiome plays a vital role in maintaining health, and imbalances in the microbiome are associated with a wide variety of diseases. Understanding whether and how the microbiome is associated with particular health conditions is a focus of many modern microbiome studies, with the hope that a deeper understanding of these associations may lead to more effective prevention and treatment regimens. However, how best to analyze data from microbiome profiling studies remains unclear. The high dimensionality, compositional nature, intrinsic biological structure, and limited availability of samples pose substantial statistical challenges. To face these challenges, we propose novel analytic approaches based on sparse penalized regression strategies and distance-based global association analysis. Most distance-based methods for global microbiome association analysis are restricted to simple dichotomous or quantitative outcomes, but more complex outcomes are increasingly common in microbiome studies. In the first part of this dissertation, we introduce two distance-based methods for the analysis of entire microbial communities in modern microbiome studies. We develop a kernel machine regression-based score test for association between the microbiome and censored time-to-event outcomes. We then propose a novel longitudinal measure of dissimilarity that summarizes changes in the microbiome across time and compares these changes between subjects. Since this dissimilarity may be incorporated into any distance-based analysis framework, it is a highly flexible tool for applying a wide variety of distance-based analyses in longitudinal studies. Identification of associated taxa and detection of predictive microbial signatures are key to translation of microbiome studies. In the second part of this dissertation, we present two penalized regression methods for estimation and prediction with high-dimensional compositional data. 
Because phylogenetic similarity between bacteria often corresponds to shared functions, our first contribution is to incorporate phylogenetic structure into a penalized regression model for constrained data. We then propose a model that exploits phylogenetic structure to use partial information in the setting of differing feature sets between model-building and prediction datasets. We evaluate the performance of these methods through extensive simulation studies and apply them to studies investigating the association of graft-versus-host disease or body mass index with the gut microbiome.

Statistical Methods for Human Microbiome Data Analysis

Author: Jun Chen
Publisher:
ISBN:
Category :
Languages : en
Pages : 107

Book Description


Adaptive Statistical Methods for Microbiome Association Analysis

Author: Kalins Banerjee
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
The importance of the human microbiome has been increasingly recognized, and substantial research is being conducted on how microbial communities are associated with human health and diseases. These association studies not only can improve our understanding of the non-genetic components of complex traits and diseases, but also might open up an entirely new way of drug development. Here, we introduce two novel tests for microbiome association studies, namely the Adaptive multivariate two-sample test for Microbiome Differential Analysis (AMDA) and the Adaptive Microbiome Association Test (AMAT). AMDA addresses microbiome differential abundance analysis, whereas AMAT provides a flexible microbiome association testing platform under the generalized linear model framework. Our research focuses explicitly on adaptive statistical multivariate analysis tools that are developed using data-driven learning approaches to suit a wide range of possible scenarios. Because existing methods are susceptible to the adverse effects of noise accumulation, the proposed two-stage adaptive testing frameworks incorporate feature selection as an intermediate step. Extensive simulation studies and real data applications demonstrate that both AMDA and AMAT are often more powerful than several competing methods while preserving the correct type I error rate.

Some Topics on Statistical Analysis of Genetic Imprinting Data and Microbiome Compositional Data

Author: Fan Xia
Publisher: Open Dissertation Press
ISBN: 9781361355398
Category :
Languages : en
Pages :

Book Description
This dissertation, "Some Topics on Statistical Analysis of Genetic Imprinting Data and Microbiome Compositional Data" by Fan, Xia, 夏凡, was obtained from The University of Hong Kong (Pokfulam, Hong Kong) and is being sold pursuant to Creative Commons: Attribution 3.0 Hong Kong License. The content of this dissertation has not been altered in any way. We have altered the formatting in order to facilitate the ease of printing and reading of the dissertation. All rights not granted by the above license are retained by the author.
Abstract: Genetic association study is a useful tool to identify the genetic component that is responsible for a disease. The phenomenon in which a gene is expressed in a parent-of-origin manner is referred to as genomic imprinting. When a gene is imprinted, the performance of the disease-association study will be affected. This thesis presents statistical testing methods developed specifically for nuclear family data, centering on genetic association studies that incorporate imprinting effects. For qualitative diseases with binary outcomes, a class of TDTI* type tests was proposed in a general two-stage framework, where the imprinting effects were examined prior to association testing. For quantitative trait loci, a class of Q-TDTI(c) type tests and another class of Q-MAX(c) type tests were proposed. The proposed testing methods flexibly accommodate families with missing parental genotypes and with multiple siblings. The performance of all the methods was verified by simulation studies. It was found that the proposed methods improve the testing power for detecting association in the presence of imprinting. The class of TDTI* tests was applied to data from a rheumatoid arthritis study. Also, the class of Q-TDTI(c) tests was applied to analyze the Framingham Heart Study data. The human microbiome is the collection of the microbiota, together with their genomes and their habitats throughout the human body.
The human microbiome comprises an inalienable part of our genetic landscape and contributes to our metabolic features. Current studies have also pointed to variation of the human microbiome in human diseases. With high-throughput DNA sequencing, the human microbiome composition can be characterized based on bacterial taxa relative abundance and the phylogenetic constraint. Such taxa data are often high-dimensional and overdispersed, and contain an excessive number of zeros. Taking these characteristics into account, this thesis presents statistical methods to identify associations between covariate/outcome and the human microbiome composition. To assess the effect of environmental/biological covariates on microbiome composition, an additive logistic normal multinomial regression model was proposed and a group l1 penalized likelihood estimation method was further developed to facilitate selection of covariates and estimation of parameters. To identify microbiome components associated with biological/clinical outcomes, a Bayesian hierarchical regression model with a spike-and-slab prior for variable selection was proposed, and a Markov chain Monte Carlo algorithm that combines a stochastic variable selection procedure with random walk Metropolis-Hastings steps was developed for model estimation. Both methods were illustrated using simulations as well as a real human gut microbiome dataset from The Penn Gut Microbiome Project.
DOI: 10.5353/th_b5223971
Subjects: Genomic imprinting - Statistical methods; Body, Human - Microbiology - Statistical methods

Statistical Analysis of Microbiome Data with R

Author: Yinglin Xia
Publisher: Springer
ISBN: 9811315345
Category : Computers
Languages : en
Pages : 505

Book Description
This unique book addresses the statistical modelling and analysis of microbiome data using cutting-edge R software. It includes real-world data from the authors’ research and from the public domain, and discusses the implementation of R for data analysis step by step. The data and R computer programs are publicly available, allowing readers to replicate the model development and data analysis presented in each chapter, so that these new methods can be readily applied in their own research. The book also discusses recent developments in statistical modelling and data analysis in microbiome research, as well as the latest advances in next-generation sequencing and big data in methodological development and applications. This timely book will greatly benefit all readers involved in microbiome, ecology and microarray data analyses, as well as other fields of research.

Statistical Methods for the Analysis of Genomic Data

Author: Hui Jiang
Publisher: MDPI
ISBN: 3039361406
Category : Science
Languages : en
Pages : 136

Book Description
In recent years, technological breakthroughs have greatly enhanced our ability to understand the complex world of molecular biology. Rapid developments in genomic profiling techniques, such as high-throughput sequencing, have brought new opportunities and challenges to the fields of computational biology and bioinformatics. Furthermore, by combining genomic profiling techniques with other experimental techniques, many powerful approaches (e.g., RNA-Seq, ChIP-Seq, single-cell assays, and Hi-C) have been developed in order to help explore complex biological systems. As a result of the increasing availability of genomic datasets, in terms of both volume and variety, the analysis of such data has become a critical challenge as well as a topic of great interest. Therefore, statistical methods that address the problems associated with these newly developed techniques are in high demand. This book includes a number of studies that highlight the state-of-the-art statistical methods for the analysis of genomic data and explore future directions for improvement.

Statistical and Computational Methods for Microbiome Multi-Omics Data

Author: Himel Mallick
Publisher: Frontiers Media SA
ISBN: 2889660915
Category : Science
Languages : en
Pages : 170

Book Description
This eBook is a collection of articles from a Frontiers Research Topic. Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: frontiersin.org/about/contact.

Computational and Statistical Methods for Extracting Biological Signal from High-Dimensional Microbiome Data

Author: Gibraan Rahman
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
Next-generation sequencing (NGS) has effected an explosion of research into the relationship between genetic information and a variety of biological conditions. One of the most exciting areas of study is how the trillions of microbial species that we share this Earth with affect our health. However, the process of extracting useful biological insights from this breadth of data is far from trivial. There are numerous statistical and computational considerations in addition to the already complex and messy biological problems. In this thesis, I describe my work on developing and implementing software to tackle the complex world of statistical microbiome analysis. In the first part of this thesis, we review the applications and challenges of performing dimensionality reduction on microbiome data comprising thousands of microbial taxa. When dealing with this high dimensionality, it is imperative to be able to get an overview of the community structure in a lower-dimensional space that can be both visualized and interpreted. We review the statistical considerations for dimensionality reduction and the existing tools and algorithms that can and cannot address them. This includes discussions about sparsity, compositionality, and phylogenetic signal. We also make recommendations about tools and algorithms to consider for different use cases. In the second part of this thesis, we present a new software package, Evident, designed to assist researchers with statistical analysis of microbiome effect sizes and power analysis. Effect sizes of statistical tests are not widely reported in microbiome datasets, limiting the interpretability of community differences such as alpha and beta diversity. As more large microbiome studies are produced, researchers have the opportunity to mine existing datasets to get a sense of the effect size for different biological conditions.
These, in turn, can be used to perform power analysis prior to designing an experiment, allowing researchers to better allocate resources. We show how Evident is scalable to dozens of datasets and provides easy calculation and exploration of effect sizes and power analysis from existing data. In the third part of this thesis, we describe a novel investigation into the joint microbiome and metabolome axis in colorectal cancer. In most cases of sporadic colorectal cancer (CRC), tumorigenesis is a multistep process driven by genomic alterations in concert with dietary influences. In addition, mounting evidence has implicated the gut microbiome as an effector in the development and progression of CRC. While large meta-analyses have provided mechanistic insight into disease progression in CRC patients, study heterogeneity has limited causal associations. To address this limitation, multi-omics studies on genetically controlled cohorts of mice were performed to distinguish genetic and dietary influences. Diet was identified as the major driver of microbial and metabolomic differences, with reductions in alpha diversity and widespread changes in cecal metabolites seen in mice fed a high-fat diet (HFD). Similarly, the levels of non-classic amino acid conjugated forms of the bile acid cholic acid (AA-CAs) increased with HFD. We show that these AA-CAs signal through the nuclear receptor FXR and membrane receptor TGR5 to functionally impact intestinal stem cell growth. In addition, the poor intestinal permeability of these AA-CAs supports their localization in the gut. Moreover, two cryptic microbial strains, Ileibacterium valens and Ruminococcus gnavus, were shown to have the capacity to synthesize these AA-CAs. This multi-omics dataset from CRC mouse models supports diet-induced shifts in the microbiome and metabolome in disease progression with potential utility in directing future diagnostic and therapeutic developments.
In the fourth chapter, we demonstrate a new framework for performing differential abundance analysis using customized statistical modeling. As we learn more about the relationship between the microbiome and biological conditions, experimental protocols are becoming increasingly complex. For example, meta-analyses, interventions, and longitudinal studies are being used to better understand the dynamic nature of the microbiome. However, statistical methods to analyze these relationships are lacking, especially in the field of differential abundance. Finding biomarkers associated with conditions of interest must be performed with statistical care when dealing with these kinds of experimental designs. We present BIRDMAn, a software package integrating probabilistic programming with Stan to build custom models for analyzing microbiome data. We show that, on both simulated and real datasets, BIRDMAn is able to extract novel biological signals that are missed by existing methods. These chapters, taken together, advance our knowledge of statistical analysis of microbiome data and provide tools and references for researchers looking to perform analysis on their own data.
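
The second part of the description above, mining existing datasets for effect sizes and then using them for power analysis before designing a study, can be sketched in a few lines. This is not Evident's API. The helper names, the diversity values, and the choice of Cohen's d with a normal approximation to a two-sided, two-sample test are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(x, y):
    """Cohen's d for two groups, using the pooled sample variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) * sqrt(n_per_group / 2)
    # Probability of rejecting in either tail under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical alpha-diversity values from an existing dataset.
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 3.0, 5.0]
d = cohens_d(group_a, group_b)

# Power to detect that effect size at several planned sample sizes.
for n in (16, 32, 64, 128):
    print(n, round(approx_power(d, n), 3))
```

With real data, the effect size estimated from a large existing cohort would replace the toy values, and the loop shows how many samples per group a new study would need to reach a target power.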