Scalable Methods for in Situ Genomics

Author: Andrew Colin Payne
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
We conclude with a discussion of IGS scaling properties, by which we can anticipate many-fold future improvements in yield and resolution. We anticipate IGS and related scalable in situ methods will be instrumental in unifying genomics and microscopy, enabling scientists to map genome organization from single base pairs to whole organisms and ultimately to connect genome structure and function.

Scalable Methods for Genome Assembly

Author: Priyanka Ghosh
Publisher:
ISBN:
Category :
Languages : en
Pages : 155

Book Description
De novo genome assembly is a fundamental problem in computational biology. The goal is to reconstruct an unknown genome from short DNA fragments (called "reads") obtained from it. Over the last decade, with the advent of numerous next-generation sequencing (NGS) platforms (e.g., Illumina, 454 Roche), billions of reads can be generated in a matter of hours, leading to vast amounts of data accumulating per day. This has necessitated efficient parallelization of the assembly process to meet the growing data demands. While multiple parallel solutions to the problem have been proposed in the past, there still exists a gap in processing power between massively parallel NGS technologies and the ability of current state-of-the-art assemblers to analyze and assemble large and complex genomes. Conducting genome assembly at scale remains a challenge owing to the intense computational and memory requirements of the problem, coupled with inherent complexities in existing parallel tools associated with data movement, use of complex data structures, unstructured memory accesses and repeated I/O operations. In this dissertation, we address the challenges of conducting genome assembly at scale and develop new methods for conducting extreme-scale genome assembly for microbial and complex eukaryotic genomes. Our approach to the problem is two-fold, wherein we make the following contributions: i) FastEtch, a new method targeting fast and space-efficient assemblies, using probabilistic data structures (Count-Min sketch), that executes efficiently on shared-memory platforms with a minimal computational footprint (both memory and time). ii) PaKman, a fully distributed method that tackles assembly of large genomes through the combination of a novel data structure (PaK-Graph) and algorithmic strategies to simplify the communication and I/O footprint during the assembly process.
We present an extensive performance and qualitative evaluation of both our algorithms including comparisons to other state-of-the-art methods. Our results demonstrate that FastEtch can yield one of the best time-memory-quality trade-offs, when compared against many state-of-the-art genome assemblers. PaKman has shown the ability to achieve near-linear speedups on up to 8K cores; outperform state-of-the-art distributed and shared memory tools in performance while delivering comparable (if not better) quality; and reduce time to solution significantly.
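The description names the Count-Min sketch as the probabilistic data structure behind FastEtch but does not show how it is used. The following is a minimal, generic sketch of approximate k-mer counting with a Count-Min sketch, not FastEtch's actual implementation. The class and function names, the table dimensions, and the choice of BLAKE2 as the hash family are all assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter table; estimates never undercount, but may overcount."""

    def __init__(self, width=1024, depth=4):
        # width and depth are illustrative defaults, not values from the thesis.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One independent-looking hash per row, derived by salting BLAKE2b
        # with the row number (an assumed hash family for this sketch).
        digest = hashlib.blake2b(
            item.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # The minimum over rows is the least-collided counter.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_kmers(reads, k, width=1024, depth=4):
    """Feed every k-mer of every read into a sketch (hypothetical helper)."""
    sketch = CountMinSketch(width, depth)
    for read in reads:
        for kmer in kmers(read, k):
            sketch.add(kmer)
    return sketch
```

Memory use depends only on `width * depth`, not on the number of distinct k-mers, which is the space-for-accuracy trade-off that makes such sketches attractive for large read sets.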

Scalable Parallel Algorithms for Genome Analysis

Author: Evangelos Georganas
Publisher:
ISBN:
Category :
Languages : en
Pages : 129

Book Description
A critical problem for computational genomics is the problem of de novo genome assembly: the development of robust scalable methods for transforming short randomly sampled "shotgun" sequences, namely reads, into the contiguous and accurate reconstruction of complex genomes. These reads are significantly shorter (e.g. hundreds of bases long) than the size of chromosomes and also include errors. While advanced methods exist for assembling the small and haploid genomes of prokaryotes, the genomes of eukaryotes are more complex. Moreover, de novo assembly has been unable to keep pace with the flood of data, due to the dramatic increases in genome sequencer capabilities, combined with the computational requirements and the algorithmic complexity of assembling large scale genomes and metagenomes. In this dissertation, we address this challenge head on by developing parallel algorithms for de novo genome assembly with the ambition to scale to massive concurrencies. Our work is based on the Meraculous assembler, a state-of-the-art de novo assembler for short reads developed at JGI. Meraculous identifies non-erroneous overlapping substrings of length k (k-mers) with high quality extensions and uniquely assembles genome regions into uncontested sequences called contigs by constructing and traversing a de Bruijn graph of k-mers, a special graph that is used to represent overlaps among k-mers. The original reads are subsequently aligned onto the contigs to obtain information regarding the relative orientation of the contigs. Contigs are then linked together to create scaffolds, sequences of contigs that may contain gaps among them. Finally gaps are filled using localized assemblies based on the original reads. First, we design efficient scalable algorithms for k-mer analysis and contig generation. K-mer analysis is characterized by intensive communication and I/O requirements and our parallel algorithms successfully reduce the memory requirements by 7×. 
Then, contig generation relies on efficient parallelization of the de Bruijn graph construction and traversal, which necessitates a distributed hash table and is a key component of most de novo assemblers. We present a novel algorithm that leverages the one-sided communication capabilities of UPC to facilitate the requisite fine-grained, irregular parallelism and the avoidance of data hazards. The sequence alignment is characterized by intensive I/O and large computation requirements. We introduce mer-Aligner, a highly parallel sequence aligner that employs parallelism in all of its components. Finally, this thesis details the parallelization of the scaffolding modules, enabling the first massively scalable, high quality, complete end-to-end de novo assembly pipeline. Experimental large-scale results using human and wheat genomes demonstrate efficient performance and scalability on thousands of cores. Compared to the original Meraculous code, which requires approximately 48 hours to assemble the human genome, our pipeline, called HipMer, computes the assembly in only 4 minutes using 23,040 cores of Edison, an overall speedup of approximately 720×. In the last part of the dissertation we tackle the problem of metagenome assembly. Metagenomics is currently the leading technology for studying uncultured microbial diversity. While accessing an unprecedented number of environmental samples that consist of thousands of individual microbial genomes is now possible, the bottleneck is becoming computational, since sequencing cost improvements outpace Moore's Law. Metagenome assembly is further complicated by repeated sequences across genomes, polymorphisms within a species and variable frequency of the genomes within the sample. In our work we repurpose HipMer components for the problem of metagenome assembly and we design a versatile, high-performance metagenome assembly pipeline that outperforms state-of-the-art tools in both quality and performance.
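The description explains contig generation as constructing and traversing a de Bruijn graph of k-mers. As a rough, single-process illustration of that idea (not the Meraculous or HipMer algorithm), the sketch below links k-mers that overlap by k-1 bases and walks only unambiguous paths. It assumes error-free reads, ignores reverse complements and quality-based extension filtering, and uses hypothetical function names.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return the k-mer set plus successor/predecessor maps (k-1 overlap)."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    succ = defaultdict(set)
    pred = defaultdict(set)
    for kmer in kmers:
        for base in "ACGT":
            nxt = kmer[1:] + base
            if nxt in kmers:
                succ[kmer].add(nxt)
                pred[nxt].add(kmer)
    return kmers, succ, pred


def contigs(reads, k):
    """Walk unambiguous paths; stop wherever the graph forks or merges."""
    kmers, succ, pred = build_de_bruijn(reads, k)

    def is_start(kmer):
        # A k-mer starts a contig unless it has exactly one predecessor
        # and that predecessor has exactly one successor.
        p = pred.get(kmer, set())
        if len(p) != 1:
            return True
        (only,) = p
        return len(succ.get(only, set())) != 1

    result = []
    visited = set()
    # Note: isolated cycles have no start k-mer and are skipped in this sketch.
    for kmer in sorted(kmers):
        if not is_start(kmer):
            continue
        contig = kmer
        visited.add(kmer)
        cur = kmer
        while len(succ.get(cur, ())) == 1:
            (nxt,) = succ[cur]
            if len(pred.get(nxt, ())) != 1 or nxt in visited:
                break
            contig += nxt[-1]  # extend the contig by one base
            visited.add(nxt)
            cur = nxt
        result.append(contig)
    return sorted(result)
```

In a distributed setting the `succ` and `pred` lookups are the operations a distributed hash table would serve, which is why the description treats that structure as central to parallel contig generation.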

Scalable Algorithms for Analysis of Genomic Diversity Data

Author: Bogdan Pașaniuc
Publisher:
ISBN:
Category :
Languages : en
Pages : 196

Book Description


Scaling Single Cell Genomics Analysis to Millions of Cells

Author: Benjamin Ezra Parks
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
Improved experimental methods in single cell genomics have increased dataset sizes by two orders of magnitude in the last five years, such that software scalability is quickly becoming a key bottleneck in our ability to analyze and understand multi-million cell atlases. Most analysis methods load full datasets in memory, resulting in excessive memory usage that scales one-to-one with dataset size. Furthermore, existing compressed storage formats for single cell datasets are so slow to read that practical analysis must be performed on uncompressed data. This work describes BPCells, a software package for scalable analysis of massive single cell RNA-seq and ATAC-seq datasets. BPCells provides lossless, seekable bitpacking compression for scATAC-seq fragment alignments and sparse single cell counts matrices. These compression formats are so fast that a single thread can decompress a dataset faster than loading an uncompressed version from a hard drive. Additionally, BPCells implements disk-backed streaming computations that can reduce memory requirements by two orders of magnitude compared to popular tools like Scanpy and Seurat, while incurring little or no speed penalty. Notably, BPCells can reproduce the results of existing software packages to within numerical precision, making it a drop-in replacement for existing tools. This work covers the design and implementation of BPCells, along with applications of single cell analysis.
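To make the idea of lossless, seekable bitpacking concrete, here is a small generic sketch: each non-negative integer is stored using only as many bits as the largest value needs, and any element can be read back without decoding the others. This is an illustration of the general technique only. It is not the BPCells file format, and the function names are hypothetical.

```python
def pack(values):
    """Pack non-negative ints at a fixed bit width; returns (bits, n, bytes)."""
    if any(v < 0 for v in values):
        raise ValueError("bitpacking here assumes non-negative integers")
    # Width of the largest value; at least 1 bit so zeros are representable.
    bits = max(1, max(values, default=0).bit_length())
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * bits)
    nbytes = (len(values) * bits + 7) // 8
    return bits, len(values), acc.to_bytes(nbytes, "little")


def unpack(bits, n, data):
    """Recover all n values exactly (the packing is lossless)."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(n)]


def get(bits, data, i):
    """Random access to element i: the fixed width makes the offset computable."""
    acc = int.from_bytes(data, "little")
    return (acc >> (i * bits)) & ((1 << bits) - 1)
```

For example, five counts that all fit in 3 bits occupy 2 bytes here instead of the 20 bytes a 32-bit array would use. Sparse count matrices in single cell data are dominated by small values, which is the property this kind of scheme exploits.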

Distance-aware Algorithms for Scalable Evolutionary and Ecological Analyses

Author: Metin Balaban
Publisher:
ISBN:
Category :
Languages : en
Pages : 0

Book Description
Thanks to advances in sequencing technologies over the last two decades, the set of available whole-genome sequences has been expanding rapidly. One of the challenges in phylogenetics is accurate large-scale phylogenetic inference based on whole-genome sequences. A related challenge is using incomplete genome-wide data in an assembly-free manner for accurate sample identification with reference to a phylogeny. This dissertation proposes new scalable and accurate algorithms to address these two challenges. First, I present a family of scalable methods called TreeCluster for breaking a large set of sequences into evolutionarily homogeneous clusters. Second, I present two algorithms for accurate phylogenetic placement of genomic sequences on ultra-large single-gene and whole-genome based trees. The first version, APPLES, scales linearly with the reference size, while APPLES-2 scales sub-linearly thanks to a divide-and-conquer strategy based on the TreeCluster method. Third, I develop a solution for assembly-free phylogenetic placement of samples in the particularly challenging case where the specimen is a mixture of two cohabiting species or a hybrid of two species. Fourth, I address one limitation of assembly-free methods (their reliance on simple models of sequence evolution) by developing a technique to compute evolutionary distances under a complex 4-parameter model called TK4. Finally, I introduce a divide-and-conquer workflow for incrementally growing and updating ultra-large phylogenies using many of the ingredients developed in other chapters. This workflow (uDance) is accurate in simulations and can build a 200,000-genome microbial tree-of-life based on 388 marker genes.

The Mouse Nervous System

Author: Charles Watson
Publisher: Academic Press
ISBN: 0123694973
Category : Science
Languages : en
Pages : 815

Book Description
The Mouse Nervous System provides a comprehensive account of the central nervous system of the mouse. The book is aimed at molecular biologists who need a book that introduces them to the anatomy of the mouse brain and spinal cord, but also takes them into the relevant details of development and organization of the area they have chosen to study. The Mouse Nervous System offers a wealth of new information for experienced anatomists who work on mice. The book serves as a valuable resource for researchers and graduate students in neuroscience.
Systematic consideration of the anatomy and connections of all regions of the brain and spinal cord by the authors of the most cited rodent brain atlases.
A major section (12 chapters) on functional systems related to motor control, sensation, and behavioral and emotional states.
A detailed analysis of gene expression during development of the forebrain by Luis Puelles, the leading researcher in this area.
Full coverage of the role of gene expression during development and the new field of genetic neuroanatomy using site-specific recombinases.
Examples of the use of mouse models in the study of neurological illness.

Metagenomics

Author: Muniyandi Nagarajan
Publisher: Academic Press
ISBN: 0128134038
Category : Medical
Languages : en
Pages : 400

Book Description
Metagenomics: Perspectives, Methods, and Applications provides thorough coverage of the growing field of metagenomics. A diverse range of chapters from international experts offer an introduction to the field and examine methods for metagenomic analysis of microbiota, metagenomic computational tools, and recent metagenomic studies in various environments. The emphasis on application makes this text particularly useful for applied researchers, practitioners, clinicians and students seeking to employ metagenomic approaches to advance knowledge in the biomedical and life sciences. Case-study based application chapters examine topics ranging from viral metagenome profiling, metagenomics in oral disease and health, metagenomic insights into the human gut microbiome and metabolic syndromes, and more. Additionally, perspectives on future potential at the end of each chapter provoke new thought and motivations for continued study in this exciting and fruitful research area.
Provides thorough coverage of the rapidly growing field of metagenomics, with an emphasis on applications of relevance to translational researchers, practitioners, clinicians and students.
Features a diverse range of chapters from international experts that offer an introduction to the field and examine methods for metagenomic analysis of microbiota, metagenomic computational tools and research pipelines.
Highlights perspectives on future potential at the end of each chapter to provoke new thought and motivations for continued study in this exciting and fruitful research area.

Molecular Diagnostics

Author: George P. Patrinos
Publisher: Academic Press
ISBN: 0128029889
Category : Medical
Languages : en
Pages : 526

Book Description
Molecular Diagnostics, Third Edition, focuses on the technologies and applications that professionals need to work in, develop, and manage a clinical diagnostic laboratory. Each chapter pairs an expert introduction to its subject with technical details and many applications for molecular genetic testing, supported by a comprehensive reference list at the end of the chapter. Contents are divided into three parts: technologies, application of those technologies, and related issues. The first part is dedicated to the battery of the most widely used molecular pathology techniques. New chapters have been added, including the various new technologies involved in next-generation sequencing (mutation detection, gene expression, etc.), mass spectrometry, and protein-specific methodologies. All revised chapters have been completely updated to include not only technology innovations, but also novel diagnostic applications. As with previous editions, each of the chapters in this section includes a brief description of the technique followed by examples from the selected contributor's area of expertise. The second part of the book attempts to integrate previously analyzed technologies into the different aspects of molecular diagnostics, such as identification of genetically modified organisms, stem cells, pharmacogenomics, modern forensic science, molecular microbiology, and genetic diagnosis. Part three focuses on various everyday issues in a diagnostic laboratory, from genetic counseling and related ethical and psychological issues, to safety and quality management.
Presents a comprehensive account of all new technologies and applications used in clinical diagnostic laboratories.
Explores a wide range of molecular-based tests that are available to assess DNA variation and changes in gene expression.
Offers clear translational presentations by the top molecular pathologists, clinical chemists, and molecular geneticists in the field.

Bioinformatics of Genome Regulation, Volume I, 2nd Edition

Author: Yuriy L. Orlov
Publisher: Frontiers Media SA
ISBN: 2889741427
Category : Science
Languages : en
Pages : 234

Book Description
Publisher’s note: In this 2nd edition, the following article has been updated: Orlov YL, Tatarinova TV, Oparina NY, Galieva ER and Baranova AV (2021) Editorial: Bioinformatics of Genome Regulation, Volume I. Front. Genet. 12:803273. doi: 10.3389/fgene.2021.803273