Statistical Genomics, Bioinformatics, Systems Biology
Statistical Genomics, Network Modeling, Network Visualization, Environmental Health
Microglia, Cell Imaging
Statistical Genomics, Cell Mixtures
Interactive Graphics, Network Modeling
Bioinformatics, Systems Biology
Medical Genomics, Precision Medicine
Statistical Genomics, Co-expression, Deconvolution
Statistical Genomics, Non-detects in qPCR, Missing data
Autoregressive models, Amplification dynamics in qPCR
Bioinformatics, Cell Mixtures
Interactive graphics, Data science
Distributed systems, Serverless architecture
Statistical Genomics, Cell Mixtures
Variable cellular composition of tissue samples represents a significant challenge for the interpretation of genomic profiling studies. Substantial effort has been devoted to modeling and adjusting for compositional differences when estimating differential expression between sample types. However, relatively little attention has been given to the effect of tissue composition on co-expression estimates. In this study, we illustrate the effect of variable cell-type composition on correlation-based network estimation and provide a mathematical decomposition of the tissue-level correlation. We show that a class of deconvolution methods developed to separate tumor and stromal signatures can be applied to two-component cell-type mixtures. In simulated and real data, we identify conditions in which a deconvolution approach would be beneficial. Our results suggest that uncorrelated cell-type-specific markers are ideally suited to deconvolute both the expression and co-expression patterns of an individual cell type. We provide a Shiny application for users to interactively explore the effect of cell-type composition on correlation-based co-expression estimation for any cell types of interest.
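The effect described above can be illustrated with a small simulation. The sketch below is not the decomposition or the deconvolution method from the study; it assumes a toy two-cell-type model in which two genes are independent within each cell type, and all means, variances, and proportion ranges are made-up values. Under these assumptions, letting the mixing proportion vary across samples is enough to induce a strong tissue-level correlation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_tissue(proportions, rng):
    """Return tissue-level expression of two genes, one value per sample.

    Assumed toy model: both genes are high in cell type A (mean 10) and low
    in cell type B (mean 2), and are independent within each cell type.
    Each sample mixes the two cell types with proportion p of type A.
    """
    gene1, gene2 = [], []
    for p in proportions:
        a1, a2 = rng.gauss(10, 1), rng.gauss(10, 1)  # expression in cell type A
        b1, b2 = rng.gauss(2, 1), rng.gauss(2, 1)    # expression in cell type B
        gene1.append(p * a1 + (1 - p) * b1)
        gene2.append(p * a2 + (1 - p) * b2)
    return gene1, gene2


rng = random.Random(1)
n_samples = 500
variable_p = [rng.uniform(0.2, 0.8) for _ in range(n_samples)]  # composition varies
fixed_p = [0.5] * n_samples                                     # composition constant

r_variable = pearson(*simulate_tissue(variable_p, rng))
r_fixed = pearson(*simulate_tissue(fixed_p, rng))
print(f"variable composition: r = {r_variable:.2f}")
print(f"fixed composition:    r = {r_fixed:.2f}")
```

With variable composition the two genes appear strongly co-expressed even though neither cell type contains any co-expression; with fixed composition the correlation stays near zero.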
MicroRNAs are short RNAs that serve as regulators of gene expression and are essential components of normal development as well as modulators of disease. MicroRNAs generally act cell-autonomously, and thus their localization to specific cell types is needed to guide our understanding of microRNA activity. Current tissue-level data have caused considerable confusion, and comprehensive cell-level data do not yet exist. Here, we establish the landscape of human cell-specific microRNA expression. This project evaluated 8 billion small RNA-seq reads from 46 primary cell types, 42 cancer or immortalized cell lines, and 26 tissues. It identified both specific and ubiquitous patterns of expression that strongly correlate with adjacent superenhancer activity. Analysis of unaligned RNA reads uncovered 207 unknown minor strand (passenger) microRNAs of known microRNA loci and 495 novel putative microRNA loci. Although cancer cell lines generally recapitulated the expression patterns of matched primary cells, their isomiR sequence families exhibited increased disorder, suggesting DROSHA- and DICER1-dependent microRNA processing variability. Cell-specific patterns of microRNA expression were used to deconvolute variable cellular composition of colon and adipose tissue samples, highlighting one use of these cell-specific microRNA expression data. Characterization of cellular microRNA expression across a wide variety of cell types provides a new understanding of this critical regulatory RNA species.
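The idea of using cell-specific expression to estimate tissue composition can be sketched for the simplest case of two cell types. This is an illustrative least-squares estimator, not the procedure used in the project; the function name, the marker profiles, and the assumption that tissue expression is a linear mixture on the untransformed scale are all hypothetical.

```python
def estimate_proportion(tissue, profile_a, profile_b):
    """Least-squares estimate of the fraction of cell type A in a mixture.

    Assumes tissue = p * profile_a + (1 - p) * profile_b (plus noise) on a
    linear expression scale, and solves for p in closed form.
    """
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    numerator = sum((t - b) * d for t, b, d in zip(tissue, profile_b, diff))
    denominator = sum(d * d for d in diff)
    if denominator == 0:
        raise ValueError("profiles are identical; proportion is not identifiable")
    # Clip to the valid range for a proportion.
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical expression of four marker microRNAs in two pure cell types.
profile_a = [120.0, 5.0, 40.0, 80.0]
profile_b = [10.0, 90.0, 45.0, 2.0]

# A noiseless tissue sample made of 30% cell type A and 70% cell type B.
tissue = [0.3 * a + 0.7 * b for a, b in zip(profile_a, profile_b)]
print(round(estimate_proportion(tissue, profile_a, profile_b), 3))
```

Markers that differ strongly between the two cell types dominate the estimate, because they contribute the largest terms to both sums.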
Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. Despite extensive research in qPCR laboratory protocols, normalization, and statistical analysis, little attention has been given to qPCR non-detects – those reactions failing to produce a minimum amount of signal. While most current software replaces these non-detects with a value representing the limit of detection, recent work suggests that this introduces substantial bias in estimation of both absolute and differential expression. Recently developed single imputation procedures, while better than previously used methods, underestimate residual variance, which can lead to anti-conservative inference. We propose to treat non-detects as non-random missing data, model the missing data mechanism, and use this model to impute missing values or obtain direct estimates of relevant model parameters. To account for the uncertainty inherent in the imputation, we propose a multiple imputation procedure, which provides a set of plausible values for each non-detect. In the proposed modeling framework, there are three sources of uncertainty: parameter estimation, the missing data mechanism, and measurement error. All three sources of variability are incorporated in the multiple imputation and direct estimation algorithms. We demonstrate the applicability of these methods on three real qPCR data sets and perform an extensive simulation study to assess model sensitivity to misspecification of the missing data mechanism, to the number of replicates within the sample, and to the overall size of the data set. The proposed methods result in unbiased estimates of the model parameters; therefore, these approaches may be beneficial when estimating both absolute and differential gene expression. The developed methods are implemented in the R/Bioconductor package nondetects. 
The statistical methods introduced here reduce discrepancies in gene expression values derived from qPCR experiments, providing more confidence in generating scientific hypotheses and performing downstream analysis.
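The general shape of a multiple imputation analysis can be sketched as follows. This is not the algorithm implemented in the nondetects package: the imputation distribution here is simply assumed (a normal truncated below at a hypothetical threshold) rather than derived from a fitted missing-data mechanism, and all function names and numeric values are illustrative. The pooling step uses Rubin's rules, which combine within-imputation and between-imputation variance.

```python
import random
import statistics


def draw_truncated_normal(mu, sigma, lower, rng):
    """Rejection-sample a normal draw that exceeds `lower`."""
    while True:
        value = rng.gauss(mu, sigma)
        if value > lower:
            return value


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


def impute_and_pool(detected, n_missing, mu, sigma, lower, m, rng):
    """Impute non-detects m times and pool the mean Ct across imputations."""
    estimates, variances = [], []
    for _ in range(m):
        imputed = [draw_truncated_normal(mu, sigma, lower, rng)
                   for _ in range(n_missing)]
        completed = detected + imputed
        estimates.append(statistics.fmean(completed))
        # Variance of the mean within this completed data set.
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)


# Hypothetical replicate Ct values for one gene, with one non-detect.
detected_ct = [33.1, 33.8, 34.2]
rng = random.Random(0)
pooled_mean, total_var = impute_and_pool(
    detected_ct, n_missing=1, mu=36.0, sigma=1.0, lower=35.0, m=20, rng=rng
)
print(f"pooled mean Ct = {pooled_mean:.2f}, total variance = {total_var:.3f}")
```

Because each imputation draws a different plausible value, the between-imputation term carries the imputation uncertainty into the total variance, which a single imputation would leave out.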
Cell morphological phenotypes, including shape, size, intensity, and texture of cellular compartments, have been shown to change in response to perturbation with small molecule compounds. Image-based cell profiling or cell morphological profiling has been used to associate changes in cell morphological features with alterations in cellular function and to infer molecular mechanisms of action. Recently, the Library of Integrated Network-based Cellular Signatures (LINCS) Project has measured gene expression and performed image-based cell profiling on cell lines treated with 9515 unique compounds. These data provide an opportunity to study the interdependence between transcription and cell morphology. Previous methods to investigate cell phenotypes have focused on targeting candidate genes as components of known pathways, RNAi morphological profiling, and cataloging morphological defects; however, these methods do not provide an explicit model to link transcriptomic changes with corresponding alterations in morphology. To address this, we propose a cell morphology enrichment analysis to assess the association between transcriptomic alterations and changes in cell morphology. Additionally, for a new transcriptomic query, our approach can be used to predict associated changes in cellular morphology. We demonstrate the utility of our method by applying it to cell morphological changes in a human bone osteosarcoma cell line.
The sources of gene expression variability in human tissues are thought to be a complex interplay of technical, compositional, and disease-related factors. To better understand these contributions, we investigated expression variability in a relatively homogeneous tissue expression dataset from the Genotype-Tissue Expression (GTEx) resource. In addition to identifying technical sources, such as sequencing date and post-mortem interval, we also identified several biological sources of variation. An in-depth analysis of the 175 genes with the greatest variation among 133 lung tissue samples identified five distinct clusters of highly correlated genes. One large cluster included surfactant genes (SFTPA1, SFTPA2, and SFTPC), which are expressed exclusively in type II pneumocytes, cells that proliferate in ventilator-associated lung injury. High surfactant expression was strongly associated with death on a ventilator and type II pneumocyte hyperplasia. A second large cluster included dynein (DNAH9 and DNAH12) and mucin (MUC5B and MUC16) genes, which are exclusive to the respiratory epithelium and goblet cells of bronchial structures. This indicates heterogeneous bronchiole sampling due to the harvesting location in the lung. A small cluster included acute-phase reactant genes (SAA1, SAA2, and SAA2–SAA4). The final two small clusters were technical and gender-related. To summarize, in a collection of normal lung samples, we found that tissue heterogeneity caused by harvesting location (medial or lateral lung) and late therapeutic intervention (mechanical ventilation) were major contributors to expression variation. These unexpected sources of variation were the result of altered cell ratios in the tissue samples, an underappreciated source of expression variation.
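The two analysis steps mentioned above, selecting the most variable genes and grouping them into clusters of highly correlated genes, can be sketched on toy data. The abstract does not state which clustering method was used, so the connected-components grouping, the correlation threshold, the gene names, and the expression values below are all assumptions made for illustration.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def top_variable_genes(expression, k):
    """Names of the k genes with the largest sample variance."""
    ranked = sorted(expression,
                    key=lambda g: statistics.variance(expression[g]),
                    reverse=True)
    return ranked[:k]


def correlation_clusters(expression, genes, threshold):
    """Group genes into connected components of the graph r > threshold."""
    clusters = []
    unassigned = list(genes)
    while unassigned:
        seed = unassigned.pop(0)
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            linked = [g for g in unassigned
                      if pearson(expression[current], expression[g]) > threshold]
            for g in linked:
                unassigned.remove(g)
            cluster.extend(linked)
            frontier.extend(linked)
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical expression values for five genes across six samples.
expression = {
    "geneA": [1, 2, 3, 4, 5, 6],
    "geneB": [2, 4, 6, 8, 10, 12],   # tracks geneA
    "geneC": [6, 1, 5, 2, 4, 3],
    "geneD": [12, 2, 10, 4, 8, 6],   # tracks geneC
    "geneE": [5, 5.1, 5, 5.1, 5, 5.1],  # nearly constant
}

variable = top_variable_genes(expression, k=4)
clusters = correlation_clusters(expression, variable, threshold=0.8)
print(sorted(clusters))
```

In this toy example the nearly constant gene is filtered out before clustering, and the remaining four genes fall into two groups that each vary together across samples.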