Statistical Approaches to Decreasing the Discrepancy of Non-detects in qPCR Data


Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. Despite extensive research on qPCR laboratory protocols, normalization, and statistical analysis, little attention has been given to qPCR non-detects, the reactions that fail to produce a minimum amount of signal. While most current software replaces these non-detects with a value representing the limit of detection, recent work suggests that this introduces substantial bias in the estimation of both absolute and differential expression. Recently developed single imputation procedures, while better than previously used methods, underestimate residual variance, which can lead to anti-conservative inference. We propose to treat non-detects as non-random missing data, model the missing data mechanism, and use this model to impute missing values or obtain direct estimates of relevant model parameters. To account for the uncertainty inherent in the imputation, we propose a multiple imputation procedure, which provides a set of plausible values for each non-detect. In the proposed modeling framework, there are three sources of uncertainty: parameter estimation, the missing data mechanism, and measurement error. All three sources are incorporated in the multiple imputation and direct estimation algorithms. We demonstrate the applicability of these methods on three real qPCR data sets and perform an extensive simulation study to assess model sensitivity to misspecification of the missing data mechanism, to the number of replicates within the sample, and to the overall size of the data set. The proposed methods result in unbiased estimates of the model parameters; therefore, these approaches may be beneficial when estimating both absolute and differential gene expression. The developed methods are implemented in the R/Bioconductor package nondetects.
The statistical methods introduced here reduce discrepancies in gene expression values derived from qPCR experiments, providing more confidence in generating scientific hypotheses and performing downstream analysis.
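
To make the contrast between limit-of-detection substitution and multiple imputation concrete, the following is a minimal Python sketch on simulated data. It is not the algorithm implemented in the nondetects package. It assumes a deliberately simplified missing data mechanism (every reaction with a Ct above a hard cutoff of 40 cycles is a non-detect) rather than a modeled probabilistic one, normally distributed Ct values, and a bootstrap to propagate parameter uncertainty. All function names, the cutoff, and the simulation settings are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, mean, variance

CUTOFF = 40.0  # assumed detection limit in cycles (illustrative only)


def fit_censored_normal(observed, n_missing, cutoff=CUTOFF, iters=200):
    """EM estimates of (mu, sigma) when n_missing values are only known to exceed cutoff."""
    n = len(observed) + n_missing
    mu, sigma = mean(observed), variance(observed) ** 0.5
    s1, s2 = sum(observed), sum(x * x for x in observed)
    std = NormalDist()
    for _ in range(iters):
        a = (cutoff - mu) / sigma
        lam = std.pdf(a) / max(1.0 - std.cdf(a), 1e-12)  # inverse Mills ratio
        e1 = mu + sigma * lam                             # E[X | X > cutoff]
        e2 = mu * mu + sigma * sigma + sigma * lam * (mu + cutoff)  # E[X^2 | X > cutoff]
        mu = (s1 + n_missing * e1) / n
        sigma = max((s2 + n_missing * e2) / n - mu * mu, 1e-8) ** 0.5
    return mu, sigma


def draw_above(mu, sigma, cutoff, rng):
    """Draw one value from Normal(mu, sigma) truncated to [cutoff, inf)."""
    dist = NormalDist(mu, sigma)
    u = rng.uniform(dist.cdf(cutoff), 1.0)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside its open domain
    return max(cutoff, dist.inv_cdf(u))


def multiply_impute(observed, n_missing, m=20, cutoff=CUTOFF, seed=0):
    """Return the pooled mean and its total variance using Rubin's rules."""
    rng = random.Random(seed)
    data = list(observed) + [None] * n_missing
    estimates, within = [], []
    for _ in range(m):
        # Bootstrap the data so each imputation uses different parameter estimates.
        boot = rng.choices(data, k=len(data))
        boot_obs = [x for x in boot if x is not None]
        if len(boot_obs) < 2:  # degenerate resample: fall back to the original data
            boot_obs = list(observed)
        mu_b, sigma_b = fit_censored_normal(boot_obs, len(data) - len(boot_obs), cutoff)
        imputed = [draw_above(mu_b, sigma_b, cutoff, rng) for _ in range(n_missing)]
        completed = list(observed) + imputed
        estimates.append(mean(completed))
        within.append(variance(completed) / len(completed))
    q_bar = mean(estimates)                # pooled point estimate
    w_bar = mean(within)                   # average within-imputation variance
    b = variance(estimates)                # between-imputation variance
    return q_bar, w_bar + (1.0 + 1.0 / m) * b


# Simulated replicate Ct values for one gene; settings are arbitrary.
sim = random.Random(1)
cts = [sim.gauss(38.0, 1.5) for _ in range(60)]
observed = [x for x in cts if x <= CUTOFF]
n_missing = len(cts) - len(observed)

lod_mean = mean(observed + [CUTOFF] * n_missing)  # limit-of-detection substitution
mi_mean, mi_var = multiply_impute(observed, n_missing)
print(f"non-detects: {n_missing}")
print(f"LOD-substituted mean: {lod_mean:.3f}")
print(f"MI pooled mean: {mi_mean:.3f} (SE {mi_var ** 0.5:.3f})")
```

Because every imputed value lies at or above the cutoff, the pooled mean can never fall below the substituted mean. The total variance adds a between-imputation term to the usual within-sample variance, which is the component that a single imputation omits.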

Valeriia Sherina
Principal Statistician at GSK
Winslow Powers
Bioinformatics Software Engineer at One Brave Idea
Matthew N. McCall
Associate Professor of Biostatistics and Biomedical Genetics