We propose to develop statistical methods for microRNA-sequencing data with the support of a $1.97 million R01 research grant from the National Institute of General Medical Sciences.
Alterations in microRNAs have been shown to disrupt entire cellular pathways, substantially contributing to a variety of human diseases such as heart disease and cancer. However, despite their importance, our understanding of the role of microRNAs is hampered by a lack of statistical methods designed specifically to analyze microRNA-sequencing data.
The grant aims to improve the analysis of microRNA-sequencing (microRNA-seq) data by developing statistical methods that directly address the challenges unique to measuring microRNA expression levels. Statistical analysis of processed microRNA-seq data is currently performed using methods developed for mRNA-seq data, even though the assumptions of those methods are violated. Specifically, methods for mRNA-seq data assume approximate independence between feature counts; however, because the total number of microRNAs is small and a few of them are very highly expressed, microRNA counts are not independent of one another.
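The lack of independence described above can be illustrated with a small toy calculation. This is only a sketch under stated assumptions: the five abundances, the sequencing depth, and the helper `sequenced_counts` are hypothetical and are not taken from the proposed methods. It assumes reads are allocated in exact proportion to abundance at a fixed depth.

```python
def sequenced_counts(abundances, depth):
    """Allocate a fixed number of reads in proportion to true abundance.

    Hypothetical helper: a deterministic stand-in for sequencing at fixed depth.
    """
    total = sum(abundances)
    return [round(depth * a / total) for a in abundances]


# Hypothetical true abundances for five microRNAs; the first one dominates.
baseline = [9000, 400, 300, 200, 100]
# Only the dominant microRNA changes (halved); the other four are untouched.
perturbed = [4500, 400, 300, 200, 100]

depth = 1_000_000  # assumed total reads per sample
counts_a = sequenced_counts(baseline, depth)
counts_b = sequenced_counts(perturbed, depth)

# Ratio of observed counts (perturbed / baseline) for each microRNA.
ratios = [b / a for a, b in zip(counts_a, counts_b)]
for name, r in zip(["dominant", "miR-2", "miR-3", "miR-4", "miR-5"], ratios):
    print(f"{name}: count ratio = {r:.2f}")
```

In this toy setting, the observed counts of the four unchanged microRNAs all rise together simply because the dominant microRNA takes up fewer reads, which is the kind of dependence between counts that an independence assumption ignores.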
Additionally, normalization methods for mRNA-seq data assume either that the overall level of transcription is constant across samples or that an equal number of features are over- and under-expressed when comparing any two samples; neither assumption holds for microRNA-seq data.
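To make the first of these assumptions concrete, the sketch below applies simple total-count scaling (counts per million) to two hypothetical samples. Total-count scaling is used here only as a familiar stand-in for normalization methods that assume constant overall transcription; the counts and the helper `counts_per_million` are illustrative assumptions, not part of the proposed work.

```python
def counts_per_million(counts):
    """Scale raw counts so that each sample sums to one million."""
    total = sum(counts)
    return [1_000_000 * c / total for c in counts]


# Hypothetical raw counts for five microRNAs in sample A.
sample_a = [9000, 400, 300, 200, 100]
# In sample B, every microRNA is twice as abundant (a global increase).
sample_b = [2 * c for c in sample_a]

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)

# After scaling, the two samples are indistinguishable.
for a, b in zip(cpm_a, cpm_b):
    print(f"sample A: {a:.0f}  sample B: {b:.0f}")
```

Because the scaling forces both samples to the same total, a genuine global change in microRNA expression disappears after normalization in this example.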
The development of statistical methods that address the challenges of microRNA-seq data represents a critical need for microRNA research. These methods are necessary to fully elucidate the role microRNAs play in many human disease processes.