Disease is often manifested via changes in transcript and protein abundance. MicroRNAs (miRNAs) are instrumental in regulating protein abundance and may also measurably influence transcript levels. A miRNA often targets more than one mRNA (in humans, the average is three), and an mRNA is often targeted by more than one miRNA (for the genes considered in this study, the average is also three). Because of this many-to-many relationship, it is difficult to determine the minimal set of causative miRNAs from a given set of dysregulated transcripts.
We present maTE, a novel machine-learning approach that integrates miRNA target genes with gene expression data. maTE depends on the availability of a sufficient number of patient and control samples. For each miRNA, the samples are used to train a classifier that differentiates between the sample groups based on the expression of that miRNA's target genes. A combined classifier is then built from multiple miRNAs to improve separation.
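The per-miRNA scoring idea can be illustrated with a minimal sketch. This is not the maTE implementation: the nearest-centroid classifier and leave-one-out evaluation are simple stand-ins chosen so that the example needs only the Python standard library, and the miRNA names, gene names, expression values, and function names are all hypothetical.

```python
from statistics import mean

def centroid(rows):
    """Column-wise mean of a list of equal-length expression vectors."""
    return [mean(col) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier
    that sees only the expression of `genes` (one miRNA's targets)."""
    correct = 0
    for i, (sample, label) in enumerate(zip(samples, labels)):
        # Train on every sample except the one held out.
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        centroids = {}
        for cls in {l for _, l in train}:
            rows = [[s[g] for g in genes] for s, l in train if l == cls]
            centroids[cls] = centroid(rows)
        x = [sample[g] for g in genes]
        predicted = min(centroids, key=lambda c: sq_dist(x, centroids[c]))
        correct += predicted == label
    return correct / len(samples)

def rank_mirnas(samples, labels, targets):
    """Score each miRNA by how well its target genes separate the groups,
    and return (miRNA, accuracy) pairs from best to worst."""
    scores = {m: loo_accuracy(samples, labels, genes) for m, genes in targets.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical miRNA -> target gene mapping (assumption, not real annotation).
targets = {"miR-A": ["g1", "g2"], "miR-B": ["g3"]}

# Toy expression data: g1 and g2 differ between groups, g3 does not.
samples = [
    {"g1": 1.0, "g2": 1.2, "g3": 1.0},
    {"g1": 1.1, "g2": 0.9, "g3": 3.0},
    {"g1": 0.9, "g2": 1.1, "g3": 6.0},
    {"g1": 1.2, "g2": 1.0, "g3": 8.0},
    {"g1": 5.0, "g2": 5.5, "g3": 2.0},
    {"g1": 5.2, "g2": 4.9, "g3": 4.0},
    {"g1": 4.8, "g2": 5.1, "g3": 5.0},
    {"g1": 5.1, "g2": 5.3, "g3": 10.0},
]
labels = ["patient"] * 4 + ["control"] * 4

ranked = rank_mirnas(samples, labels, targets)
for mirna, accuracy in ranked:
    print(f"{mirna}: {accuracy:.2f}")
```

In this toy setting, the miRNA whose targets separate the two groups ranks first, while the miRNA whose single target is uninformative scores poorly. A combined classifier could then be trained on the union of the target genes of the top-ranked miRNAs.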
The aim of the study is to find the set of miRNAs whose regulation of their target genes best explains the difference between groups (e.g., cancer vs. control). maTE provides a list of significant groups of genes, where each group is targeted by a specific miRNA. For the datasets used in this study, maTE generally achieves an accuracy well above 80%. Notably, when the accuracy is much lower (e.g., ~50%), the set of miRNAs provided is likely not causative for the difference in expression.
This new approach of integrating miRNA regulation with expression data yields powerful results and is independent of external labels and training data. It thereby opens up new avenues for exploring miRNA regulation and may pave the way for the development of miRNA-based biomarkers and drugs.
Availability and Implementation
The KNIME workflow is available at Bioinformatics online.