miRcorrNet: Integrated microRNA Gene Expression and mRNA Expression Based Machine Learning combined with Features Grouping and Ranking
Malik Yousef1,2, Gokhan Goy3, Christine M. Eischen, Ramkrishna Mitra and Burcu Bakir-Gungor3
1 Department of Information Systems, Zefat Academic College, Zefat, 13206, Israel.
2 Galilee Digital Health Research Center (GDH), Zefat Academic College, Israel.
3 Department of Computer Engineering, Abdullah Gül University, Kayseri, 38090, Turkey.
A better understanding of disease development and progression mechanisms at the molecular level is critical both for diagnosis and for the development of therapeutic approaches. Advances in high-throughput technologies now make it possible to produce mRNA and microRNA (miRNA) expression data in large amounts and at affordable cost. Nevertheless, most studies consider only mRNA or only miRNA expression data when investigating these mechanisms, and a single type of omics data is often not enough to capture the complex structure of complex diseases. Because both profiles can now be measured in the same individual, they can be analyzed integratively. Such integrated analyses aim to elucidate the functional effects of RNA expression in complex diseases such as cancer.
Most approaches that integrate miRNA and mRNA data are based on statistical methods, such as Pearson correlation, combined with enrichment analysis. We are aware of two tools designed for such integrative analysis; other studies assemble different packages to perform this task.
In this study, we developed a novel tool called miRcorrNet, which performs machine learning-based integration of miRNA and mRNA expression profiles. miRcorrNet groups mRNA genes based on their correlation with miRNA expression. Each group is then scored by a rank function that estimates how significant the group is for classifying the samples of the given two-class data.
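The grouping and ranking steps described above can be illustrated with a minimal Python sketch. It is not the miRcorrNet implementation: the function names, the use of absolute Pearson correlation, the cutoff of 0.6, and the nearest-centroid classifier used as the rank function are all assumptions made for illustration.

```python
import numpy as np

def pearson_matrix(a, b):
    """Pearson r between every column of a and every column of b."""
    az = (a - a.mean(axis=0)) / a.std(axis=0)
    bz = (b - b.mean(axis=0)) / b.std(axis=0)
    return az.T @ bz / a.shape[0]

def group_genes(mirna, mrna, cutoff=0.6):
    """Map each miRNA index to the genes whose |r| reaches the cutoff.

    mirna: (n_samples, n_mirnas), mrna: (n_samples, n_genes).
    The cutoff and the use of |r| are assumptions of this sketch.
    """
    corr = pearson_matrix(mirna, mrna)
    return {m: np.flatnonzero(np.abs(corr[m]) >= cutoff)
            for m in range(mirna.shape[1])}

def score_group(mrna, labels, genes, train_idx, test_idx):
    """Held-out accuracy of a nearest-centroid classifier on one group.

    A stand-in rank function; any two-class classifier could be used.
    """
    if len(genes) == 0:
        return 0.0
    x_train = mrna[np.ix_(train_idx, genes)]
    x_test = mrna[np.ix_(test_idx, genes)]
    y_train = labels[train_idx]
    c0 = x_train[y_train == 0].mean(axis=0)
    c1 = x_train[y_train == 1].mean(axis=0)
    d0 = ((x_test - c0) ** 2).sum(axis=1)
    d1 = ((x_test - c1) ** 2).sum(axis=1)
    pred = (d1 < d0).astype(int)  # ties are assigned to class 0
    return float((pred == labels[test_idx]).mean())

def rank_groups(mirna, mrna, labels, train_idx, test_idx, cutoff=0.6):
    """Build groups on training samples only, then sort them by score."""
    groups = group_genes(mirna[train_idx], mrna[train_idx], cutoff)
    scores = {m: score_group(mrna, labels, g, train_idx, test_idx)
              for m, g in groups.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch the groups are derived from the training samples alone, so the held-out samples used for scoring do not influence which genes are assigned to a miRNA. Columns with zero variance would need to be filtered out beforehand, since their correlation is undefined.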
We tested our tool on miRNA-seq and mRNA-seq expression profiles downloaded from the TCGA data portal for 11 solid tumor types. We report performance measures averaged over 100-fold Monte Carlo Cross-Validation (MCCV). For comparison and for in-depth biological analysis, we also considered maTE, a tool of similar merit, and SVM-RFE. The results show that miRcorrNet performs as well as the other tools, reaching an AUC above 95% on average. Moreover, we conducted an in-depth biological analysis of the significant genes and of the significant groups, each represented by a miRNA. This analysis indicates that the findings are biologically meaningful.
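As a rough illustration of the evaluation protocol, the sketch below averages the AUC over repeated random stratified train/test splits, which is how Monte Carlo Cross-Validation is commonly understood. The 90/10 split ratio, the function names, and the nearest-centroid scorer are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half correct.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def centroid_scores(x_train, y_train, x_test):
    """Hypothetical scorer: higher means closer to the class-1 centroid."""
    c0 = x_train[y_train == 0].mean(axis=0)
    c1 = x_train[y_train == 1].mean(axis=0)
    d0 = ((x_test - c0) ** 2).sum(axis=1)
    d1 = ((x_test - c1) ** 2).sum(axis=1)
    return d0 - d1

def mccv_auc(x, y, score_fn, n_iter=100, train_frac=0.9, seed=0):
    """Mean AUC over n_iter random stratified splits (assumed 90/10)."""
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_iter):
        train_parts, test_parts = [], []
        for cls in (0, 1):
            idx = rng.permutation(np.flatnonzero(y == cls))
            # Keep at least one sample of each class for testing.
            cut = min(int(round(train_frac * len(idx))), len(idx) - 1)
            train_parts.append(idx[:cut])
            test_parts.append(idx[cut:])
        train = np.concatenate(train_parts)
        test = np.concatenate(test_parts)
        scores = score_fn(x[train], y[train], x[test])
        aucs.append(auc(scores, y[test]))
    return float(np.mean(aucs))
```

A call such as `mccv_auc(x, y, centroid_scores)` would return the mean AUC over 100 splits for a feature matrix `x` and binary labels `y`. Unlike k-fold cross-validation, the test sets of different iterations may overlap, because each split is drawn independently.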
We believe that our tool will help the genetics community identify the target genes of each microRNA more precisely by using microRNA and gene expression profiles simultaneously.