
Combining statistical and fuzzy-rough classifiers for cancer subtype prediction

Research Paper | August 10, 2022


Sukanta Majumder, Ansuman Kumar, Anindya Halder


J. Bio. Env. Sci. 21(2), 92-99, August 2022




Cancer prediction from gene expression data is one of the challenging areas of research in bioinformatics and machine learning. In gene expression data, labeled samples are very limited compared to unlabeled samples, and labeling unlabeled data is expensive. Therefore, a single classifier trained with limited training samples often fails to produce the desired result. In this situation, a combination of classifiers can be effective, as it ensembles the results of the individual classifiers, which can improve cancer prediction accuracy. In this article, a novel method, combining statistical and fuzzy-rough classifiers (CSFRC), is proposed for cancer prediction; it uses support vector machine and naive Bayes as statistical classifiers, together with a fuzzy-rough nearest neighbor classifier. The proposed method is able to deal with the uncertainty, overlap and indiscernibility usually present in the cancer subtype classes of gene expression data. The proposed method is validated on eight publicly available gene expression datasets. Experimental results suggest that the proposed method performs better than the other compared classifiers for cancer subtype prediction from gene expression data. The proposed method turns out to be very effective in cancer prediction from gene expression data, particularly when the results of the individual classifiers are not up to the mark with limited training samples.
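The idea of combining a fuzzy-rough nearest neighbor classifier with statistical classifiers can be illustrated with a minimal sketch. The abstract does not state how CSFRC fuses the member outputs, so the sketch assumes plain majority voting, which may differ from the authors' rule. The fuzzy-rough member is a simplified version of fuzzy-rough nearest neighbor classification (a fuzzy tolerance relation, the Łukasiewicz implicator for the lower approximation, and the minimum t-norm for the upper approximation). The function names, the toy data, and the placeholder labels standing in for the SVM and naive Bayes members are all hypothetical.

```python
from collections import Counter


def fuzzy_similarity(x, y, ranges):
    """Fuzzy tolerance relation: worst per-gene closeness, scaled to [0, 1]."""
    return min(
        max(0.0, 1.0 - abs(a - b) / r) if r > 0 else 1.0
        for a, b, r in zip(x, y, ranges)
    )


def frnn_predict(train_X, train_y, sample, k=3):
    """Simplified fuzzy-rough nearest neighbour prediction for one sample."""
    n_features = len(sample)
    # Per-gene value range over the training set, used to normalise distances.
    ranges = [
        max(row[g] for row in train_X) - min(row[g] for row in train_X)
        for g in range(n_features)
    ]
    # Similarity of the sample to every training object; keep the k most similar.
    sims = sorted(
        ((fuzzy_similarity(sample, x, ranges), label)
         for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    best_label, best_score = None, -1.0
    for cls in sorted(set(train_y)):
        # Lower approximation: Lukasiewicz implicator min(1, 1 - a + b).
        lower = min(min(1.0, 1.0 - s + (1.0 if lab == cls else 0.0))
                    for s, lab in sims)
        # Upper approximation: minimum t-norm.
        upper = max(min(s, 1.0 if lab == cls else 0.0) for s, lab in sims)
        score = (lower + upper) / 2.0
        if score > best_score:
            best_label, best_score = cls, score
    return best_label


def majority_vote(labels):
    """Return the most frequent label; ties go to the earliest tied label."""
    counts = Counter(labels)
    top = max(counts.values())
    winners = {lab for lab, c in counts.items() if c == top}
    return next(lab for lab in labels if lab in winners)


# Toy expression profiles (two genes, two subtypes); illustrative values only.
train_X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
           [0.8, 0.9], [0.9, 0.8], [0.85, 0.75]]
train_y = ["A", "A", "A", "B", "B", "B"]
sample = [0.2, 0.2]

frnn_label = frnn_predict(train_X, train_y, sample)
# Hypothetical labels standing in for trained SVM and naive Bayes members.
svm_label, nb_label = "A", "B"
print(majority_vote([svm_label, nb_label, frnn_label]))
```

In this sketch a disagreement between two members is settled by the third, which is one way an ensemble can help when an individual classifier trained on few samples is unreliable.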


Copyright © 2022 by the Authors and International Network for Natural Sciences (INNSPUB).
This article is published under the terms of the Creative Commons Attribution License 4.0.

