Semi-supervised ordered weighted average fuzzy-rough nearest neighbour classifier for cancer pattern classification from gene expression data

Paper Details

Research Paper 01/05/2022

Abstract

Classification of cancer patterns from gene expression data is a difficult task in computational biology and artificial intelligence because a sufficient number of labeled training samples is often difficult and expensive to gather. As a result, the classification accuracy of conventional classifiers trained with insufficient training samples is generally low. Unlabeled samples, by contrast, are relatively low-cost and easy to gather, yet conventional classifiers do not use them to train the model. In this context, a self-training-based semi-supervised ordered weighted average fuzzy-rough nearest neighbour classifier is proposed for cancer pattern classification from gene expression data. The experiments are carried out on eight publicly available real-life gene expression cancer datasets. The performance of the proposed method is compared with four other methods (two supervised and two semi-supervised) in terms of percentage accuracy, precision, recall, macro-averaged F1 measure, micro-averaged F1 measure, and kappa. The experimental results support the superiority of the proposed method.
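
The evaluation measures named in the abstract can be illustrated with a short sketch. This is not the authors' evaluation code: the function name `evaluate`, the class labels, and the toy predictions are all hypothetical, and macro-averaging is assumed here to mean the unweighted mean of per-class scores (the paper may define it differently). Kappa is computed as Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Compute accuracy, macro precision/recall/F1, micro F1 and Cohen's kappa.

    Assumes single-label classification and two equally long label lists.
    """
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))

    # Per-class true positives, false positives and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # t was the true class but was missed

    def ratio(num, den):
        # Guard against empty classes: define the score as 0 in that case.
        return num / den if den else 0.0

    precision = {c: ratio(tp[c], tp[c] + fp[c]) for c in labels}
    recall = {c: ratio(tp[c], tp[c] + fn[c]) for c in labels}
    f1 = {c: ratio(2 * tp[c], 2 * tp[c] + fp[c] + fn[c]) for c in labels}

    # Micro-averaging pools the counts over all classes before scoring.
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    micro_f1 = ratio(2 * total_tp, 2 * total_tp + total_fp + total_fn)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    accuracy = ratio(total_tp, n)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    kappa = ratio(accuracy - p_e, 1 - p_e)

    k = len(labels)
    return {
        "accuracy": accuracy,
        "macro_precision": sum(precision.values()) / k,
        "macro_recall": sum(recall.values()) / k,
        "macro_f1": sum(f1.values()) / k,
        "micro_f1": micro_f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    y_true = ["tumor", "tumor", "normal", "normal", "normal", "tumor"]
    y_pred = ["tumor", "normal", "normal", "normal", "tumor", "tumor"]
    for name, value in evaluate(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

In single-label classification the micro-averaged F1 coincides with accuracy, so the macro-averaged F1 and kappa are the more informative measures when class sizes are unbalanced.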
