Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

Document Type : Original Article

Authors

1 Department of Medical Informatics, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran

2 Department of Electrical Engineering, Faculty of Engineering, University of Birjand, Birjand, Iran

3 Robotics Laboratory, Department of Electrical Engineering, University of Neyshabur, Neyshabur, Iran

4 Pharmaceutical Research Center, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran

5 Department of Medical Informatics, Academic Medical Center, Amsterdam, The Netherlands

Abstract

Objective(s): This study addresses feature selection for breast cancer diagnosis. The proposed method is a wrapper approach that combines genetic algorithm (GA)-based feature selection with a PS-classifier. Experimental results show that the proposed model is comparable to other models on the Wisconsin breast cancer datasets.
Materials and Methods: To evaluate the effectiveness of the proposed feature selection method, we employed three classifiers: an artificial neural network (ANN), a PS-classifier, and a genetic algorithm-based classifier (GA-classifier). They were applied to three Wisconsin breast cancer datasets: Wisconsin breast cancer (WBC), Wisconsin diagnosis breast cancer (WDBC), and Wisconsin prognosis breast cancer (WPBC).
Results: For the WBC dataset, feature selection improved the accuracy of all classifiers except ANN, and the best accuracy with feature selection was achieved by the PS-classifier. For WDBC and WPBC, feature selection improved the accuracy of all three classifiers, and the best accuracy with feature selection was achieved by ANN. Specificity and sensitivity also improved after feature selection.
Conclusion: The results show that feature selection can improve the accuracy, specificity, and sensitivity of classifiers. The results of this study are comparable with those of other studies on the Wisconsin breast cancer datasets.
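
The wrapper idea summarized in the abstract can be illustrated with a minimal sketch: a GA evolves binary feature masks, and each mask is scored by the accuracy of a classifier trained on only the selected features. This is not the authors' implementation. The nearest-centroid classifier stands in for the ANN, PS-classifier, and GA-classifier, the data are synthetic rather than WBC/WDBC/WPBC, and all names and GA settings (`ga_select`, `pop_size`, `p_mut`, tournament selection, one-point crossover, elitism) are assumptions chosen for illustration.

```python
import random


def make_toy_data(n, rng):
    """Synthetic two-class data: 2 informative features followed by 6 noise features."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        informative = [rng.gauss(2.0 * y, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 5.0) for _ in range(6)]
        data.append((informative + noise, y))
    return data


def nearest_centroid_accuracy(train, test, mask):
    """Wrapper fitness: accuracy of a nearest-centroid classifier on masked features."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(r[i] for r in rows) / len(rows) for i in idx]
    correct = 0
    for x, y in test:
        pred = min(
            centroids,
            key=lambda c: sum((x[i] - m) ** 2 for i, m in zip(idx, centroids[c])),
        )
        correct += pred == y
    return correct / len(test)


def ga_select(train, valid, n_features, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve a binary feature mask; returns (best_mask, best_validation_accuracy)."""
    rng = random.Random(seed)

    def fitness(mask):
        return nearest_centroid_accuracy(train, valid, mask)

    # Seed the population with the all-features mask, so the result is never
    # worse (on the validation split) than using every feature.
    pop = [[1] * n_features]
    pop += [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size - 1)]
    best = max(pop, key=fitness)

    for _ in range(generations):
        scored = [(fitness(m), m) for m in pop]

        def tournament():
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [best[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)

    return best, fitness(best)


data_rng = random.Random(1)
train_set = make_toy_data(120, data_rng)
valid_set = make_toy_data(60, data_rng)
best_mask, best_acc = ga_select(train_set, valid_set, n_features=8)
print("selected mask:", best_mask, "validation accuracy:", best_acc)
```

Because the same validation split drives the search and reports the score, the accuracy printed here is optimistic. A held-out test set or cross-validation would be needed to estimate the kind of accuracy, sensitivity, and specificity gains the abstract reports.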
