Journal of Engineering and Applied Sciences

Year: 2018
Volume: 13
Issue: 21
Page No. 9065 - 9077

Improved Performance of Support Vector Machine for Imbalanced Data Sets Using Oversampling and Optimization

Authors: Sana Saeed and Hong Choon Ong

References

Abe, S., 2005. Support Vector Machines for Pattern Classification. Springer, Berlin, Germany, ISBN-13:978-1-85233-929-6, Pages: 345.

Alcala-Fdez, J., A. Fernandez, J. Luengo, J. Derrac and S. Garcia et al., 2011. Keel data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. J. Multiple Valued Logic Soft Comput., 17: 255-287.

Alwan, H.B. and K.R. Ku-Mahamud, 2013. Solving support vector machine model selection problem using continuous ant colony optimization. Intl. J. Inf. Process. Manage., 4: 86-97.

Alwan, H.B. and K.R. Ku-Mahamud, 2017. Mixed-variable ant colony optimisation algorithm for feature subset selection and tuning support vector machine parameter. Intl. J. Bio Inspired Comput., 9: 53-63.

Askan, A. and S. Sayın, 2014. SVM classification for imbalanced data sets using a multiobjective optimization framework. Ann. Oper. Res., 216: 191-203.

Bao, Y., Z. Hu and T. Xiong, 2013. A PSO and pattern search based memetic algorithm for SVMs parameters optimization. Neurocomput., 117: 98-106.

Batuwita, R. and V. Palade, 2010. Efficient resampling methods for training support vector machines with imbalanced datasets. Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN’10), July 18-23, 2010, IEEE, Barcelona, Spain, ISBN:978-1-4244-6916-1, pp: 1-8.

Bekkar, M., H.K. Djemaa and T.A. Alitouche, 2013. Evaluation measures for models assessment over imbalanced datasets. J. Inf. Eng. Appl., 3: 27-39.

Ben-Hur, A. and J. Weston, 2010. A User’s Guide to Support Vector Machines. In: Data Mining Techniques for the Life Sciences, Carugo, O. and F. Eisenhaber (Eds.). Humana Press, New York, USA., ISBN:978-1-60327-240-7, pp: 223-239.

Bhadra, T., S. Bandyopadhyay and U. Maulik, 2012. Differential evolution based optimization of SVM parameters for meta classifier design. Procedia Technol., 4: 50-57.

Blondin, J. and A. Saad, 2010. Metaheuristic techniques for support vector machine model selection. Proceedings of the 10th International Conference on Hybrid Intelligent Systems (HIS’10), August 23-25, 2010, IEEE, Atlanta, Georgia, ISBN:978-1-4244-7363-2, pp: 197-200.

Cao, P., D. Zhao and O. Zaiane, 2013. An optimized cost-sensitive SVM for imbalanced data learning. Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, April 14-17, 2013, Springer, Gold Coast, Australia, ISBN:978-3-642-37455-5, pp: 280-292.

Cervantes, J., F. Garcia-Lamont, L. Rodriguez, A. Lopez and J.R. Castilla et al., 2017. PSO-based method for SVM classification on skewed data sets. Neurocomput., 228: 187-197.

Chaudhuri, A. and K. De, 2011. Fuzzy support vector machine for bankruptcy prediction. Appl. Soft Comput., 11: 2472-2486.

Chawla, N.V., K.W. Bowyer, L.O. Hall and W.P. Kegelmeyer, 2002. SMOTE: Synthetic minority over-sampling technique. J. Artificial Intell. Res., 16: 321-357.

Chawla, N.V., N. Japkowicz and A. Kolcz, 2004. Editorial: Special issue on learning from imbalanced data sets. SIGKDD Explorations, 6: 1-6.

Demsar, J., 2006. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res., 7: 1-30.

Eitrich, T. and B. Lang, 2006. Efficient optimization of support vector machine learning parameters for unbalanced datasets. J. Comput. Appl. Math., 196: 425-436.

Fan, Q., Z. Wang, D. Li, D. Gao and H. Zha, 2017. Entropy-based fuzzy support vector machine for imbalanced datasets. Knowl. Based Syst., 115: 87-99.

Galar, M., A. Fernandez, E. Barrenechea and F. Herrera, 2013. EUSBoost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling. Pattern Recognit., 46: 3460-3471.

Ganganwar, V., 2012. An overview of classification algorithms for imbalanced datasets. Int. J. Emerging Technol. Adv. Eng., 2: 42-47.

Garcia, S., A.D. Benitez, F. Herrera and A. Fernandez, 2007. Statistical comparisons by means of non-parametric tests: A case study on genetic based machine learning. Algorithms, 13: 95-104.

Graczyk, M., T. Lasota, Z. Telec and B. Trawinski, 2010. Nonparametric statistical analysis of machine learning algorithms for regression problems. Proceedings of the 14th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, September 8-10, 2010, Springer, Cardiff, Wales, UK., ISBN:978-3-642-15386-0, pp: 111-120.

Haixiang, G., L. Yijing, J. Shang, G. Mingyun and H. Yuanyue et al., 2017. Learning from class-imbalanced data: Review of methods and applications. Expert Syst. Appl., 73: 220-239.

Han, H., W.Y. Wang and B.H. Mao, 2005. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. Proceedings of the International Conference on Intelligent Computing, August 23-26, 2005, Hefei, China, pp: 878-887.

Hansen, N. and A. Ostermeier, 1997. Convergence properties of evolution strategies with de-randomized covariance matrix adaptation: The (µ/µ_I, λ)-CMA-ES. Proceedings of the EUFIT’97: 5th European Congress on Intelligent Techniques and Soft Computing, September 8-11, 1997, Aachen, Germany, pp: 650-654.

He, H. and E.A. Garcia, 2009. Learning from imbalanced data. IEEE Trans. Knowledge Data Eng., 21: 1263-1284.

Hsu, C.W. and C.J. Lin, 2002. A simple decomposition method for support vector machines. Mach. Learn., 46: 291-314.

Hu, G.Y. and P.L. Qiao, 2015. An efficient improvement of CMA-ES algorithm for the network security situation prediction. Open Autom. Control Syst. J., 7: 1499-1517.

Huang, C.L. and C.J. Wang, 2006. A GA-based feature selection and parameters optimization for support vector machines. Expert Syst. Appl., 31: 231-240.

Huang, C.L., M.C. Chen and C.J. Wang, 2007. Credit scoring with a data mining approach based on support vector machines. Expert Syst. Appl., 33: 847-856.

Igel, C., N. Hansen and S. Roth, 2007. Covariance matrix adaptation for multi-objective optimization. Evolut. Comput., 15: 1-28.

Imam, T., K.M. Ting and J. Kamruzzaman, 2006. Z-SVM: An SVM for improved classification of imbalanced data. Proceedings of the 19th Australasian Joint Conference on Artificial Intelligence, December 4-8, 2006, Springer, Hobart, Australia, ISBN:978-3-540-49787-5, pp: 264-273.

Jiang, P., S. Missoum and Z. Chen, 2014. Optimal SVM parameter selection for non-separable and unbalanced datasets. Struct. Multidiscip. Optim., 50: 523-535.

Kotsiantis, S., D. Kanellopoulos and P. Pintelas, 2006. Handling imbalanced datasets: A review. GESTS. Intl. Trans. Comput. Sci. Eng., 30: 25-36.

Krawczyk, B., 2016. Learning from imbalanced data: Open challenges and future directions. Prog. Artif. Intell., 5: 221-232.

Lee, J., Y. Wu and H. Kim, 2015. Unbalanced data classification using support vector machines with active learning on scleroderma lung disease patterns. J. Appl. Stat., 42: 676-689.

Lessmann, S., 2004. Solving imbalanced classification problems with support vector machines. Proceedings of the International Conference on Artificial Intelligence (ICAI'04), June 21-24, 2004, CSREA Press Publisher, Las Vegas, Nevada, pp: 214-220.

Li, J., S. Fong, R.K. Wong and V.W. Chu, 2018. Adaptive multi-objective swarm fusion for imbalanced data classification. Inf. Fusion, 39: 1-24.

Liu, X. and H. Fu, 2014. PSO-based support vector machine with Cuckoo search technique for clinical disease diagnoses. Sci. World J., 2014: 1-7.

Longadge, R. and S. Dongre, 2013. Class imbalance problem in data mining review. Intl. J. Comput. Sci. Netw., 2: 1-6.

Lopez, V., A. Fernandez, M.J. del Jesus and F. Herrera, 2013. A hierarchical genetic fuzzy system based on genetic programming for addressing classification with highly imbalanced and borderline data-sets. Knowl. Based Syst., 38: 85-104.

Lusa, L., 2013. SMOTE for high-dimensional class-imbalanced data. BMC. Bioinf., 14: 106-121.

Napierala, K. and J. Stefanowski, 2016. Types of minority class examples and their influence on learning classifiers from imbalanced data. J. Intell. Inf. Syst., 46: 563-597.

Pant, R., T.B. Trafalis and K. Barker, 2011. Support vector machine classification of uncertain and imbalanced data using robust optimization. Proceedings of the 15th WSEAS International Conference on Computers, July 15-17, 2011, WSEAS, Stevens Point, Wisconsin, USA., ISBN:978-1-61804-019-0, pp: 369-374.

Phung, S.L., A. Bouzerdoum and G.H. Nguyen, 2009. Learning Pattern Classification Tasks with Imbalanced Data Sets. In: Pattern Recognition, Yin, P. (Ed.). IntechOpen, Vukovar, Croatia, pp: 193-208.

Ren, Y. and G. Bai, 2010. Determination of optimal SVM parameters by using GA/PSO. J. Comput., 5: 1160-1168.

Rosales-Perez, A., J.A. Gonzalez, C.A.C. Coello, H.J. Escalante and C.A. Reyes-Garcia, 2015. Surrogate-assisted multi-objective model selection for support vector machines. Neurocomput., 150: 163-172.

Saeed, S. and H.C. Ong, 2018. A bi-objective hybrid algorithm for the classification of imbalanced noisy and borderline data sets. Patt. Anal. Appl., 1: 1-20.

Sain, H. and S.W. Purnami, 2015. Combine sampling support vector machine for imbalanced data classification. Procedia Comput. Sci., 72: 59-66.

Shin, K.S., T.S. Lee and H.J. Kim, 2005. An application of support vector machines in bankruptcy prediction model. Expert Syst. Appl., 28: 127-135.

Sun, A., E.P. Lim and Y. Liu, 2009. On strategies for imbalanced text classification using SVM: A comparative study. Decis. Support Syst., 48: 191-201.

Sun, Z., Q. Song, X. Zhu, H. Sun and B. Xu et al., 2015. A novel ensemble method for classifying imbalanced data. Pattern Recognit., 48: 1623-1637.

Wang, L., G. Xu, J. Wang, S. Yang and L. Guo et al., 2011. GA-SVM based feature selection and parameters optimization for BCI research. Proceedings of the 7th International Conference on Natural Computation (ICNC’11) Vol. 1, July 26-28, 2011, IEEE, Shanghai, China, ISBN:978-1-4244-9950-2, pp: 580-583.

Wang, Q., 2014. A hybrid sampling SVM approach to imbalanced data classification. Abstr. Appl. Anal., 2014: 1-7.

Wang, Q., Z. Luo, J. Huang, Y. Feng and Z. Liu, 2017. A novel ensemble method for imbalanced data learning: Bagging of extrapolation-SMOTE SVM. Comput. Intell. Neurosci., 2017: 1-11.

Weiss, G.M., 2004. Mining with rarity: A unifying framework. ACM SIGKDD Explorations Newsl., 6: 7-19.

Wu, G. and E.Y. Chang, 2003. Class-boundary alignment for imbalanced dataset learning. Proceedings of the ICML 2003 Workshop on Learning from Imbalanced Data Sets II, August 21, 2003, ICML, Washington, DC., USA., pp: 49-56.

Wu, S.J., V.H. Pham and T.N. Nguyen, 2017. Two-phase optimization for support vectors and parameter selection of support vector machines: Two-class classification. Appl. Soft Comput., 59: 129-142.

Yang, X.S. and S. Deb, 2009. Cuckoo search via Levy flights. Proceedings of the World Congress on Nature and Biologically Inspired Computing (NaBIC), December 9-11, 2009, IEEE, Coimbatore, India, ISBN:978-1-4244-5053-4, pp: 210-214.
