Regularized SVM Classification with a new Complexity-Driven Stochastic Optimizer
Keywords:
Supervised classification, Discriminant analysis, Support vectors, Information criteria, Feature selection, Stochastic optimization, Reproducing kernel Hilbert space
Abstract
Given a multivariate dataset composed of data from different known sources or processes, how can we create a rule to separate the data and classify any future data? Kernel discriminant analysis is one of many supervised learning techniques that handle this problem. Kernel methods have recently gained popularity in this and other knowledge discovery problems. This is somewhat ironic, since another common theme is variable reduction, and kernel methods actually inflate dimensionality. The inflation is excusable given the substantial benefits of processing "kernelized" data: kernel methods frequently outperform traditional classification techniques on real data when the classes are not easily separable. In performing kernel discriminant analysis, there are two main issues that we address in this article. The first is that, in the literature, the kernel function is often selected subjectively a priori, or determined by cross-validation with the sole objective of maximizing classification performance. The second is that, after obtaining discriminant functions or support vectors to classify a dataset, we still need to know which of our variables are most responsible for, and important to, the classification. In this research, we develop a new regularized algorithm that simultaneously selects the kernel function and a subset of the original variables. Our algorithm, a hybrid of cross-validation and the genetic algorithm, does this by optimizing a function that rewards correct classification while penalizing both misclassification and model complexity.
We report results on three real datasets, including data from a medical imaging study. For the latter, we obtained an impressively low misclassification rate of 0.3%, while reducing the number of features from p = 20 to p∗ = 6.
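The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (`KERNELS`, `make_toy_data`, `cv_error`, `fitness`, `evolve`) is hypothetical. Several simplifying assumptions are made: a kernel nearest-mean rule stands in for the SVM or kernel discriminant classifier, the complexity penalty is simply the fraction of retained features rather than an information criterion, the three candidate kernels and the penalty weight are arbitrary, and the data are synthetic. Each chromosome pairs a kernel name with a feature mask, and the genetic algorithm minimizes the cross-validated error plus the penalty.

```python
import math
import random

# Candidate kernels (an illustrative set, not the paper's list).
KERNELS = {
    "linear": lambda a, b: sum(x * y for x, y in zip(a, b)),
    "poly2": lambda a, b: (1.0 + sum(x * y for x, y in zip(a, b))) ** 2,
    "rbf": lambda a, b: math.exp(-0.5 * sum((x - y) ** 2 for x, y in zip(a, b))),
}

def make_toy_data(n=40, p=6, seed=1):
    """Synthetic two-class data; only the first two of p features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        row[0] += 3.0 * label
        row[1] -= 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def cv_error(X, y, kernel, mask, folds=5):
    """k-fold misclassification rate of a kernel nearest-mean classifier."""
    k = KERNELS[kernel]
    Z = [[v for v, keep in zip(row, mask) if keep] for row in X]
    errors = 0
    for f in range(folds):
        train = {}
        for i, (z, label) in enumerate(zip(Z, y)):
            if i % folds != f:
                train.setdefault(label, []).append(z)
        # Mean within-class kernel value, computed once per class and fold.
        within = {c: sum(k(a, b) for a in pts for b in pts) / len(pts) ** 2
                  for c, pts in train.items()}
        for i, (z, label) in enumerate(zip(Z, y)):
            if i % folds == f:
                # Squared feature-space distance to each class mean,
                # omitting k(z, z), which is the same for every class.
                dist = {c: within[c] - 2.0 * sum(k(z, a) for a in pts) / len(pts)
                        for c, pts in train.items()}
                errors += min(dist, key=dist.get) != label
    return errors / len(X)

def fitness(X, y, chrom, penalty=0.1):
    """Cross-validated error plus an assumed penalty on the share of features kept."""
    kernel, mask = chrom
    if not any(mask):
        return float("inf")  # an empty feature subset is never acceptable
    return cv_error(X, y, kernel, mask) + penalty * sum(mask) / len(mask)

def evolve(X, y, pop_size=10, generations=8, mutation=0.1, seed=0):
    """Genetic search over (kernel name, feature mask) chromosomes."""
    rng = random.Random(seed)
    p = len(X[0])
    names = sorted(KERNELS)
    cache = {}

    def score(chrom):
        if chrom not in cache:
            cache[chrom] = fitness(X, y, chrom)
        return cache[chrom]

    pop = [(rng.choice(names), tuple(rng.random() < 0.5 for _ in range(p)))
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score)
        history.append(score(pop[0]))
        children = pop[:2]  # elitism: the two best chromosomes survive unchanged
        while len(children) < pop_size:
            a = min(rng.sample(pop, 3), key=score)  # tournament selection
            b = min(rng.sample(pop, 3), key=score)
            kernel = rng.choice([a[0], b[0]])       # uniform crossover
            mask = tuple(rng.choice(pair) for pair in zip(a[1], b[1]))
            if rng.random() < mutation:             # mutate the kernel gene
                kernel = rng.choice(names)
            # Flip each mask bit independently with probability `mutation`.
            mask = tuple(bit != (rng.random() < mutation) for bit in mask)
            children.append((kernel, mask))
        pop = children
    best = min(pop, key=score)
    history.append(score(best))
    return best, score(best), history
```

Calling `evolve(*make_toy_data())` returns the best chromosome found, its penalized score, and the best score per generation. Because the two best chromosomes are carried over unchanged, that history never increases. Raising `penalty` pushes the search toward smaller feature subsets at the cost of some classification accuracy.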
License
Upon acceptance of an article by the European Journal of Pure and Applied Mathematics, the author(s) retain the copyright to the article. By submitting your work, however, you agree that the article will be published under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). This license allows others to copy, distribute, and adapt your work for non-commercial purposes, provided proper attribution is given to the original author(s) and source.
By agreeing to this statement, you acknowledge that:
- You retain full copyright over your work.
- The European Journal of Pure and Applied Mathematics will publish your work under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
- This license allows others to use and share your work for non-commercial purposes, provided they give appropriate credit to the original author(s) and source.