Intro to Support Vector Machines

Support Vector Machines are a powerful class of supervised learning algorithms used in classification and regression models.

Developed by Vladimir Vapnik at Bell Laboratories in the early-to-mid 90s, building on theory from his doctoral work in the 1960s, SVM uses “edge cases” (the support vectors) to split labeled data with a boundary line that lies in the exact middle of these cases. SVM aims to maximize this margin, mapping training points to a higher-dimensional space where a linearly separating hyperplane can be found. If a sample is correctly classified but does not lie beyond the margin, the soft-margin formulation still penalizes it, pushing the optimizer toward a wider margin and a better boundary. Though Vapnik had primarily worked with linear separations, his colleagues at Bell proposed applying polynomial and other kernels to separate data that is not linearly separable.
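The penalty for correctly classified points that sit inside the margin is the hinge loss; a minimal sketch (the function name is illustrative, not from any library):

```python
def hinge_loss(y_true, decision_value):
    """Hinge loss: zero only when a point is correctly classified
    AND lies beyond the margin (y * f(x) >= 1)."""
    return max(0.0, 1.0 - y_true * decision_value)

print(hinge_loss(+1, 2.0))   # 0.0 -> correct and beyond the margin: no penalty
print(hinge_loss(+1, 0.5))   # 0.5 -> correct but inside the margin: penalized
print(hinge_loss(+1, -1.0))  # 2.0 -> misclassified: larger penalty
```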

[Figure: SVM with a second-degree polynomial kernel]

scikit-learn’s Support Vector Classification models (SVC(), NuSVC(), LinearSVC()) build on Vapnik’s work.

Kernel Functions

Kernels define how the data is projected into the new, higher-dimensional space. scikit-learn’s SVC models provide five kernel options: linear, polynomial, sigmoid, radial basis function (RBF), and precomputed, in which the user supplies their own kernel.

The radial basis function is the most common kernel and the default for scikit-learn’s SVC models. The RBF kernel computes exp(-gamma * ||x - y||^2): the squared Euclidean distance between two points is multiplied by gamma, a free parameter, negated, and exponentiated. An SVC with an RBF kernel is also closely related to a radial basis function neural network, since both rely on the same nonlinear similarity measure.
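The kernel itself is simple enough to sketch in a few lines of plain Python (the function name here is illustrative; scikit-learn computes this internally):

```python
from math import exp

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * sq_dist)

# Identical points have similarity 1; distant points decay toward 0.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.0
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))  # exp(-0.5 * 25), a value near 0
```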

Implementing a Support Vector Classification model

If you are working with sparse data, or data with many features such as NLP classification problems, you might consider an SVC model as an alternative to a vanilla neural network, logistic regression, decision tree, or unsupervised clustering model. Training can be slow on large datasets, but because the decision function uses only a small subset of the training points (the support vectors), prediction is memory efficient.

Creating your own SVC model will mostly involve dialing in the hyperparameters: the penalty C, the type of kernel, and a value for gamma if an RBF, sigmoid, or polynomial kernel is being used. If you are using a polynomial kernel, degree should also be considered. Gamma can either be user-provided or set via ‘auto’ or ‘scale’, which is the default. ‘scale’ uses the formula 1 / (n_features * X.var()), while ‘auto’ uses 1 / n_features.
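The two gamma formulas are easy to reproduce by hand; a small sketch in plain Python (function names are illustrative), using the population variance over all entries of X, as numpy’s X.var() does:

```python
def gamma_auto(X):
    """gamma='auto': 1 / n_features."""
    return 1.0 / len(X[0])

def gamma_scale(X):
    """Mimic gamma='scale': 1 / (n_features * X.var())."""
    flat = [v for row in X for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    return 1.0 / (len(X[0]) * var)

X = [[0.0, 1.0], [2.0, 3.0]]  # variance over all four entries is 1.25
print(gamma_auto(X))   # 0.5
print(gamma_scale(X))  # 1 / (2 * 1.25) = 0.4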

from sklearn.svm import SVC
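A minimal end-to-end sketch, using a tiny one-dimensional toy dataset assumed here purely for illustration:

```python
from sklearn.svm import SVC

# Toy, linearly separable data: class 0 on the left, class 1 on the right.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

# C is the penalty term; gamma='scale' is the default for the RBF kernel.
clf = SVC(C=1.0, kernel="rbf", gamma="scale")
clf.fit(X, y)

print(clf.predict([[0.5], [2.5]]))  # expected: [0 1]
```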

NuSVC() is similar to SVC(), except that NuSVC() exposes the ‘nu’ parameter, a value in (0, 1] that acts as an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors.
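The usage mirrors SVC(), swapping C for nu; the same toy data as above is assumed for illustration:

```python
from sklearn.svm import NuSVC

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

# nu=0.5 caps margin errors at 50% of samples and guarantees
# at least 50% of samples become support vectors.
clf = NuSVC(nu=0.5, kernel="rbf", gamma="auto")
clf.fit(X, y)

print(clf.predict([[0.5], [2.5]]))  # expected: [0 1]
```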

[Figure: NuSVC with ‘auto’ gamma and default parameters]

Further resources:

- scikit-learn documentation
- MIT lecture on the linear algebra behind SVM
- Blog on SVM at Zenva
