Support Vector Machine
As we saw in the previous example of decision trees, classification ultimately comes down to finding a decision boundary. If we can construct a good decision boundary, we can safely classify the data into different categories. Support Vector Machines (SVMs) are good tools for exactly this.
The core idea of SVM is to find a maximum marginal hyperplane (MMH) that best divides the dataset into two classes.

Figure 1: SVM
Support Vectors: Support vectors are the data points closest to the hyperplane. These points are the most relevant to the construction of the classifier.
Hyperplane: This is the decision boundary which separates the two classes.
Margin: This is the perpendicular distance from the hyperplane to the support vectors, i.e. the closest points. The larger the margin, the better the classifier generalises.
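To make these definitions concrete, here is a minimal sketch (assuming scikit-learn is available; the toy data points are invented) that fits a linear SVM and reads off the three quantities defined above: the support vectors, the hyperplane coefficients, and the margin width.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters of invented 2-D points.
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # very large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]                   # normal vector of the hyperplane w.x + b = 0
b = clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)   # perpendicular width between the two margins

print("support vectors:\n", clf.support_vectors_)
print("hyperplane: w =", w, ", b =", b)
print("margin width =", margin)
```

Only the printed support vectors pin down the hyperplane; moving any of the other points slightly would leave the fitted boundary unchanged.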
Working principle
Without going much into the mathematics, the flowchart of SVM can be summarised as follows:
Assume a Linear Separator: SVM tries to find a hyperplane $w \cdot x + b = 0$ that separates the data into two classes.
Maximize the Margin: Rather than checking all possible planes, SVM optimizes for the one that maximizes the margin — the distance between the hyperplane and the closest points from each class, called support vectors.
Optimization Problem: Minimise $\frac{1}{2}\|w\|^2$ subject to $y_i (w \cdot x_i + b) \ge 1$ for all $i$, with $y_i \in \{-1, +1\}$ being the label of the class. This is a quadratic programming problem.
Support Vectors Determine the Plane: Only data points on the margin boundaries (i.e., support vectors) affect the position of the hyperplane.
Use Kernel Trick (if needed): For non-linear data, use a kernel function (e.g., Radial Basis Function (RBF), polynomial) to project data into higher dimensions where it becomes linearly separable.
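The steps above can be sketched end to end. Assuming scikit-learn, the snippet below fits a linear-kernel and an RBF-kernel SVM on the non-linearly-separable "two moons" dataset, showing when the kernel trick matters:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-circles: not separable by any straight line.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# A linear SVM can only draw a straight boundary and must misclassify points;
# the RBF kernel implicitly maps the data to a higher-dimensional space
# where a separating hyperplane exists.
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print(f"linear kernel training accuracy: {linear_acc:.2f}")
print(f"RBF kernel training accuracy:    {rbf_acc:.2f}")
```

The RBF kernel recovers the curved boundary that the linear kernel cannot express, which is exactly the situation step 5 describes.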
To illustrate the kernel trick, consider a two-dimensional dataset about students, recording the amount of coffee they consumed and whether or not they love cats. Let us represent people who love cats by blue dots and people who don't by red dots.
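Here is a hypothetical, simplified one-feature variant of the coffee/cats idea (all numbers invented): suppose cat lovers drink either very little or a lot of coffee, so no single threshold on coffee alone separates the two colours. Explicitly mapping each value $x$ to $(x, x^2)$ makes the classes linearly separable in two dimensions, which is what a polynomial kernel does implicitly.

```python
import numpy as np
from sklearn.svm import SVC

# Invented data: daily coffee consumption and whether the student loves cats.
coffee = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0, 6.0, 6.5, 7.0])
loves_cats = np.array([1, 1, 1, 0, 0, 0, 1, 1, 1])  # 1 = blue dot, 0 = red dot

# On the raw feature, no linear boundary (a single threshold) can be perfect.
X1 = coffee.reshape(-1, 1)
acc_1d = SVC(kernel="linear", C=1000).fit(X1, loves_cats).score(X1, loves_cats)

# Explicit feature map phi(x) = (x, x^2): the classes become linearly separable.
X2 = np.column_stack([coffee, coffee**2])
acc_2d = SVC(kernel="linear", C=1000).fit(X2, loves_cats).score(X2, loves_cats)

print("accuracy on raw coffee feature:", acc_1d)
print("accuracy after feature map:    ", acc_2d)
```

In practice we never compute the mapped features by hand; a kernel function evaluates the inner products in the higher-dimensional space directly.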

NOTE: If time permits we will demonstrate SVM on the same dataset we used for the classification and later add the notebook here.