
Fundamentals of Classification Models Part-1


Introduction

Supervised machine learning algorithms can be broadly categorized into regression and classification algorithms. Regression algorithms predict continuous values, whereas classification algorithms predict categorical values.

Classification can be performed on structured or unstructured data. The main goal of a classification problem is to identify the category, or class, that a new data point belongs to.

We use the training dataset to learn decision boundaries that separate the target classes. Once the boundaries are determined, the model uses them to predict the target class of new observations. This whole process is known as classification.
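To make the idea of "learn a boundary, then predict" concrete, here is a minimal sketch using a nearest-centroid rule. It is deliberately simpler than the models covered in this blog and is only meant as an illustration: training computes one average point (centroid) per class, and a new observation is assigned to the class with the nearest centroid, so the implied boundary is the set of points equidistant from the centroids. The function names and toy data are hypothetical.

```python
# Illustrative nearest-centroid classifier (not one of the models in this blog).

def fit_centroids(X, y):
    """'Training': return {class_label: centroid} computed from the training data."""
    sums, counts = {}, {}
    for point, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(point)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], point)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, point):
    """'Prediction': assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], point))


# Hypothetical toy training data: two features per observation.
X_train = [[1.0, 1.2], [1.5, 0.8], [0.9, 1.1], [4.0, 4.2], [4.5, 3.8], [3.9, 4.1]]
y_train = ["B", "B", "B", "M", "M", "M"]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, [1.2, 1.0]))  # B
print(predict(centroids, [4.1, 4.0]))  # M
```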

In this blog we will discuss some of the popular classification models, including:

1) Support Vector Classifiers

2) Decision Trees

3) Random Forest Classifiers

There are various evaluation methods for measuring how accurate these models are. We will discuss these models, their evaluation methods, and a technique for improving them called hyperparameter tuning in detail.
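As a minimal illustration of evaluation, the sketch below computes accuracy together with the four confusion-matrix counts for a binary problem. The function name `evaluate_binary` and the labels are hypothetical; scikit-learn provides equivalent metrics such as `accuracy_score` and `confusion_matrix` in `sklearn.metrics`.

```python
def evaluate_binary(y_true, y_pred, positive=1):
    """Return accuracy plus the four confusion-matrix counts for a binary task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if t == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "tp": tp, "fp": fp, "tn": tn, "fn": fn}


# Hypothetical labels and predictions (1 = positive class, 0 = negative class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(evaluate_binary(y_true, y_pred))
# {'accuracy': 0.75, 'tp': 3, 'fp': 1, 'tn': 3, 'fn': 1}
```

Accuracy alone can be misleading when one class is much rarer than the other, which is why the individual counts are worth inspecting: precision is tp / (tp + fp) and recall is tp / (tp + fn).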

Let us first dive into the types of classification: binary classification, multi-class classification and multi-label classification.

Binary Classification

Binary classification has only two categories. Usually, they are Boolean values: 1 or 0, True or False, High or Low. Examples include cancer detection, where the labels are positive or negative for cancer, and email spam detection, where the labels are spam or not spam.

Models that can be used for binary classification are:

> Logistic Regression

> Support Vector Classifiers

You can also use Decision Trees, Random Forests and other algorithms. Logistic Regression and Support Vector Classifiers are binary classifiers at heart, although both can be extended to multi-class problems.
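Logistic Regression turns a weighted sum of the features into a probability with the sigmoid function and then applies a threshold (commonly 0.5) to obtain the binary label. The sketch below shows only that prediction step; the weights and bias are hypothetical values standing in for parameters that would normally be learned from the training data.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """P(class = 1 | x) under a logistic regression model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def predict_label(weights, bias, x, threshold=0.5):
    """Binary decision: 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0


# Hypothetical parameters, as if already learned from training data.
weights = [0.8, -0.4]
bias = -1.0

print(predict_label(weights, bias, [3.0, 1.0]))  # z = 1.0, p ≈ 0.73 -> 1
print(predict_label(weights, bias, [0.5, 2.0]))  # z = -1.4, p ≈ 0.20 -> 0
```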

We are using a breast cancer detection data-set that can be downloaded from here.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the breast cancer dataset (data.csv) and preview the first five rows
data = pd.read_csv("data.csv")
data.head()
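Since the labels in this dataset are the strings 'M' and 'B', a common next step is to encode them as 1 and 0 and check how balanced the two classes are. The sketch below uses a tiny hypothetical DataFrame in place of data.csv; the label column name is an assumption (held in `LABEL_COL`) and should be set to whatever your CSV header actually uses.

```python
import pandas as pd

# Hypothetical miniature stand-in for data.csv (values are made up).
LABEL_COL = "diagnosis"  # assumed name of the M/B label column; match your CSV header
data = pd.DataFrame({
    LABEL_COL: ["M", "B", "B", "M", "B"],
    "radius_mean": [18.0, 13.5, 13.0, 20.5, 9.5],
    "texture_mean": [10.5, 14.5, 15.5, 18.0, 12.5],
})

# Encode the string labels as integers: malignant -> 1, benign -> 0.
data["target"] = data[LABEL_COL].map({"M": 1, "B": 0})

# Check how many observations fall into each class before training anything.
print(data["target"].value_counts())
```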

Now let's draw a scatter plot of two of the features, with each point colored by its class:

sns.scatterplot(x="radius_mean", y="texture_mean", hue="diagnosis", data=data)  # "diagnosis" holds the M/B labels
plt.show()
Scatter plot for binary classification

Here you can see the two classes: 'M' stands for malignant and 'B' stands for benign. For these two features the classes are well separated and easy to tell apart by eye. However, this will not be true for every pair of features.
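One rough way to put a number on how well a single feature separates the two classes is to try every observed value as a cut-off and keep the one with the highest accuracy. This is a hypothetical helper run on made-up values, not part of the original analysis; it only checks the rule "predict M if value >= t", so a feature where M takes smaller values would need the opposite comparison.

```python
def best_threshold_accuracy(values, labels, positive="M"):
    """Try every observed value t as a cut-off for the rule
    'predict positive if value >= t' and return (best_t, best_accuracy)."""
    best_t, best_acc = None, 0.0
    for t in sorted(set(values)):
        # Count observations where the rule's prediction matches the true label.
        correct = sum((v >= t) == (lab == positive) for v, lab in zip(values, labels))
        acc = correct / len(values)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


labels = ["B", "B", "B", "M", "M", "M"]
values_a = [11.0, 12.0, 13.0, 17.0, 19.0, 21.0]  # hypothetical well-separated feature
values_b = [14.0, 20.0, 17.0, 15.0, 19.0, 16.0]  # hypothetical overlapping feature

print(best_threshold_accuracy(values_a, labels))  # (17.0, 1.0)
print(best_threshold_accuracy(values_b, labels))  # best cut-off only reaches 4/6
```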


Multi Class Classification

Multi-class (or multinomial) classifiers can distinguish between more than two classes.

Examples include classifying types of crops or genres of music. Algorithms such as Random Forests and Naive Bayes can easily construct a multi-class classifier model.

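import seaborn as sns
import matplotlib.pyplot as plt
# Penguins example dataset bundled with seaborn (fetched and cached on first use)
penguins = sns.load_dataset("penguins")
penguins.head()
# One colour per species: three classes
sns.scatterplot(x="bill_length_mm", y="flipper_length_mm", hue="species", data=penguins)
plt.show()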

Other algorithms, such as Support Vector Classifiers and Logistic Regression, are binary classifiers at heart, but they can still be applied to multi-class problems by combining several binary models (one-vs-rest or one-vs-one) or, for Logistic Regression, by using the multinomial (softmax) formulation.
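A common way to apply a binary classifier to a multi-class problem is one-vs-rest: train one binary model per class ("this class" vs. "everything else") and, at prediction time, pick the class whose model is most confident. The sketch below shows only the label preparation and the final combination step; the function names and scores are hypothetical. In scikit-learn, `sklearn.multiclass.OneVsRestClassifier` wraps this pattern, and `SVC` handles multiple classes with a one-vs-one scheme internally.

```python
def one_vs_rest_labels(y, classes):
    """For each class, build the binary target (1 = this class, 0 = any other)
    that its own binary classifier would be trained on."""
    return {c: [1 if label == c else 0 for label in y] for c in classes}


def one_vs_rest_predict(scores):
    """Combine per-class binary scores (e.g. probabilities) by choosing
    the class whose classifier is most confident."""
    return max(scores, key=scores.get)


species = ["Adelie", "Gentoo", "Chinstrap", "Adelie"]
targets = one_vs_rest_labels(species, ["Adelie", "Chinstrap", "Gentoo"])
print(targets["Adelie"])  # [1, 0, 0, 1]

# Hypothetical scores from three already-trained binary classifiers for one penguin.
print(one_vs_rest_predict({"Adelie": 0.2, "Chinstrap": 0.1, "Gentoo": 0.9}))  # Gentoo
```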

Multi Label Classification

This type of classification applies when a single observation can carry several labels at once. For example, a single image might contain a car, a truck and a human, and the algorithm should recognize each of them individually. It therefore has to be trained on many labels, and it should report True for car, truck and human and False for every other label it was trained on.
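Multi-label targets are usually represented as a 0/1 vector with one position per possible label (often called multi-hot encoding), and each position gets its own independent yes/no decision. A minimal sketch, assuming a hypothetical five-label vocabulary and made-up predicted probabilities; scikit-learn's `MultiLabelBinarizer` (in `sklearn.preprocessing`) performs the same kind of encoding.

```python
ALL_LABELS = ["car", "truck", "human", "bicycle", "dog"]  # hypothetical label vocabulary


def multi_hot(present, vocabulary=ALL_LABELS):
    """Encode the labels present in one observation as a 0/1 vector,
    one position per label in the vocabulary."""
    present = set(present)
    return [1 if label in present else 0 for label in vocabulary]


def decode(probabilities, vocabulary=ALL_LABELS, threshold=0.5):
    """Give every label its own independent True/False decision
    (unlike multi-class, where exactly one class is chosen)."""
    return {label: p >= threshold for label, p in zip(vocabulary, probabilities)}


print(multi_hot(["car", "truck", "human"]))  # [1, 1, 1, 0, 0]
print(decode([0.92, 0.81, 0.67, 0.10, 0.05]))
# {'car': True, 'truck': True, 'human': True, 'bicycle': False, 'dog': False}
```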

Classification algorithms can be further divided into two main categories:
–> Linear Models: Logistic Regression, Support Vector Machines (with a linear kernel)

–> Non-linear Models: K-Nearest Neighbours, Kernel SVM, Naive Bayes, Decision Tree Classification, Random Forest Classification.
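To see why the linear/non-linear distinction matters, consider an XOR-style layout where one class sits near (0, 0) and (1, 1) and the other near (0, 1) and (1, 0): no single straight line can separate them, yet K-Nearest Neighbours handles it easily because it only looks at nearby points. The following is a minimal pure-Python k-NN sketch on hypothetical toy points, not a production implementation.

```python
from collections import Counter


def knn_predict(X_train, y_train, query, k=3):
    """k-nearest neighbours: majority vote among the k closest training points
    (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical XOR-style data: class 0 near (0,0) and (1,1), class 1 near (0,1) and (1,0).
X_train = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 0.9],
           [0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

for q in ([0.05, 0.0], [0.95, 0.05], [0.05, 0.95], [0.95, 0.95]):
    print(q, knn_predict(X_train, y_train, q))
```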

The code for this section of the blog is available at https://github.com/mohana-sai/presentationcodes

Classifier Models

We need to prepare the data before training the algorithm. The first step is to pre-process and clean it. For this dataset, the cleaning consists of converting the string names of the flowers to integer values so that the algorithm can classify them properly, and dropping the observations that contain NaN values.
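A minimal pandas sketch of these two cleaning steps, assuming a hypothetical DataFrame with a string `species` column and a numeric measurement column (the column names and values are made up for illustration). `pd.factorize` or scikit-learn's `LabelEncoder` could be used instead of the explicit mapping.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: flower measurements with string class names and one missing value.
df = pd.DataFrame({
    "petal_length": [1.4, 4.7, np.nan, 5.9, 1.3],
    "species": ["setosa", "versicolor", "virginica", "virginica", "setosa"],
})

# 1) Drop observations (rows) that contain any NaN value.
df = df.dropna()

# 2) Replace the string class names with integer codes.
codes = {"setosa": 0, "versicolor": 1, "virginica": 2}
df["species"] = df["species"].map(codes)

print(df)
```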

In the next part we will discuss the classifier models in more detail and show how they are implemented. That is all for now.


