Classification 1 - Introduction

What is Classification?

Input : Vector x

Output : Class 

Note : There can be k distinct classes

Assumption : Classes are disjoint. That is, no 2 samples can have the same class.

Goal : To find the decision boundary between classes

Decision Region

The input space with the divisions for various classes

Decision Boundary or Decision Surface

These are the boundaries that divide the input space.


In the above example, the input space is 2-Dimensional and the decision boundary is the line(purple). The decision boundary is always one dimension less than the input space dimension. That is, if the input space is of D-dimensions, the decision boundary will be of dimension D-1.

What are Classification Models?

Imagine sorting emails into "spam" and "inbox" or categorizing images as "cat" or "dog." These are classification tasks, where we want to predict a discrete label (category) for a given data point. Classification models are algorithms trained to perform this kind of prediction.

Types of Classification Problems:

  • Binary Classification: Two possible categories (e.g., spam/not spam, cat/dog).
  • Multi-class Classification: Three or more categories (e.g., classifying handwritten digits 0-9, identifying different types of flowers).

How do Classification Models Work?

  1. Training: The model is trained on a dataset of labeled examples. Each example consists of features (data points) and a corresponding label (category). The model learns to identify patterns that distinguish between the different categories.

  2. Prediction: Once trained, the model can predict the category label for a new, unseen data point. It analyzes the features of the new data point and compares them to the patterns learned during training.

Linearly Separable

Given a dataset, if we can separate the classes with a linear decision boundary(as in figure), the dataset is said to be linearly separable.


Representing Class Labels for Probabilistic Models

1. Binary Classification

Two possible categories (e.g., spam/not spam, cat/dog).For a 2-class problem where there is a single target variable(t) we can use the binary representation.

t = {0,1}

where 1 represents class C1 and 0 represents class C2. The function used is the sigmoid function.

In the case of probabilistic models they output probability values. In a two-class (binary) classification problem, where the goal is to decide between two possible labels (e.g., positive/negative, spam/not spam), a common practice is to use a threshold of 0.5 for the probability output by a probabilistic model. This means:

If the model outputs a probability greater than or equal to 0.5, the instance is classified into the positive class (or the first class).

If the model outputs a probability less than 0.5, the instance is classified into the negative class (or the second class).

However, the choice of threshold can be adjusted based on the specific requirements of the application, such as the relative importance of false positives versus false negatives. For example:

In a scenario where false negatives are more costly or dangerous than false positives (e.g., medical diagnosis where missing a disease could be fatal), the threshold might be set lower than 0.5 to make the model more sensitive to potential positive cases.

Conversely, in scenarios where false positives are more problematic (e.g., spam detection where wrongly filtering out important emails is undesirable), the threshold might be set higher than 0.5 to make the model more conservative in predicting the positive class.

Adjusting the threshold allows practitioners to tailor the model's performance to the specific needs of their application, optimizing for precision, recall, or other relevant metrics as required.

2. Multi-class Classification

Three or more categories (e.g., classifying handwritten digits 0-9, identifying different types of flowers). 1-of-k Coding Scheme or One Hot Encoding is used for Multi-class Classification.

Suppose there are n classes, C1, C2,  ... Cn, we create a target vector of size n. Each class is represented by the corresponding position in the vector. For class i, the i'th value will be 1 and the remaining values will be 0. 

Eg. if there are 3 classes {short,  medium, tall}, then we use vectors of size 3 to represent the target. The first position refers to short, second to medium and third to tall.

Target Vector for small: [1, 0, 0]

Target Vector for medium: [0, 1, 0]

Target Vector for tall: [0, 0, 1]

Again the value of t_k is used as probability to interpret the class. The function commonly used is the softmax function.

The following video can be viewed for a better understanding.













    

Comments

Popular posts from this blog

ANN Series - 10 - Backpropagation of Errors

Regression 10 - Pseudo-Inverse Matrix Approach - Relationship between MLE and LSA - Derivation

Regression 9 - Maximum Likelihood Estimation(MLE) Approach for Regression - Derivation