Posts

Showing posts from February, 2024

Bayesian Networks

A Bayesian network, also known as a belief network, probabilistic directed acyclic graphical model, or Bayes net, is a statistical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Key components and concepts:
1. Nodes: Each node in the graph represents a random variable. These variables can be observable quantities, latent variables, unknown parameters, or hypotheses.
2. Edges: The edges between the nodes represent conditional dependencies; an edge from node A to node B indicates that B is dependent on A. The absence of an edge indicates conditional independence between variables.
3. Conditional Probability Tables (CPTs): Each node is associated with a probability function that takes a particular set of values for the node's parent variables and gives the probability of the variable represented by the node. For nodes without parents, the CPT reduces to the prior probability of the node.
4. Joint Probability Distribution: ...
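To make the factorization concrete, here is a minimal sketch (not from the post) of the classic Rain/Sprinkler/GrassWet network. The CPT numbers are purely illustrative; the joint probability of an assignment is the product of each node's CPT entry given its parents.

```python
# Minimal Bayesian network sketch: Rain -> Sprinkler, Rain -> GrassWet, Sprinkler -> GrassWet.
# CPT values below are illustrative only.

P_rain = {True: 0.2, False: 0.8}                      # prior: Rain has no parents
P_sprinkler = {True: {True: 0.01, False: 0.99},       # P(Sprinkler | Rain)
               False: {True: 0.4, False: 0.6}}
P_grass = {(True, True): {True: 0.99, False: 0.01},   # P(GrassWet | Rain, Sprinkler)
           (True, False): {True: 0.8, False: 0.2},
           (False, True): {True: 0.9, False: 0.1},
           (False, False): {True: 0.0, False: 1.0}}

def joint(rain, sprinkler, grass_wet):
    """P(Rain, Sprinkler, GrassWet) = P(Rain) * P(Sprinkler|Rain) * P(GrassWet|Rain,Sprinkler)."""
    return (P_rain[rain]
            * P_sprinkler[rain][sprinkler]
            * P_grass[(rain, sprinkler)][grass_wet])

print(joint(rain=True, sprinkler=False, grass_wet=True))  # 0.2 * 0.99 * 0.8 = 0.1584
```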

Naive Bayesian Classifiers - Multinomial, Bernoulli and Gaussian with Solved Examples and Laplace Smoothing

Learn about the Naive Bayes Classifier in the following notes.

Numeric Example with Dataset (Transactional Data): Sum1, Sum2. Consider the following dataset. Apply the Naive Bayes classifier to the following frequency table and predict the type of fruit given that it is {Yellow, Sweet, Long}. The solution can be viewed in the following PDF.

Numeric Example with Text Data: Multinomial Naive Bayes vs Bernoulli Naive Bayes. Multinomial Naive Bayes and Bernoulli Naive Bayes are both variations of the Naive Bayes algorithm, and they are used for different types of data distributions:
1. Multinomial Naive Bayes:
   - The Multinomial Naive Bayes classifier is used for data that is multinomially distributed, which typically means discrete data.
   - It is particularly suitable for text classification problems where features (or words) can occur multiple times. For example, it can be used for document classification where the features are the frequencies of the words or n-grams ...
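To show the Multinomial vs Bernoulli distinction in code, here is a small sketch on a hypothetical toy corpus (not the post's dataset); alpha=1.0 corresponds to Laplace (add-one) smoothing in both variants.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB

# Hypothetical toy corpus, only to contrast the two variants.
docs = ["free offer win money", "win free prize now",
        "meeting schedule for monday", "project status meeting notes"]
labels = ["spam", "spam", "ham", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(docs)                    # word counts per document

mnb = MultinomialNB(alpha=1.0).fit(X, labels)  # models word frequencies (counts)
bnb = BernoulliNB(alpha=1.0).fit(X, labels)    # binarizes counts: word present / absent

X_new = vec.transform(["free meeting now"])
print(mnb.predict(X_new), bnb.predict(X_new))
```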

Classification 2 - Classification Models

Classification Type 1
Decision Trees
- CART (Classification and Regression Trees)
- C4.5
- C5.0
Ensemble Methods
- Random Forests
- Gradient Boosting Machines (GBM)
- AdaBoost (Adaptive Boosting)
Support Vector Machines (Maximum Margin Classifiers)
- Linear SVM
- Kernel SVM
Neural Networks and Deep Learning
- Multilayer Perceptrons (MLP)
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN)
- Long Short-Term Memory Networks (LSTM)
Probabilistic Models
- Naive Bayes
- Bayesian Networks
- Gaussian Processes
Instance-Based Learning
- K-Nearest Neighbors (KNN)

Classification Type 2
Linear Models for Classification: Predict the class label based on a linear combination of input features.
- Logistic Regression
- Linear Discriminant Analysis (LDA)
- Perceptron
Probabilistic Generative Models: Learn the joint probability distribution of inputs and labels, allowing for the generation of new data samples.
- Naive Bayes
- Gaussian Mixture Models
Probabilistic ...

Classification 1 - Introduction

What is Classification?
Input: Vector x
Output: Class
Note: There can be k distinct classes.
Assumption: Classes are disjoint. That is, no sample can belong to more than one class.
Goal: To find the decision boundary between classes.
Decision Region: The input space with the divisions for the various classes.
Decision Boundary or Decision Surface: These are the boundaries that divide the input space. In the example figure from the post, the input space is 2-dimensional and the decision boundary is the (purple) line. The decision boundary is always one dimension less than the input space: if the input space is D-dimensional, the decision boundary has dimension D-1.
What are Classification Models? Imagine sorting emails into "spam" and "inbox" or categorizing images as "cat" or "dog." These are classification tasks, where we want to predict a discrete label (category) for a given data point. Classification models are algorithms trained to perform ...
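As an illustration of a (D-1)-dimensional decision boundary, here is a minimal sketch (not from the post) that fits a logistic regression on hypothetical 2-D points; its decision boundary w1*x1 + w2*x2 + b = 0 is a line.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 2-D toy data: two clusters, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2, -2], 1.0, size=(50, 2)),
               rng.normal([2, 2], 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)

# In a 2-D input space the boundary is a line (dimension 2 - 1 = 1).
w1, w2 = clf.coef_[0]
b = clf.intercept_[0]
print(f"decision boundary: {w1:.2f}*x1 + {w2:.2f}*x2 + {b:.2f} = 0")
```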

ANN Series 12 - Internal Covariate Shift

Internal Covariate Shift refers to the phenomenon in deep learning where the distribution of each layer's inputs changes during training as the parameters of the previous layers change. This shift in the distribution of inputs can slow down the training process because each layer needs to continuously adapt to the new distribution. When the input distribution to a layer changes, it may require readjustment of the learning rate or more careful initialization of weights to maintain training stability. This issue becomes more pronounced in deep networks with many layers, as even small changes can amplify through the network.

Causes of Internal Covariate Shift
- Parameter Updates: During the training process, as we update the weights of the network through backpropagation, the statistical properties of each layer's input change. This is because the output of one layer, which serves as the input to the next layer, is affected by the weight adjustments.
- Non-linear Activation ...
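A common mitigation is to normalize each layer's inputs over the current mini-batch (batch normalization). A minimal numpy sketch of the batch-norm forward step, with illustrative names and data, might look like this:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch of layer inputs to zero mean / unit variance,
    then rescale with learnable gamma (scale) and beta (shift).
    x: (batch_size, features); gamma, beta: (features,)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # stabilized layer-input distribution
    return gamma * x_hat + beta

# Toy mini-batch with a shifted, widened distribution; after normalization its
# per-feature mean is ~0 and std is ~1 regardless of the upstream drift.
batch = np.random.default_rng(1).normal(loc=3.0, scale=5.0, size=(32, 4))
out = batch_norm_forward(batch, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))
```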

ANN Series 11 - Regularization in Neural Networks

Overfitting
Regularization in neural networks is a crucial technique used to prevent overfitting, which occurs when a model learns the training data too well, including its noise and outliers, leading to poor performance on unseen data. Overfitting happens especially when the network is too complex relative to the amount and variety of the training data. Regularization techniques modify the learning process to reduce the complexity of the model, encouraging it to learn more general patterns that generalize better to new, unseen data.

Common Techniques
Here are some common regularization techniques used in neural networks:
1. L1 Regularization (Lasso Regression): Adds a penalty equal to the absolute value of the magnitude of the coefficients. This can lead to sparse models where some weights become exactly zero, effectively removing some features/weights. Lasso can struggle in situations where the number of predictors is much larger than the number of observations or when several predictors ...
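As a small illustration (assuming a plain squared-error data loss and a single weight vector, which is not the post's exact setup), the regularized objectives simply add a penalty term to the data loss:

```python
import numpy as np

def data_loss(w, X, y):
    """Plain mean squared error of a linear model; stands in for the network's loss."""
    return np.mean((X @ w - y) ** 2)

def l1_regularized_loss(w, X, y, lam):
    return data_loss(w, X, y) + lam * np.sum(np.abs(w))  # L1/Lasso penalty: drives some weights to exactly 0

def l2_regularized_loss(w, X, y, lam):
    return data_loss(w, X, y) + lam * np.sum(w ** 2)     # L2/weight-decay penalty: shrinks all weights

rng = np.random.default_rng(0)
X, y, w = rng.normal(size=(20, 3)), rng.normal(size=20), rng.normal(size=3)
print(l1_regularized_loss(w, X, y, lam=0.1), l2_regularized_loss(w, X, y, lam=0.1))
```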

ANN Series - 10 - Backpropagation of Errors

Watch the video to understand the forward pass in an ANN.

Backpropagation, short for "backward propagation of errors," is a fundamental algorithm used for training artificial neural networks. It efficiently computes the gradient of the loss function with respect to the weights of the network, which is essential for adjusting the weights and minimizing the loss through gradient descent or other optimization techniques. The process involves two main phases: a forward pass and a backward pass.

Forward Pass
1. Input Layer: The input features are fed into the network.
2. Hidden Layers: Each neuron in these layers computes a weighted sum of its inputs (from the previous layer or the input layer) plus a bias term. This sum is then passed through an activation function to produce the neuron's output. This process repeats layer by layer until the output layer is reached.
3. Output Layer: The final output of the network is computed, which is then used to calculate the loss ...
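Here is a minimal numpy sketch of one forward and backward pass for a single-hidden-layer network with sigmoid activations and a squared-error loss; the shapes, names, and data are illustrative, not the post's notation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))                      # one input sample, 3 features
t = np.array([[1.0]])                            # target
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)    # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)    # hidden -> output

# Forward pass: affine transform + activation, layer by layer.
h = sigmoid(x @ W1 + b1)
y = sigmoid(h @ W2 + b2)
loss = 0.5 * np.sum((y - t) ** 2)

# Backward pass: chain rule, propagating the error from output toward the input.
delta2 = (y - t) * y * (1 - y)                   # gradient at the output pre-activation
dW2, db2 = h.T @ delta2, delta2.sum(axis=0)
delta1 = (delta2 @ W2.T) * h * (1 - h)           # gradient at the hidden pre-activation
dW1, db1 = x.T @ delta1, delta1.sum(axis=0)

# One gradient-descent step on the weights.
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
print("loss:", loss)
```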

ANN Series 9 - Multilayer Perceptron(MLP) and Multilayer Artificial Neural Network(Multi-layered ANN)

The terms "Multilayer Perceptron" (MLP) and "Multilayer Artificial Neural Network" (Multi-layered ANN) are often used interchangeably to describe a specific type of neural network architecture. Both refer to a class of artificial neural networks that contain one input layer, one or more hidden layers, and one output layer.  The key characteristics that define MLPs and multi-layered ANNs include: 1. Multiple Layers:  Both MLPs and multi-layered ANNs are characterized by having multiple layers of neurons (also known as nodes). These include the input layer that receives the data, one or more hidden layers that process the data, and the output layer that produces the final prediction or classification. 2. Non-linear Activation Functions:  In both architectures, neurons in the hidden layers and sometimes in the output layer apply non-linear activation functions to their inputs. These functions enable the network to learn complex patterns and relationships in the data th

ANN Series 8 - Perceptron

In the previous blogs in the series we saw how to handle regression with a simple ANN architecture. In this blog, we perform binary classification with a single-layer ANN, which is termed a perceptron. To update the weights we use the Perceptron learning rule rather than the gradient descent used in regression.

Perceptron
- Binary classifier: Class = {1, -1}
- Linear classifier
- Learns weights and a threshold
- Activation function: step function (binary or sign)

Let the input vector be x = (x1, x2, ..., xn) and the associated weight vector be w = (w1, w2, ..., wn).

Perceptron Learning Algorithm
Step 1: Initialize weights randomly.
Step 2: Take a training sample at random without replacement.
Step 3: Perform an affine transformation of the input and weights.
Step 4: Pass the output of the affine transformation through the activation function to find the class label.
Step 5: Update the weights.
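A minimal numpy sketch of these steps follows, assuming labels in {1, -1}, a sign activation, and the standard mistake-driven perceptron update; the function names and toy data are illustrative only.

```python
import numpy as np

def perceptron_train(X, y, lr=1.0, epochs=20, seed=0):
    """Perceptron learning rule for labels in {1, -1}.
    Weights and the bias (threshold) are updated only on misclassified samples."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])      # Step 1: initialize weights randomly
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(len(X))  # Step 2: samples at random without replacement
        for i in order:
            a = X[i] @ w + b             # Step 3: affine transformation
            pred = 1 if a >= 0 else -1   # Step 4: step (sign) activation gives the class label
            if pred != y[i]:             # Step 5: update weights only on a mistake
                w += lr * y[i] * X[i]
                b += lr * y[i]
    return w, b

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
print(np.sign(X @ w + b))   # should match y for separable data
```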