Convolutional neural network (CNN)

PUBLISHED: MAY 2, 2026 · 3 MIN READ


Ashutosh Kumar

Convolutional Neural Networks (CNNs)

Introduction

  • CNNs are a specialized type of neural network mainly designed for image processing and classification.
  • They work by recognizing edges, textures, shapes, and patterns in images.
  • Digital images are represented as matrices of pixel values (0–255).
  • Images can be:
    • Grayscale: Single channel (intensity values).
    • RGB (Red, Green, Blue): Three channels, each represented by a separate matrix.
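The matrix representation above can be seen directly in code; this is a small NumPy sketch showing the shapes of a grayscale and an RGB image:

```python
import numpy as np

# A grayscale image is a single 2-D matrix of intensities (0-255).
gray = np.array([[0, 255], [128, 64]], dtype=np.uint8)
print(gray.shape)   # (2, 2) -> height x width, one channel

# An RGB image stacks three such matrices, one per channel.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)  # height x width x 3 channels
rgb[0, 0] = [255, 0, 0]                    # top-left pixel set to pure red
print(rgb.shape)    # (2, 2, 3)
```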

Neurons

  • A neuron is the most basic unit in a neural network.
  • It applies a linear function (weighted sum + bias), followed by a non-linear activation function.

Mathematical Representation of a Neuron:

x  →  z = wx + b  →  σ(z)

Where:

  • x = input
  • w = weight
  • b = bias
  • z = weighted sum (wx + b)
  • σ = activation function
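As a minimal sketch, a single neuron can be written directly from the formula above (sigmoid is used here as the example activation):

```python
import numpy as np

def neuron(x, w, b):
    """Weighted sum plus bias, followed by a sigmoid activation."""
    z = np.dot(w, x) + b             # linear part: z = w·x + b
    return 1.0 / (1.0 + np.exp(-z))  # non-linear part: sigma(z)

x = np.array([0.5, -1.0])   # inputs
w = np.array([2.0, 1.0])    # weights
b = 0.0                     # bias
print(neuron(x, w, b))      # z = 1.0 - 1.0 = 0, sigmoid(0) = 0.5
```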

Common Activation Functions

  • ReLU (Rectified Linear Unit) – keeps positive values, converts negative values to 0.
  • Leaky ReLU – allows a small gradient for negative values (avoids dead neurons).
  • Sigmoid – outputs between 0 and 1, often used for probabilities.
  • Tanh (Hyperbolic Tangent) – outputs between -1 and 1.
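The four activations listed above are one-liners in NumPy; this sketch shows their ranges on a few sample values:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)            # negatives become 0

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope for negatives

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # squashes into (0, 1)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))        # [0., 0., 3.]
print(leaky_relu(z))  # [-0.02, 0., 3.]
print(sigmoid(z))     # values in (0, 1)
print(np.tanh(z))     # values in (-1, 1)
```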

Convolutional Layer

  • The core layer in CNNs, responsible for extracting features.
  • Uses kernels/filters:
    • A small matrix (e.g., 3×3 or 5×5).
    • Slides over the input image, performing element-wise multiplication followed by a sum at each position.
    • Produces feature maps that highlight edges, textures, and shapes.
  • Multiple feature maps can be generated for the same input image by using different kernel values.
  • Stride: Number of pixels the kernel moves at each step.
    • Large stride → reduces output size but may miss details.
    • Small stride → captures more detail but increases computation (risk of overfitting).
  • Padding: Adding extra rows/columns around the image.
    • Helps preserve the spatial size after convolution.
    • Formula:
Output Size = (Input Size + 2 × Padding − Kernel Size) / Stride + 1
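A naive single-channel convolution with stride and padding can be sketched as follows; the loop bounds come straight from the output-size formula:

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Naive 2-D convolution (cross-correlation) on one channel."""
    if padding:
        image = np.pad(image, padding)           # zero rows/columns around the image
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1  # output-size formula
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(region * kernel)  # element-wise multiply, then sum
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0., -1.]] * 3)           # simple vertical-edge filter
print(conv2d(image, kernel).shape)               # (5 + 0 - 3)/1 + 1 = 3 -> (3, 3)
print(conv2d(image, kernel, padding=1).shape)    # (5 + 2 - 3)/1 + 1 = 5 -> (5, 5)
```

With padding=1 the 5×5 input keeps its 5×5 spatial size after convolution, which is the "preserve the spatial size" behaviour described above.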

Pooling Layer

  • Reduces the spatial size of the feature maps (down-sampling).
  • Makes computation faster, reduces memory usage, and prevents overfitting.
  • Types of pooling:
    • Max Pooling – takes the maximum value from the region.
    • Average Pooling – takes the average value from the region.
  • Pooling reduces size while retaining the most salient features.
  • If a task depends on fine spatial detail that pooling would discard, the pooling step can be omitted.
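Both pooling types above can be sketched with one small function; each 2×2 region of the feature map collapses to a single value:

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, mode="max"):
    """Down-sample a feature map with max or average pooling."""
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = region.max() if mode == "max" else region.mean()
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
print(pool2d(fmap, mode="max"))   # [[6. 8.] [3. 4.]]
print(pool2d(fmap, mode="avg"))   # [[3.75 5.25] [2.   2.  ]]
```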

Flattening Layer

  • Converts the 2D feature maps into a 1D column vector.
  • This vector becomes input for the fully connected layers.
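Flattening is just a reshape; here two 2×2 feature maps become one vector of length 8:

```python
import numpy as np

# Two 2x2 feature maps (channels x height x width).
feature_maps = np.arange(8, dtype=float).reshape(2, 2, 2)
flat = feature_maps.flatten()   # equivalent to .reshape(-1)
print(flat.shape)               # (8,)
```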

Fully Connected (FC) Layer

  • Each neuron is connected to every neuron in the previous layer.
  • Responsible for combining features into final classification or regression results.

Output Layer

  • Applies activation functions (like Sigmoid or Softmax) to generate probabilities.
  • For classification, it outputs the probability score for each class.
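Softmax is the usual choice for multi-class outputs; this sketch turns raw class scores into probabilities that sum to 1:

```python
import numpy as np

def softmax(z):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # one raw score per class
probs = softmax(scores)
print(probs.sum())                   # 1.0
print(probs.argmax())                # 0 -> the highest-scoring class wins
```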

Applications of CNNs

  • Image classification (e.g., cats vs. dogs).
  • Object detection and segmentation.
  • Autonomous vehicles.
  • Security camera systems.

Summary of CNN Architecture

  1. Input Layer – Image data (height × width × depth).
  2. Convolutional Layer – Extracts features using kernels/filters.
  3. Activation Layer – Applies non-linearity (ReLU, Tanh, etc.).
  4. Pooling Layer – Reduces dimensions (Max/Average pooling).
  5. Flattening Layer – Converts 2D maps into a 1D vector.
  6. Fully Connected Layer – Learns complex patterns for classification.
  7. Output Layer – Produces class probabilities.
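The seven steps above can be strung together into a toy forward pass. This is only an untrained sketch with random weights, meant to show how the shapes flow from one layer to the next:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, k):
    """Naive valid convolution, stride 1, no padding."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

def maxpool(fmap, s=2):
    """2x2 max pooling via reshape."""
    h, w = fmap.shape[0] // s, fmap.shape[1] // s
    return fmap[:h*s, :w*s].reshape(h, s, w, s).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.random((8, 8))                                        # 1. input "image"
fmap = np.maximum(0, conv2d(x, rng.standard_normal((3, 3))))  # 2-3. conv + ReLU -> 6x6
pooled = maxpool(fmap)                                        # 4. pooling -> 3x3
flat = pooled.flatten()                                       # 5. flatten -> (9,)
logits = rng.standard_normal((2, 9)) @ flat                   # 6. FC layer, 2 classes
probs = softmax(logits)                                       # 7. output probabilities
print(probs.shape, probs.sum())                               # (2,) 1.0
```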