Convolutional Neural Networks (CNNs)
Introduction
- CNNs are a specialized type of neural network designed primarily for image processing and classification.
- They work by recognizing edges, textures, shapes, and patterns in images.
- Digital images are represented as matrices of pixel values (0–255 for 8-bit images).
- Images can be:
  - Grayscale: a single channel of intensity values.
  - RGB (Red, Green, Blue): three channels, each represented by a separate matrix.
Neurons
- A neuron is the most basic unit in a neural network.
- It applies a linear function (weighted sum + bias), followed by a non-linear activation function.
Mathematical Representation of a Neuron:

  a = f(w₁x₁ + w₂x₂ + … + wₙxₙ + b) = f(z)

Where:
- xᵢ = input
- wᵢ = weight
- b = bias
- z = weighted sum
- f = activation function
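As a concrete illustration of the weighted sum and activation described above, here is a minimal single-neuron sketch in NumPy (all input, weight, and bias values are made up for the example):

```python
import numpy as np

def neuron(x, w, b, f):
    """One neuron: weighted sum of inputs plus bias, passed through activation f."""
    z = np.dot(w, x) + b   # weighted sum z
    return f(z)            # non-linear activation a = f(z)

# Made-up example values with a ReLU activation
x = np.array([1.0, 2.0, 3.0])    # inputs
w = np.array([0.5, -0.2, 0.1])   # weights
b = 0.05                         # bias
relu = lambda z: max(z, 0.0)

a = neuron(x, w, b, relu)   # z = 0.5 - 0.4 + 0.3 + 0.05 = 0.45
```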
Common Activation Functions
- ReLU (Rectified Linear Unit) – keeps positive values, converts negative values to 0.
- Leaky ReLU – allows a small gradient for negative values (avoids dead neurons).
- Sigmoid – outputs between 0 and 1, often used for probabilities.
- Tanh (Hyperbolic Tangent) – outputs between -1 and 1.
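The four activation functions listed above are one-liners in NumPy (a minimal sketch; the 0.01 slope used for Leaky ReLU is one common choice):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)            # negatives become 0

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z) # small slope for negatives

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # squashes into (0, 1)

def tanh(z):
    return np.tanh(z)                    # squashes into (-1, 1)

z = np.array([-2.0, 0.0, 2.0])
r = relu(z)        # [0, 0, 2]
l = leaky_relu(z)  # [-0.02, 0, 2]
s = sigmoid(z)     # sigmoid(0) = 0.5
t = tanh(z)        # tanh(0) = 0
```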
Convolutional Layer
- The core layer in CNNs, responsible for extracting features.
- Uses kernels/filters:
  - A small matrix (e.g., 3×3 or 5×5).
  - Slides over the input image, multiplying element-wise with each region and summing the results.
  - Produces feature maps that highlight edges, textures, and shapes.
- Multiple feature maps can be generated for the same input image by using different kernel values.
- Stride: the number of pixels the kernel moves at each step.
  - Large stride → smaller output and less computation, but may miss fine details.
  - Small stride → captures more detail, but increases output size and computation.
- Padding: adding extra rows/columns (usually zeros) around the image.
  - Helps preserve the spatial size after convolution.
- Formula for the output size of a convolution:

  output size = ⌊(n + 2p − k) / s⌋ + 1

  where n = input size, k = kernel size, p = padding, and s = stride.
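The sliding-kernel operation and the output-size formula can be sketched as a plain NumPy loop (a didactic implementation, not an optimized one; the image and kernel values are made up):

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Slide a kernel over a 2D image: element-wise multiply each region, then sum."""
    if padding:
        image = np.pad(image, padding)          # zero-pad all sides
    n, k = image.shape[0], kernel.shape[0]
    out = (n - k) // stride + 1                 # matches (n + 2p - k)/s + 1, p already applied
    fmap = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            region = image[i*stride:i*stride+k, j*stride:j*stride+k]
            fmap[i, j] = np.sum(region * kernel)
    return fmap

# 5x5 image with a 3x3 vertical-edge kernel (hypothetical values)
img = np.arange(25, dtype=float).reshape(5, 5)
kern = np.array([[1., 0., -1.]] * 3)
fm = conv2d(img, kern)   # output size: (5 - 3)/1 + 1 = 3, so a 3x3 feature map
```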
Pooling Layer
- Reduces the spatial size of the feature maps (down-sampling).
- Makes computation faster, reduces memory usage, and helps reduce overfitting.
- Types of pooling:
  - Max Pooling – takes the maximum value from each region.
  - Average Pooling – takes the average value of each region.
- Pooling aims to reduce size while retaining the most important features.
- If a task depends on fine spatial detail that pooling would discard, pooling should be reduced or avoided.
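Both pooling types can be sketched with one small helper (a minimal illustration; the feature-map values are made up):

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, mode="max"):
    """Max or average pooling over regions (non-overlapping with the defaults)."""
    out = (fmap.shape[0] - size) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    pooled = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            region = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            pooled[i, j] = reduce_fn(region)
    return pooled

fm = np.array([[1., 3., 2., 4.],
               [5., 6., 7., 8.],
               [3., 2., 1., 0.],
               [1., 2., 3., 4.]])
mx = pool2d(fm, mode="max")    # [[6, 8], [3, 4]]
avg = pool2d(fm, mode="avg")   # [[3.75, 5.25], [2, 2]]
```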
Flattening Layer
- Converts the 2D feature maps into a 1D column vector.
- This vector becomes input for the fully connected layers.
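In NumPy, flattening is a single reshape (a minimal sketch with an arbitrary stack of three 4×4 feature maps):

```python
import numpy as np

# Three 4x4 feature maps stacked as (channels, height, width)
fmaps = np.zeros((3, 4, 4))
flat = fmaps.reshape(-1)   # 1D vector of length 3 * 4 * 4 = 48
```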
Fully Connected (FC) Layer
- Each neuron is connected to every neuron in the previous layer.
- Responsible for combining features into final classification or regression results.
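A fully connected layer is a matrix–vector product plus bias: every row of the weight matrix connects one output neuron to all inputs (a minimal sketch; the 48-input/10-output sizes and random weights are arbitrary):

```python
import numpy as np

def fully_connected(x, W, b):
    """Every output neuron sees every input: matrix-vector product plus bias."""
    return W @ x + b

rng = np.random.default_rng(0)
x = rng.standard_normal(48)         # flattened feature vector
W = rng.standard_normal((10, 48))   # 10 output neurons, each connected to all 48 inputs
b = np.zeros(10)
logits = fully_connected(x, W, b)   # one raw score per class, shape (10,)
```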
Output Layer
- Applies an activation function (Sigmoid for binary classification, Softmax for multi-class) to convert raw scores into probabilities.
- For classification, it outputs the probability score for each class.
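Softmax, the usual multi-class choice, can be sketched as follows (a minimal, numerically stable version; the logit values are made up):

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(logits - np.max(logits))   # subtract max for numerical stability
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
predicted_class = int(np.argmax(probs))   # class with the highest probability
```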
Applications of CNNs
- Image classification (e.g., cats vs. dogs).
- Object detection and segmentation.
- Autonomous vehicles.
- Security camera systems.
Summary of CNN Architecture
- Input Layer – Image data (height × width × depth).
- Convolutional Layer – Extracts features using kernels/filters.
- Activation Layer – Applies non-linearity (ReLU, Tanh, etc.).
- Pooling Layer – Reduces dimensions (Max/Average pooling).
- Flattening Layer – Converts 2D maps into a 1D vector.
- Fully Connected Layer – Learns complex patterns for classification.
- Output Layer – Produces class probabilities.
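The summary above can be traced end to end by following tensor shapes (a shape-level sketch with made-up sizes: a 28×28 grayscale input, one 3×3 kernel, 2×2 max pooling, and 10 output classes):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal((28, 28))          # input layer: 28x28 grayscale image
k = rng.standard_normal((3, 3))            # conv layer: 3x3 kernel, stride 1, no padding
conv = np.array([[np.sum(x[i:i+3, j:j+3] * k)   # output: (28 - 3) + 1 = 26 -> 26x26
                  for j in range(26)] for i in range(26)])
act = np.maximum(conv, 0.0)                # activation layer: ReLU
pool = act.reshape(13, 2, 13, 2).max(axis=(1, 3))  # 2x2 max pooling -> 13x13
flat = pool.reshape(-1)                    # flattening: 13 * 13 = 169 values
W = rng.standard_normal((10, 169))         # fully connected: 169 inputs -> 10 classes
logits = W @ flat
probs = np.exp(logits - logits.max())      # output layer: softmax probabilities
probs /= probs.sum()
```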

