Convolutional neural network (CNN)

PUBLISHED: MAY 2, 2026 · 3 MIN READ


Ashutosh Kumar

Convolutional Neural Networks (CNNs)

Introduction

  • CNNs are a specialized type of neural network mainly designed for image processing and classification.
  • They work by recognizing edges, textures, shapes, and patterns in images.
  • Digital images are represented as matrices of pixel values (0–255).
  • Images can be:
    • Grayscale: Single channel (intensity values).
    • RGB (Red, Green, Blue): Three channels, each represented by a separate matrix.
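The matrix representation above can be seen directly in code; this is a small NumPy sketch showing the shapes of a grayscale and an RGB image:

```python
import numpy as np

# A grayscale image is a single 2-D matrix of intensities (0-255).
gray = np.array([[0, 255], [128, 64]], dtype=np.uint8)
print(gray.shape)   # (2, 2) -> height x width, one channel

# An RGB image stacks three such matrices, one per channel.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)  # height x width x 3 channels
rgb[0, 0] = [255, 0, 0]                    # top-left pixel set to pure red
print(rgb.shape)    # (2, 2, 3)
```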

Neurons

  • A neuron is the most basic unit in a neural network.
  • It applies a linear function (weighted sum + bias), followed by a non-linear activation function.

Mathematical Representation of a Neuron:

x  →  z = wx + b  →  σ(z)

Where:

  • x = input
  • w = weight
  • b = bias
  • z = weighted sum (wx + b)
  • σ = activation function
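As a minimal sketch, a single neuron can be written directly from the formula above (sigmoid is used here as the example activation):

```python
import numpy as np

def neuron(x, w, b):
    """Weighted sum plus bias, followed by a sigmoid activation."""
    z = np.dot(w, x) + b             # linear part: z = w·x + b
    return 1.0 / (1.0 + np.exp(-z))  # non-linear part: sigma(z)

x = np.array([0.5, -1.0])   # inputs
w = np.array([2.0, 1.0])    # weights
b = 0.0                     # bias
print(neuron(x, w, b))      # z = 1.0 - 1.0 = 0, sigmoid(0) = 0.5
```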

Common Activation Functions

  • ReLU (Rectified Linear Unit) – keeps positive values, converts negative values to 0.
  • Leaky ReLU – allows a small gradient for negative values (avoids dead neurons).
  • Sigmoid – outputs between 0 and 1, often used for probabilities.
  • Tanh (Hyperbolic Tangent) – outputs between -1 and 1.
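The four activations listed above are one-liners in NumPy; this sketch shows their ranges on a few sample values:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)            # negatives become 0

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope for negatives

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # squashes into (0, 1)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))        # [0., 0., 3.]
print(leaky_relu(z))  # [-0.02, 0., 3.]
print(sigmoid(z))     # values in (0, 1)
print(np.tanh(z))     # values in (-1, 1)
```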

Convolutional Layer

  • The core layer in CNNs, responsible for extracting features.
  • Uses kernels/filters:
    • A small matrix (e.g., 3×3 or 5×5).
    • Slides over the input image, performing element-wise multiplication followed by a sum at each position.
    • Produces feature maps that highlight edges, textures, and shapes.
  • Multiple feature maps can be generated for the same input image by using different kernel values.
  • Stride: Number of pixels the kernel moves at each step.
    • Large stride → reduces output size but may miss details.
    • Small stride → captures more detail but increases computation (risk of overfitting).
  • Padding: Adding extra rows/columns around the image.
    • Helps preserve the spatial size after convolution.
    • Formula:
Output Size = (Input Size + 2 × Padding − Kernel Size) / Stride + 1
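A naive single-channel convolution with stride and padding can be sketched as follows; the loop bounds come straight from the output-size formula:

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Naive 2-D convolution (cross-correlation) on one channel."""
    if padding:
        image = np.pad(image, padding)           # zero rows/columns around the image
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1  # output-size formula
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(region * kernel)  # element-wise multiply, then sum
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0., -1.]] * 3)           # simple vertical-edge filter
print(conv2d(image, kernel).shape)               # (5 + 0 - 3)/1 + 1 = 3 -> (3, 3)
print(conv2d(image, kernel, padding=1).shape)    # (5 + 2 - 3)/1 + 1 = 5 -> (5, 5)
```

With padding=1 the 5×5 input keeps its 5×5 spatial size after convolution, which is the "preserve the spatial size" behaviour described above.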

Pooling Layer

  • Reduces the spatial size of the feature maps (down-sampling).
  • Makes computation faster, reduces memory usage, and prevents overfitting.
  • Types of pooling:
    • Max Pooling – takes the maximum value from the region.
    • Average Pooling – takes the average value from the region.
  • Pooling reduces size while retaining the most salient features.
  • If a task depends on fine spatial detail that pooling would discard, the pooling step can be omitted.
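Both pooling types above can be sketched with one small function; each 2×2 region of the feature map collapses to a single value:

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, mode="max"):
    """Down-sample a feature map with max or average pooling."""
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = region.max() if mode == "max" else region.mean()
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
print(pool2d(fmap, mode="max"))   # [[6. 8.] [3. 4.]]
print(pool2d(fmap, mode="avg"))   # [[3.75 5.25] [2.   2.  ]]
```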

Flattening Layer

  • Converts the 2D feature maps into a 1D column vector.
  • This vector becomes input for the fully connected layers.
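Flattening is just a reshape; here two 2×2 feature maps become one vector of length 8:

```python
import numpy as np

# Two 2x2 feature maps (channels x height x width).
feature_maps = np.arange(8, dtype=float).reshape(2, 2, 2)
flat = feature_maps.flatten()   # equivalent to .reshape(-1)
print(flat.shape)               # (8,)
```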

Fully Connected (FC) Layer

  • Each neuron is connected to every neuron in the previous layer.
  • Responsible for combining features into final classification or regression results.

Output Layer

  • Applies activation functions (like Sigmoid or Softmax) to generate probabilities.
  • For classification, it outputs the probability score for each class.
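Softmax is the usual choice for multi-class outputs; this sketch turns raw class scores into probabilities that sum to 1:

```python
import numpy as np

def softmax(z):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # one raw score per class
probs = softmax(scores)
print(probs.sum())                   # 1.0
print(probs.argmax())                # 0 -> the highest-scoring class wins
```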

Applications of CNNs

  • Image classification (e.g., cats vs. dogs).
  • Object detection and segmentation.
  • Autonomous vehicles.
  • Security camera systems.

Summary of CNN Architecture

  1. Input Layer – Image data (height × width × depth).
  2. Convolutional Layer – Extracts features using kernels/filters.
  3. Activation Layer – Applies non-linearity (ReLU, Tanh, etc.).
  4. Pooling Layer – Reduces dimensions (Max/Average pooling).
  5. Flattening Layer – Converts 2D maps into a 1D vector.
  6. Fully Connected Layer – Learns complex patterns for classification.
  7. Output Layer – Produces class probabilities.
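The seven steps above can be strung together into a toy forward pass. This is only an untrained sketch with random weights, meant to show how the shapes flow from one layer to the next:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, k):
    """Naive valid convolution, stride 1, no padding."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

def maxpool(fmap, s=2):
    """2x2 max pooling via reshape."""
    h, w = fmap.shape[0] // s, fmap.shape[1] // s
    return fmap[:h*s, :w*s].reshape(h, s, w, s).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.random((8, 8))                                        # 1. input "image"
fmap = np.maximum(0, conv2d(x, rng.standard_normal((3, 3))))  # 2-3. conv + ReLU -> 6x6
pooled = maxpool(fmap)                                        # 4. pooling -> 3x3
flat = pooled.flatten()                                       # 5. flatten -> (9,)
logits = rng.standard_normal((2, 9)) @ flat                   # 6. FC layer, 2 classes
probs = softmax(logits)                                       # 7. output probabilities
print(probs.shape, probs.sum())                               # (2,) 1.0
```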