‘AI In A Flash’ – 4. Convolutional Neural Nets

AlexNet

Introduced five convolutional layers (with max pooling after the first, second, and fifth), followed by three fully connected layers. ReLU activations helped mitigate vanishing gradients. By winning the 2012 ImageNet (ILSVRC) competition by a wide margin, AlexNet became the first deep learning model to demonstrate CNNs’ capability for large-scale image recognition.
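
A minimal PyTorch-style sketch of the layer layout described above (channel sizes follow the 2012 paper; padding, dropout, and normalization details are simplified):

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # pool after conv1
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # pool after conv2
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # pool after conv5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

logits = AlexNetSketch()(torch.randn(1, 3, 224, 224))  # -> shape (1, 1000)
```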

ZFNet

Similar in structure to AlexNet, with five convolutional and three fully connected layers. However, a smaller first-layer filter (7×7 instead of 11×11) with reduced stride, together with contrast normalization, improved feature extraction and overall performance.
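
The headline change fits in two lines; the PyTorch sketch below covers the first convolution only, with illustrative channel counts and padding:

```python
import torch.nn as nn

# ZFNet's main architectural change versus AlexNet is the first layer:
alexnet_conv1 = nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2)  # large filter, large stride
zfnet_conv1   = nn.Conv2d(3, 96, kernel_size=7,  stride=2, padding=1)  # smaller filter, smaller stride
# The smaller filter and stride preserve more fine-grained detail in first-layer feature maps.
```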

Network-in-Network (NiN)

Enhanced performance by replacing traditional convolution layers with micro-networks (“mlpconv” blocks): a k×k convolution followed by two 1×1 convolutions, which act like a small multilayer perceptron applied at every spatial position and so capture more abstract feature representations. Fully connected layers were dropped entirely: global average pooling reduced the final feature maps (one per class) to scores fed directly to a softmax classifier.
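
A toy PyTorch sketch of the mlpconv idea and the global-average-pooling head; the channel counts, kernel sizes, and the CIFAR-sized input are illustrative, not the paper’s exact configuration:

```python
import torch
import torch.nn as nn

def mlpconv(in_ch: int, out_ch: int, k: int, pad: int = 0) -> nn.Sequential:
    """One NiN micro-network: a k×k convolution followed by two 1×1 convolutions."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=pad), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=1), nn.ReLU(inplace=True),
    )

# Toy NiN head: the last mlpconv emits one channel per class, global average pooling
# collapses each map to a scalar, and those scalars go straight into softmax.
num_classes = 10
head = nn.Sequential(
    mlpconv(3, 96, k=5, pad=2),
    mlpconv(96, num_classes, k=3, pad=1),
    nn.AdaptiveAvgPool2d(1),   # global average pooling
    nn.Flatten(),              # (N, num_classes)
)
probs = torch.softmax(head(torch.randn(2, 3, 32, 32)), dim=1)
```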

VGGNet

A deeper architecture with up to 19 weight layers (16 convolutional plus 3 fully connected in VGG-19), using small 3×3 filters and ReLU activations. Max pooling was applied after each block of two or three convolutions. While highly effective at capturing discriminative features, its ~138M parameters (VGG-16) made it computationally expensive.
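
A PyTorch sketch of the VGG building block and the VGG-16 feature extractor assembled from it (the fully connected classifier is omitted):

```python
import torch
import torch.nn as nn

def vgg_block(in_ch: int, out_ch: int, n_convs: int) -> nn.Sequential:
    """A VGG block: n_convs 3×3 convolutions (ReLU after each), then 2×2 max pooling."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# VGG-16 feature extractor: blocks of (2, 2, 3, 3, 3) convolutions.
features = nn.Sequential(
    vgg_block(3, 64, 2), vgg_block(64, 128, 2),
    vgg_block(128, 256, 3), vgg_block(256, 512, 3), vgg_block(512, 512, 3),
)
out = features(torch.randn(1, 3, 224, 224))  # -> (1, 512, 7, 7)
```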

GoogLeNet (Inception v1)

Introduced the Inception module, which combined multiple parallel convolutional paths (with varying filter sizes) to capture information at multiple scales. Stacking Inception modules improved efficiency, with occasional max pooling for downsampling and auxiliary classifiers to aid training of the original 22-layer network. Later versions (Inception v2/v3/v4) integrated batch normalization, factorized convolutions, and deeper variants.
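
A PyTorch sketch of one Inception-v1 module; the channel split shown corresponds to the paper’s first module (inception 3a), but any split that sums to the desired output width works:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Inception-v1 style module: parallel 1×1, 3×3, 5×5, and pooled paths,
    concatenated along the channel dimension."""
    def __init__(self, in_ch: int, c1: int, c3r: int, c3: int, c5r: int, c5: int, cp: int):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, c3r, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c3r, c3, 3, padding=1), nn.ReLU(inplace=True))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c5r, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c5r, c5, 5, padding=2), nn.ReLU(inplace=True))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, cp, 1), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# Inception 3a: 192 input channels -> 64 + 128 + 32 + 32 = 256 output channels.
block = InceptionModule(192, 64, 96, 128, 16, 32, 32)
y = block(torch.randn(1, 192, 28, 28))  # -> (1, 256, 28, 28)
```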

Highway Networks

Addressed vanishing gradients in very deep networks (~100 layers) using learned gating mechanisms that regulate how much of each layer’s output is transformed and how much of its input is carried through unchanged. However, the gates came at the cost of significantly more parameters.
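
A fully connected highway layer is easy to sketch in PyTorch; dimensions are illustrative, and the negative gate-bias initialization follows the paper’s suggestion of biasing the network toward carrying the input early in training:

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """One highway layer: output = H(x) * T(x) + x * (1 - T(x)),
    where T is a learned transform gate (sigmoid) and H is the usual nonlinearity."""
    def __init__(self, dim: int):
        super().__init__()
        self.H = nn.Linear(dim, dim)          # plain transform
        self.T = nn.Linear(dim, dim)          # transform gate (the extra parameters)
        nn.init.constant_(self.T.bias, -2.0)  # start by mostly carrying the input through

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = torch.sigmoid(self.T(x))
        return torch.relu(self.H(x)) * t + x * (1.0 - t)

x = torch.randn(8, 64)
deep = nn.Sequential(*[HighwayLayer(64) for _ in range(50)])  # gradients still flow via the carry path
y = deep(x)
```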

ResNet

Solved vanishing gradients with residual (skip) connections, allowing input signals to bypass certain layers. Residual blocks typically contained 2–3 convolutions with batch normalization and ReLU. This enabled successful training of networks up to 152 layers deep.
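
A PyTorch sketch of the basic residual block (identity shortcut only; strided and projection shortcuts are omitted for brevity):

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Basic ResNet block: two 3×3 conv + batch norm layers with a skip (identity) connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)   # the skip connection: gradients pass through '+ x' unchanged

y = BasicResidualBlock(64)(torch.randn(1, 64, 56, 56))  # same shape in, same shape out
```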

DenseNet

Used dense connections, where each layer received inputs from all preceding layers within a block. With batch normalization and ReLU, plus transition layers for downsampling, DenseNet achieved extreme depth (up to 264 layers) with improved feature reuse and efficiency.
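
A PyTorch sketch of a dense block; bottleneck (1×1) layers and the transition layers between blocks are omitted, and the growth rate and layer count are illustrative:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """DenseNet block: each layer takes the concatenation of all previous feature maps
    and adds 'growth_rate' new channels (BN -> ReLU -> 3×3 conv)."""
    def __init__(self, in_ch: int, growth_rate: int, n_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            ch = in_ch + i * growth_rate
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth_rate, kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))  # reuse every earlier feature map
        return torch.cat(features, dim=1)

y = DenseBlock(64, growth_rate=32, n_layers=4)(torch.randn(1, 64, 28, 28))  # -> (1, 64 + 4*32, 28, 28)
```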

WideResNet

A ResNet variant that widened layers (increasing the number of channels) instead of only stacking them deeper. The added width promoted greater feature diversity, improving recognition of complex patterns while counteracting the diminishing feature reuse seen in very deep, thin residual networks.
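
The widening idea reduces to multiplying channel counts by a factor k; the PyTorch sketch below shows only the convolutional body of a pre-activation block, without the skip connection:

```python
import torch.nn as nn

def wide_conv_pair(in_ch: int, base_ch: int, k: int) -> nn.Sequential:
    """Body of a (pre-activation) residual block whose width is scaled by the widening factor k."""
    out_ch = base_ch * k   # k = 1 recovers the ordinary ResNet width
    return nn.Sequential(
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
    )

narrow = wide_conv_pair(16, 16, k=1)   # plain ResNet-style block body
wide   = wide_conv_pair(16, 16, k=8)   # WideResNet-style block body with 8× the channels
```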

ResNeXt

Extended ResNet by introducing cardinality, the number of parallel transformation paths in each block. Each path extracted features independently, and the outputs were aggregated for richer representations. Skip connections remained, though higher cardinality can increase memory and compute demands.
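
A PyTorch sketch of a ResNeXt bottleneck block, where the parallel paths are realized as a grouped 3×3 convolution; cardinality 32 with bottleneck width 4 mirrors the common 32×4d configuration:

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """ResNeXt bottleneck: the 3×3 convolution uses 'cardinality' groups, which is
    equivalent to that many parallel transformation paths whose outputs are aggregated."""
    def __init__(self, channels: int, cardinality: int = 32, bottleneck_width: int = 4):
        super().__init__()
        inner = cardinality * bottleneck_width                 # e.g. 32 * 4 = 128
        self.net = nn.Sequential(
            nn.Conv2d(channels, inner, kernel_size=1, bias=False),
            nn.BatchNorm2d(inner), nn.ReLU(inplace=True),
            nn.Conv2d(inner, inner, kernel_size=3, padding=1,
                      groups=cardinality, bias=False),         # the parallel paths
            nn.BatchNorm2d(inner), nn.ReLU(inplace=True),
            nn.Conv2d(inner, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.net(x) + x)   # skip connection retained

y = ResNeXtBlock(256)(torch.randn(1, 256, 14, 14))  # same spatial size and channel count
```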

Note – CNN models are typically benchmarked on large-scale datasets such as ImageNet and CIFAR.
