
In this post, we’ll briefly walk through four foundational deep learning model families.
1. Multilayer Perceptron (MLP)
A multilayer perceptron is a neural network composed of one or more fully connected hidden layers. Its learning capacity depends on two factors:
– Width (number of neurons per layer), which affects the ability to capture a broad range of features.
– Depth (number of layers), which enables hierarchical representation learning.
Interestingly, the Universal Approximation Theorem states that even an MLP with a single, sufficiently wide hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy.
However, MLPs expect their input as a fixed-length, one-dimensional vector (like a row of tabular data). Because they ignore spatial and sequential structure, MLPs are less effective on raw images, text, or audio. Still, with appropriate feature extraction or preprocessing, they can be adapted to work in these domains.
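To make this concrete, here is a minimal PyTorch sketch of an MLP with two hidden layers. The layer sizes and class count are illustrative assumptions, not recommendations:

```python
import torch
import torch.nn as nn

# A minimal MLP sketch: two fully connected hidden layers with ReLU activations.
# Input size, hidden width, and number of classes are illustrative assumptions.
class MLP(nn.Module):
    def __init__(self, in_features=20, hidden=64, num_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),  # width: neurons per layer
            nn.ReLU(),
            nn.Linear(hidden, hidden),       # depth: stacking more layers
            nn.ReLU(),
            nn.Linear(hidden, num_classes),  # output logits
        )

    def forward(self, x):
        return self.net(x)

model = MLP()
x = torch.randn(8, 20)   # a batch of 8 tabular samples with 20 features each
logits = model(x)        # shape: (8, 3)
```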
2. Convolutional Neural Network (CNN)
Convolutional Neural Networks are designed to exploit the spatial structure of data, making them particularly powerful for image, video, and audio processing. CNNs use layers of convolutions and pooling to extract features hierarchically:
– Early layers capture simple patterns (edges, textures).
– Deeper layers capture complex structures (shapes, objects).
The extracted features are eventually flattened into a one-dimensional vector and passed through fully connected layers for prediction.
CNN architectures vary widely, and their depth and design directly influence performance.
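As an illustration, here is a minimal PyTorch sketch of a small CNN for 28x28 grayscale images; the channel counts, image size, and class count are illustrative assumptions:

```python
import torch
import torch.nn as nn

# A minimal CNN sketch: two convolution+pooling stages, then a flattened classifier.
class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # early layer: edges, textures
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper layer: larger structures
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                        # flatten feature maps to a 1-D vector
            nn.Linear(32 * 7 * 7, num_classes),  # fully connected prediction head
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
images = torch.randn(8, 1, 28, 28)   # batch of 8 grayscale images
logits = model(images)               # shape: (8, 10)
```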
3. Recurrent Neural Network (RNN)
Recurrent Neural Networks excel at handling sequential data such as text, time series, or speech. They achieve this by using recurrent connections that allow the model to retain an internal state, effectively giving it a form of memory. This memory is parameterized by weights that are shared across the sequence, enabling the model to capture temporal dependencies.
However, vanilla RNNs suffer from the vanishing gradient problem, which makes it difficult for them to learn long-term dependencies. To address this, advanced variants such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks were developed.
– LSTM introduces memory cells and gating mechanisms to decide what information to retain or discard.
– GRU provides a simplified version of LSTM with fewer parameters while maintaining strong performance.
These architectures have made RNNs highly successful in applications like machine translation, image captioning, and sentiment analysis.
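As a sketch, the following minimal PyTorch LSTM classifier shows the typical embedding → recurrent layer → classification head pattern; the vocabulary size, dimensions, and two-class (sentiment-style) head are illustrative assumptions:

```python
import torch
import torch.nn as nn

# A minimal LSTM sketch for sequence classification (all sizes are illustrative).
class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # gated memory cell
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)      # h_n: final hidden state of each layer
        return self.head(h_n[-1])              # classify from the last hidden state

model = LSTMClassifier()
tokens = torch.randint(0, 10_000, (8, 50))   # batch of 8 sequences, 50 tokens each
logits = model(tokens)                       # shape: (8, 2)
```

A GRU variant looks almost identical: `nn.GRU` takes the same constructor arguments but returns only a hidden state, with no separate cell state.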
4. Transformer Networks
Transformer architectures have revolutionized AI systems, powering large language models (LLMs), image generation systems, and multimodal reasoning. They introduced a parallelizable, scalable way to model long-range dependencies without the sequential bottleneck of recurrent computation.
The key innovation behind transformers is the self-attention mechanism, which lets each token in a sequence directly weigh the importance of every other token when forming its representation. Multi-head attention runs several such attention computations in parallel, each learning a different type of relationship and capturing more nuanced dependencies.
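To make self-attention concrete, here is a minimal sketch of single-head, unmasked scaled dot-product attention. The tensor shapes and random projection matrices are illustrative assumptions; in practice you would use a library implementation such as PyTorch’s nn.MultiheadAttention:

```python
import torch
import torch.nn.functional as F

# A minimal sketch of scaled dot-product self-attention (single head, no masking).
def self_attention(x, w_q, w_k, w_v):
    q = x @ w_q   # queries
    k = x @ w_k   # keys
    v = x @ w_v   # values
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # how strongly each token attends to the others
    weights = F.softmax(scores, dim=-1)             # attention weights sum to 1 per token
    return weights @ v                              # weighted mix of value vectors

d_model = 16
x = torch.randn(1, 5, d_model)   # 5 tokens with 16-dim embeddings
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)   # shape: (1, 5, 16)
```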
A transformer is built from stacks of encoder and decoder layers, although many models use only the encoder stack or only the decoder stack. Residual (skip) connections and layer normalization are applied around each sublayer.
Encoder layers read the entire input sequence and build contextual representations, while decoder layers generate output tokens step by step, attending to both previously generated tokens and the encoder outputs. Decoder-only models (such as the GPT family) are particularly well suited to text generation.
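As a quick sketch of an encoder stack, the snippet below uses PyTorch’s built-in nn.TransformerEncoderLayer, which already bundles multi-head attention, residual connections, and layer normalization; the model dimension, number of heads, and number of layers are illustrative assumptions:

```python
import torch
import torch.nn as nn

# A tiny encoder stack: each layer applies self-attention and a feed-forward sublayer,
# with residual connections and layer normalization around each sublayer.
encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

tokens = torch.randn(8, 20, 64)   # batch of 8 sequences, 20 tokens, 64-dim embeddings
contextual = encoder(tokens)      # each token's representation now reflects the whole sequence
print(contextual.shape)           # torch.Size([8, 20, 64])
```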
The biggest advantage of transformers is scalability. Because they process entire sequences in parallel rather than token by token, they make full use of GPUs and TPUs. Freed from sequential processing, transformer architectures enabled training on massive datasets and led to today’s widely used LLMs.
Transformers are now used across domains: Vision Transformers (ViTs) apply attention to images, while multimodal models handle text, images, and audio together.
Wrapping Up
Supervised deep learning models like MLPs, CNNs, and RNNs are now widely applied in fields such as medical imaging, IoT, natural language processing, and robotics, while transformer architectures capture global relationships in data and underpin today’s most advanced AI systems.
In the next post, we’ll dive deeper into the fascinating world of CNN architectures—exploring why they’re so effective.
