Generative AI - Variational Autoencoders



A Variational Autoencoder (VAE) is a type of deep learning model that represents a significant advancement in unsupervised learning tasks such as generative modeling, dimensionality reduction, and feature learning.

Unlike traditional autoencoders, VAEs take a probabilistic approach: the main idea is not only to reconstruct data but also to generate new data samples from a learned latent space. Read this chapter to understand Variational Autoencoders, how they differ from traditional autoencoders, and their unique loss function.

What are Variational Autoencoders?

Variational autoencoders are a type of neural network that extends the concept of traditional autoencoders by adding a probabilistic treatment of the latent space.

While traditional autoencoders are designed to compress and regenerate the input data from the latent space, VAEs, thanks to this probabilistic approach, can regenerate the input data as well as generate new data samples by learning the underlying distribution of the input data. This ability makes VAEs very useful for tasks like producing realistic images or creating new data points.

Traditional Autoencoders vs Variational Autoencoders

The table below provides a comprehensive comparison between traditional autoencoders and variational autoencoders; the code sketch that follows the table illustrates the key structural difference −

| Aspect | Autoencoders | Variational Autoencoders (VAEs) |
| --- | --- | --- |
| Latent Space | Encode the input data into a deterministic point in the latent space. | Encode the input data into a probability distribution in the latent space. |
| Encoder Output | The encoder produces a single vector representation of the input. | The encoder produces two vectors: the mean and the variance of the latent distribution. |
| Decoder Input | The decoder takes the single vector from the encoder as input to regenerate the input data from the latent space. | The decoder takes a sample drawn from the latent distribution defined by the mean and variance vectors. |
| Training Objective | Minimize the reconstruction error between the input and the output. | Minimize both the reconstruction error and the KL divergence between the learned and prior distributions. |
| Reconstruction Loss | Typically Mean Squared Error (MSE) or Binary Cross-Entropy. | Also Mean Squared Error (MSE) or Binary Cross-Entropy. |
| Regularization | No inherent regularization of the latent space. | A KL divergence term regularizes the latent space. |
| Generative Capability | Cannot generate new data samples from the input data. | Can generate new data samples similar to the input data. |
| Use of Prior Distribution | Do not use a prior distribution in the latent space. | Use a prior distribution, generally a standard normal distribution, in the latent space. |
| Complexity | Easy to implement. | More complex due to the probabilistic components and the need for regularization. |
| Robustness to Overfitting | Can be prone to overfitting without proper regularization. | Less prone to overfitting due to the regularizing effect of the KL divergence term. |
| Output Quality | Can accurately reconstruct input data. | Can generate new, realistic data samples. |
| Use Cases | Dimensionality reduction, feature extraction, denoising, and anomaly detection. | Generative modeling, data augmentation, semi-supervised learning, and image synthesis. |
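The structural difference summarized above can be sketched in code. The following is a minimal, illustrative PyTorch implementation (the library choice, layer sizes, and the 784-dimensional flattened input are assumptions made for this example, not a reference implementation): the encoder outputs a mean and a log-variance vector, and the decoder receives a sample drawn from that distribution via the reparameterization trick.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal VAE sketch: encoder -> (mu, log-variance) -> sampled z -> decoder."""

    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=20):
        super().__init__()
        # Encoder maps the input to a hidden representation.
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        # Two heads: mean and log-variance of the latent distribution.
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder maps a latent sample back to the input space.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * epsilon, so gradients can flow through mu and logvar.
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar
```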

Variational Autoencoder Loss Function

The loss function of a variational autoencoder combines the following two components −

Reconstruction Loss

The reconstruction loss is used to make sure that the decoder can accurately reconstruct the input from the latent space representation produced by the encoder. It is typically calculated as the mean squared error (MSE) between the original input and the reconstructed input. Mathematically, it is represented as follows −

$$\mathrm{\mathcal{L_{reconstruction}} \: = \: \displaystyle\sum\limits_{i=1}^N || x_{i} \: - \: \hat{x}_{i} ||^{2}}$$

Where $\mathrm{x_{i}}$ is the i-th original input, $\mathrm{\hat{x}_{i}}$ is its reconstruction, and $\mathrm{N}$ is the number of samples.
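As a sketch, the reconstruction term can be computed with PyTorch's built-in MSE loss, using reduction="sum" so that it matches the summation in the formula above (for binary inputs, binary cross-entropy is a common alternative):

```python
import torch.nn.functional as F

def reconstruction_loss(x, x_hat):
    # Sum of squared errors between the original and reconstructed inputs,
    # summed over all elements of the batch (matching the formula above).
    return F.mse_loss(x_hat, x, reduction="sum")
```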

KL Divergence

The KL divergence measures how much the learned latent distribution deviates from the prior distribution, which in a VAE is generally a standard normal distribution. The KL divergence term regularizes the latent space representation and ensures it has properties that are useful for generative tasks.

Mathematically, it is represented as follows −

$$\mathrm{\mathcal{L}_{KL} \: = \: -\frac{1}{2} \displaystyle\sum\limits_{j=1}^d ( 1 \: + \: log(\sigma_{j}^{2}) \: - \: \mu_{j}^{2} \: - \: \sigma_{j}^{2})}$$

Where $\mathrm{\mu_{j}}$ is the mean and $\mathrm{\sigma_{j}}$ is the standard deviation of the j-th dimension of the latent distribution, and $\mathrm{d}$ is the dimensionality of the latent space.
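In practice, the encoder usually outputs the log-variance rather than the standard deviation, as in the sketch above. Under that assumption, the KL term can be computed in closed form like this:

```python
import torch

def kl_divergence(mu, logvar):
    # Closed-form KL divergence between N(mu, sigma^2) and the standard
    # normal prior, summed over the latent dimensions (sigma^2 = exp(logvar)).
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
```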

Total VAE Loss

The total loss function for training a VAE is the sum of the two key components, i.e., the reconstruction loss and the KL divergence −

$$\mathrm{\mathcal{L}_{VAE} \: = \: \mathcal{L}_{reconstruction} \: + \: \mathcal{L}_{KL}}$$

This total loss ensures that the model accurately reconstructs the input from the latent space while keeping the latent space well-structured for generative tasks.
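Bringing the pieces together, a single training step might look like the following sketch, which reuses the model and helper functions defined above; the batch x and the optimizer (e.g., Adam) are assumed to be set up elsewhere:

```python
# Hypothetical training step: `model` is the VAE sketched earlier, `x` is a
# batch of flattened inputs, and `optimizer` is any PyTorch optimizer.
x_hat, mu, logvar = model(x)
loss = reconstruction_loss(x, x_hat) + kl_divergence(mu, logvar)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```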

Conclusion

Using a probabilistic approach in the latent space makes variational autoencoders (VAEs) a powerful extension of traditional autoencoders. This change allows VAEs to generate new, realistic data samples and makes them very useful for various applications in machine learning and data science.

In this chapter, we have discussed VAEs in detail, their loss function, and their comparison with traditional autoencoders. Understanding how VAEs differ from traditional autoencoders, and the role of the VAE loss function, is important for using these models effectively.
