  • [AVB] Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks
    Research/Generative Model 2024. 5. 18. 12:31

    https://arxiv.org/pdf/1701.04722


    Abstract

    Variational Autoencoders (VAEs) are expressive latent variable models that can be used to learn complex probability distributions from training data. However, the quality of the resulting model crucially relies on the expressiveness of the inference model. We introduce Adversarial Variational Bayes (AVB), a technique for training Variational Autoencoders with arbitrarily expressive inference models. We achieve this by introducing an auxiliary discriminative network that allows us to rephrase the maximum-likelihood problem as a two-player game, hence establishing a principled connection between VAEs and Generative Adversarial Networks (GANs). We show that in the nonparametric limit our method yields an exact maximum-likelihood assignment for the parameters of the generative model, as well as the exact posterior distribution over the latent variables given an observation. Contrary to competing approaches which combine VAEs with GANs, our approach has a clear theoretical justification, retains most advantages of standard Variational Autoencoders and is easy to implement.


    1. Introduction

    Generative models in machine learning are models that can be trained on an unlabeled dataset and are capable of generating new data points after training is completed. As generating new content requires a good understanding of the training data at hand, such models are often regarded as a key ingredient to unsupervised learning.

     

    In recent years, generative models have become more and more powerful. While many model classes such as PixelRNNs (van den Oord et al., 2016b), PixelCNNs (van den Oord et al., 2016a), real NVP (Dinh et al., 2016) and Plug & Play generative networks (Nguyen et al., 2016) have been introduced and studied, the two most prominent ones are Variational Autoencoders (VAEs) (Kingma & Welling, 2013; Rezende et al., 2014) and Generative Adversarial Networks (GANs) (Goodfellow et al., 2014).

     

    Both VAEs and GANs come with their own advantages and disadvantages: while GANs generally yield visually sharper results when applied to learning a representation of natural images, VAEs are attractive because they naturally yield both a generative model and an inference model. Moreover, it was reported that VAEs often lead to better log-likelihoods (Wu et al., 2016). The recently introduced BiGANs (Donahue et al., 2016; Dumoulin et al., 2016) add an inference model to GANs. However, it was observed that the reconstruction results often only vaguely resemble the input and often do so only semantically and not in terms of pixel values.

     

    The failure of VAEs to generate sharp images is often attributed to the fact that the inference models used during training are usually not expressive enough to capture the true posterior distribution. Indeed, recent work shows that using more expressive model classes can lead to substantially better results (Kingma et al., 2016), both visually and in terms of log-likelihood bounds. Recent work (Chen et al., 2016) also suggests that highly expressive inference models are essential in presence of a strong decoder to allow the model to make use of the latent space at all.

     

    In this paper, we present Adversarial Variational Bayes (AVB), a technique for training Variational Autoencoders with arbitrarily flexible inference models parameterized by neural networks. We can show that in the nonparametric limit we obtain a maximum-likelihood assignment for the generative model together with the correct posterior distribution.

     

    While there were some attempts at combining VAEs and GANs (Makhzani et al., 2015; Larsen et al., 2015), most of these attempts are not motivated from a maximum-likelihood point of view and therefore usually do not lead to maximum-likelihood assignments. For example, in Adversarial Autoencoders (AAEs) (Makhzani et al., 2015) the Kullback-Leibler regularization term that appears in the training objective for VAEs is replaced with an adversarial loss that encourages the aggregated posterior to be close to the prior over the latent variables. Even though AAEs do not maximize a lower bound to the maximum-likelihood objective, we show in Section 6.2 that AAEs can be interpreted as an approximation to our approach, thereby establishing a connection of AAEs to maximum-likelihood learning.

     

    Outside the context of generative models, AVB yields a new method for performing Variational Bayes (VB) with neural samplers. This is illustrated in Figure 1, where we used AVB to train a neural network to sample from a nontrivial unnormalized probability density. This allows us to accurately approximate the posterior distribution of a probabilistic model, e.g. for Bayesian parameter estimation. The only other variational methods we are aware of that can deal with such expressive inference models are based on Stein Discrepancy (Ranganath et al., 2016; Liu & Feng, 2016). However, those methods usually do not directly target the reverse Kullback-Leibler divergence and therefore cannot be used to approximate the variational lower bound for learning a latent variable model.

     

    Our contributions are as follows:

    • We enable the usage of arbitrarily complex inference models for Variational Autoencoders using adversarial training.
    • We give theoretical insights into our method, showing that in the nonparametric limit our method recovers the true posterior distribution as well as a true maximum-likelihood assignment for the parameters of the generative model.
    • We empirically demonstrate that our model is able to learn rich posterior distributions and show that the model is able to generate compelling samples for complex data sets.


    2. Background

    As our model is an extension of Variational Autoencoders (VAEs) (Kingma & Welling, 2013; Rezende et al., 2014), we start with a brief review of VAEs.

     

    VAEs are specified by a parametric generative model pθ(x | z) of the visible variables given the latent variables, a prior p(z) over the latent variables and an approximate inference model qφ(z | x) over the latent variables given the visible variables. It can be shown that

        log pθ(x) ≥ −KL(qφ(z | x), p(z)) + E_{qφ(z|x)}[log pθ(x | z)].    (2.1)

    The right hand side of (2.1) is called the variational lower bound or evidence lower bound (ELBO). If there is φ such that qφ(z | x) = pθ(z | x), we would have

        log pθ(x) = −KL(qφ(z | x), p(z)) + E_{qφ(z|x)}[log pθ(x | z)].    (2.2)

    However, in general this is not true, so that we only have an inequality in (2.2).
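    The inequality (2.1) and the equality case (2.2) can be checked numerically on a toy model that is not from the paper: a one-dimensional linear-Gaussian model whose marginal likelihood is tractable. A minimal numpy sketch (the model and all names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-Gaussian model (illustrative, not from the paper):
#   p(z) = N(0, 1),  p(x | z) = N(z, 1)  =>  p(x) = N(0, 2),
# with true posterior p(z | x) = N(x / 2, 1 / 2).
def log_normal(v, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (v - mean) ** 2 / var)

def elbo(x, q_mean, q_var, n_samples=200_000):
    """Monte Carlo estimate of E_q[log p(z) + log p(x | z) - log q(z | x)]."""
    z = q_mean + np.sqrt(q_var) * rng.standard_normal(n_samples)
    log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
    return np.mean(log_joint - log_normal(z, q_mean, q_var))

x = 1.3
log_px = log_normal(x, 0.0, 2.0)          # exact marginal log-likelihood
loose = elbo(x, q_mean=0.0, q_var=1.0)    # mismatched q: strict inequality
tight = elbo(x, q_mean=x / 2, q_var=0.5)  # q equals the posterior: equality

assert loose < log_px
assert abs(tight - log_px) < 1e-2
```

    When q equals the true posterior, the integrand log p(x, z) − log q(z | x) is constant in z, so the bound is tight; any other q leaves a gap equal to KL(qφ(z | x), pθ(z | x)).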

     

    When performing maximum-likelihood training, our goal is to optimize the marginal log-likelihood

        max_θ E_{pD(x)}[log pθ(x)],    (2.3)

    where pD is the data distribution. Unfortunately, computing log pθ(x) requires marginalizing out z in pθ(x, z), which is usually intractable. Variational Bayes uses inequality (2.1) to rephrase the intractable problem of optimizing (2.3) into

        max_θ max_φ E_{pD(x)}[−KL(qφ(z | x), p(z)) + E_{qφ(z|x)}[log pθ(x | z)]].    (2.4)

    Due to inequality (2.1), we still optimize a lower bound to the true maximum-likelihood objective (2.3).

     

    Naturally, the quality of this lower bound depends on the expressiveness of the inference model qφ(z | x). Usually, qφ(z | x) is taken to be a Gaussian distribution with diagonal covariance matrix whose mean and variance vectors are parameterized by neural networks with x as input (Kingma & Welling, 2013; Rezende et al., 2014). While this model is very flexible in its dependence on x, its dependence on z is very restrictive, potentially limiting the quality of the resulting generative model. Indeed, it was observed that applying standard Variational Autoencoders to natural images often results in blurry images (Larsen et al., 2015).
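    To make the standard Gaussian inference model concrete, here is a minimal numpy sketch of a diagonal-Gaussian encoder in which the noise is added only at the very end; all layer sizes and weights are hypothetical, chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical layer sizes, for illustration only.
x_dim, h_dim, z_dim = 8, 16, 2
W_h = rng.standard_normal((h_dim, x_dim)) * 0.1
W_mu = rng.standard_normal((z_dim, h_dim)) * 0.1
W_logvar = rng.standard_normal((z_dim, h_dim)) * 0.1

def gaussian_encoder(x):
    """Standard VAE inference model: mean and variance are flexible
    functions of x, but z is Gaussian given x by construction."""
    h = np.tanh(W_h @ x)
    mu, logvar = W_mu @ h, W_logvar @ h
    eps = rng.standard_normal(z_dim)          # noise enters at the very end
    return mu + np.exp(0.5 * logvar) * eps    # z = mu(x) + sigma(x) * eps

z = gaussian_encoder(rng.standard_normal(x_dim))
assert z.shape == (z_dim,)
```

    Because z depends on the noise only through a location-scale transform, qφ(z | x) can never be multimodal or heavy-tailed, no matter how expressive the networks for the mean and variance are.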


    3. Method

     

    In this work we show how we can instead use a black-box inference model qφ(z | x) and use adversarial training to obtain an approximate maximum-likelihood assignment θ* to θ and a close approximation qφ*(z | x) to the true posterior pθ*(z | x). This is visualized in Figure 2: on the left hand side the structure of a typical VAE is shown. The right hand side shows our flexible black-box inference model. In contrast to a VAE with a Gaussian inference model, we include the noise ε₁ as an additional input to the inference model instead of adding it at the very end, thereby allowing the inference network to learn complex probability distributions.
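    A sketch of this black-box alternative, under the same kind of illustrative assumptions as before (all sizes and weights are hypothetical): the noise is concatenated with x at the input, so z can be an arbitrary nonlinear function of both.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical layer sizes, for illustration only.
x_dim, eps_dim, h_dim, z_dim = 8, 4, 16, 2
W1 = rng.standard_normal((h_dim, x_dim + eps_dim)) * 0.1
W2 = rng.standard_normal((z_dim, h_dim)) * 0.1

def blackbox_encoder(x):
    """AVB-style inference model: the noise eps_1 is an extra *input*,
    so the implied q(z | x) can be almost arbitrarily complex, but its
    density can no longer be evaluated in closed form."""
    eps = rng.standard_normal(eps_dim)          # noise enters at the input
    h = np.tanh(W1 @ np.concatenate([x, eps]))
    return W2 @ h

z = blackbox_encoder(rng.standard_normal(x_dim))
assert z.shape == (z_dim,)
```

    The price of this flexibility is that log qφ(z | x) is no longer available in closed form, which is exactly the problem the adversarial construction in Section 3.1 addresses.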


    3.1. Derivation

    To derive our method, we rewrite the optimization problem in (2.4) as

        max_θ max_φ E_{pD(x)} E_{qφ(z|x)}[log p(z) − log qφ(z | x) + log pθ(x | z)].    (3.1)

    When we have an explicit representation of qφ(z | x) such as a Gaussian parameterized by a neural network, (3.1) can be optimized using the reparameterization trick (Kingma & Welling, 2013; Rezende & Mohamed, 2015) and stochastic gradient descent. Unfortunately, this is not the case when we define qφ(z | x) by a black-box procedure as illustrated in Figure 2b.

     

    The idea of our approach is to circumvent this problem by implicitly representing the term

        log p(z) − log qφ(z | x)    (3.2)

    as the optimal value of an additional real-valued discriminative network T(x, z) that we introduce to the problem.

     

    More specifically, consider the following objective for the discriminator T(x, z) for a given qφ(z | x):

        max_T E_{pD(x)} E_{qφ(z|x)}[log σ(T(x, z))] + E_{pD(x)} E_{p(z)}[log(1 − σ(T(x, z)))].    (3.3)

    Here, σ(t) := 1 / (1 + e^(−t)) denotes the sigmoid function. Intuitively, T(x, z) tries to distinguish pairs (x, z) that were sampled independently using the distribution pD(x)p(z) from those that were sampled using the current inference model, i.e., using pD(x)qφ(z | x).
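    The objective can be made concrete with a one-dimensional toy example that is not from the paper: take p(z) = N(0, 1) and, suppressing the dependence on x, qφ(z | x) = N(1, 1). A numpy sketch estimating (3.3) by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def discriminator_objective(T, n=200_000):
    """Monte Carlo estimate of (3.3) for p(z) = N(0, 1), q(z | x) = N(1, 1):
    E_q[log sigma(T(z))] + E_p[log(1 - sigma(T(z)))]."""
    z_q = 1.0 + rng.standard_normal(n)   # samples from the inference model
    z_p = rng.standard_normal(n)         # samples from the prior
    return np.mean(np.log(sigmoid(T(z_q)))) + np.mean(np.log(1 - sigmoid(T(z_p))))

# For these two Gaussians, log q(z) - log p(z) = z - 1/2.
best = discriminator_objective(lambda z: z - 0.5)
blind = discriminator_objective(lambda z: np.zeros_like(z))  # cannot separate
assert best > blind
```

    A discriminator that outputs a constant scores exactly 2 log(1/2) ≈ −1.386; the log-density-ratio discriminator does strictly better whenever q ≠ p.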

     

    To simplify the theoretical analysis, we assume that the model T(x, z) is flexible enough to represent any function of the two variables x and z. This assumption is often referred to as the nonparametric limit (Goodfellow et al., 2014) and is justified by the fact that deep neural networks are universal function approximators (Hornik et al., 1989).

     

    As it turns out, the optimal discriminator T ∗ (x, z) according to the objective in (3.3) is given by the negative of (3.2).


    Proposition 1. For pθ(x | z) and qφ(z | x) fixed, the optimal discriminator T∗ according to the objective in (3.3) is given by

        T∗(x, z) = log qφ(z | x) − log p(z).
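    Proposition 1 is a pointwise statement: at each (x, z), writing a = qφ(z | x) and b = p(z) (the shared pD(x) factor cancels), the integrand of (3.3) is a·log σ(t) + b·log(1 − σ(t)), which is maximized at t = log(a/b) = log qφ(z | x) − log p(z). A small numpy grid-search check with arbitrary illustrative density values:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def pointwise_integrand(t, a, b):
    """Integrand of (3.3) at one point, with densities a = q(z | x), b = p(z)."""
    return a * np.log(sigmoid(t)) + b * np.log(1.0 - sigmoid(t))

a, b = 0.7, 0.2                        # arbitrary positive density values
ts = np.linspace(-6.0, 6.0, 100_001)   # fine grid of discriminator outputs
t_best = ts[np.argmax(pointwise_integrand(ts, a, b))]

# The maximizer is log(a / b) = log q - log p, as Proposition 1 states.
assert abs(t_best - np.log(a / b)) < 1e-3
```

    The same first-order condition, a(1 − σ(t)) = bσ(t), gives σ(t∗) = a/(a + b), which is the familiar optimal-discriminator form from GANs.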


    4. Adaptive Contrast

     


    7. Conclusion

    We presented a new training procedure for Variational Autoencoders based on adversarial training. This allows us to make the inference model much more flexible, effectively allowing it to represent almost any family of conditional distributions over the latent variables.

     

    We believe that further progress can be made by investigating the class of neural network architectures used for the adversary and the encoder and decoder networks as well as finding better contrasting distributions.


    Proofs

     
