GAN4 Towards a Better Global Loss Landscape of GANs
Nengyu Wang (nengyuw2@illinois.edu)
Generative Adversarial Nets (GANs) (Goodfellow et al., 2014) are a successful method for various practical applications. Meanwhile, current theoretical studies of GANs dig into the underlying mechanisms from the perspectives of statistics and optimization.
On the statistics side, Goodfellow et al. (2014) link the min-max formulation to the JS (Jensen-Shannon) divergence, and Wasserstein Generative Adversarial Nets (WGANs) (Arjovsky et al., 2017) adopt the Wasserstein distance as the loss function. The generalization properties of GANs have also been investigated to see how broadly GAN methods apply. On the optimization side, works on cyclic behavior (Balduzzi et al., 2018) study the issue that the optimization algorithm may cycle around a stable point, converge slowly, or even diverge. Another optimization challenge is to avoid sub-optimal local minima. For GANs, existing works (Mescheder et al., 2018) either analyze convex-concave games or perform only local analysis without a global analysis. Even the works that do conduct global analysis only handle simple settings that do not generalize further.
Therefore, to fill this gap in the theoretical analysis of GANs, the main goal of this work is to perform a global analysis of the GAN landscape for general data distributions. In the paper, the work is placed in a table that compares it with other theoretical works:
Specifically, this work performs a global analysis of the GAN landscape by comparing separable GANs (SepGAN, which includes the standard JS-GAN) with relativistic paired GANs (RpGAN).
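As a rough sketch of the two families (notation simplified from the paper; $D$ denotes the raw discriminator output, and $h$, $h_1$, $h_2$ are scalar link functions, e.g. the log-sigmoid $h(t) = -\log(1 + e^{-t})$ for the JS-GAN / RS-GAN instances):

$$\text{SepGAN:}\;\; \min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[h_1\big(D(x)\big)\big] + \mathbb{E}_{z \sim p_z}\big[h_2\big(-D(G(z))\big)\big]$$

$$\text{RpGAN:}\;\; \min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}},\, z \sim p_z}\big[h\big(D(x) - D(G(z))\big)\big]$$

The key structural difference is that the separable loss scores real and fake samples independently, while the relativistic paired loss only ever sees the difference of the two scores.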
Relativistic GANs
Before further discussion, let us first introduce the work on Relativistic GANs (Jolicoeur-Martineau, 2019). As we know, the original GANs suffer from unstable training and mode collapse, so some works try to solve these problems either by imposing regularization or by changing the loss. In this paper, the concept of relativity is emphasized, suggesting that the discriminator needs the relative probability of real images being real in order to help training.
The table above compares how the probability of the real images being real changes for the standard GAN and the relativistic GAN. Taking images of bread as the real images, we have three situations:
- The real images are bread and the fake images are dogs. Then, both the absolute and the relative probability of the real images being bread are one.
- The real images are bread and the fake images are dogs that look similar to bread. Then, the absolute probability of the real images being bread is still one, while the relative probability decreases (see the worked numbers after this list).
- The real images are bread in the shape of dogs and the fake images are dogs. Then, the absolute probability of the real images being bread is low, while the relative probability is higher, since the real images still look more like bread than the fakes do.
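To put hypothetical numbers on the second situation (the scores below are made up for illustration, with $\sigma$ the sigmoid and $D$ the raw discriminator score): suppose a real bread image gets $D(x_r) = 2.0$, so its absolute probability of being real is $\sigma(2.0) \approx 0.88$. If the fake images are obvious dogs with $D(x_f) = -3.0$, the relative probability $\sigma(D(x_r) - D(x_f)) = \sigma(5.0) \approx 0.99$ is also high; but if the fakes become bread-like with $D(x_f) = 1.5$, the absolute probability is unchanged at $\approx 0.88$ while the relative probability drops to $\sigma(0.5) \approx 0.62$.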
In summary, when the discriminator scores how realistic a sample is, both real data and fake data should be taken into account, so the absolute judgement of real or fake is replaced by the probability that a real sample is more realistic than a fake one. This point is illustrated from two aspects:
- Using the prior knowledge that the input batch consists of half real and half fake images, the discriminator should lower its scores for real samples when the fake samples look as realistic as the real ones, instead of simply judging every sample as real.
- As the following figure shows, standard training only encourages the generator to push the discriminator's scores for fake images up, while the scores of the real samples are ignored. Instead, ideal training should also decrease the score of real samples as the fake samples become more and more realistic.
Therefore, to make the discriminator judge the samples relatively, the loss objective is modified so that the score of a real sample is always compared against the score of a fake sample. For the relativistic standard GAN (RS-GAN), the discriminator and generator losses become

$$L_D = -\mathbb{E}_{(x_r, x_f)}\big[\log \sigma\big(D(x_r) - D(x_f)\big)\big], \qquad L_G = -\mathbb{E}_{(x_r, x_f)}\big[\log \sigma\big(D(x_f) - D(x_r)\big)\big],$$

where pairs of real and fake samples $(x_r, x_f)$ are used for computing the loss and $D(\cdot)$ denotes the raw (pre-sigmoid) discriminator output.
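A minimal PyTorch-style sketch of these paired losses next to the standard separable loss (illustrative only; `d_real` and `d_fake` are assumed to be raw discriminator logits for a batch of real and generated samples, paired elementwise):

```python
import torch
import torch.nn.functional as F

def sep_gan_d_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Standard (separable) GAN discriminator loss: real and fake scored independently."""
    # softplus(-t) = -log(sigmoid(t)), softplus(t) = -log(1 - sigmoid(t))
    return F.softplus(-d_real).mean() + F.softplus(d_fake).mean()

def rs_gan_d_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """RS-GAN discriminator loss: each real logit is scored relative to a paired fake logit."""
    return F.softplus(-(d_real - d_fake)).mean()  # -log sigmoid(D(x_r) - D(x_f))

def rs_gan_g_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """RS-GAN generator loss: push the fake logit above the paired real logit."""
    return F.softplus(-(d_fake - d_real)).mean()  # -log sigmoid(D(x_f) - D(x_r))

# Toy usage with random logits standing in for discriminator outputs.
d_real, d_fake = torch.randn(8), torch.randn(8)
print(rs_gan_d_loss(d_real, d_fake).item(), rs_gan_g_loss(d_real, d_fake).item())
```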
Landscape Analysis of GANs
The paper uses a simple case to illustrate the phenomenon of bad local minima in GANs.
Given real samples $x_1$ and $x_2$, we need to generate fake samples $y_1$ and $y_2$ that match the real samples as closely as possible. As the discriminator updates, a decision boundary separating real from fake samples is placed in the margin between the two sample sets. Then, by updating the generator, the generated samples are pushed across this boundary toward the real samples. As this process repeats, the generated samples can end up trapped in a cluster around one real sample, which is the so-called mode collapse.
For the relativistic GAN (RpGAN), since samples are compared within each pair instead of between the two whole sets, each generated sample can be pushed toward a different real sample, thereby relieving the mode collapse problem of standard GANs.
Bad Local Minima
To further investigate how this pairing influences local minima, the paper considers a two-point case:
Given two real samples $x_1$ and $x_2$ and two fake samples $y_1$ and $y_2$, we consider four states that represent different placements of the fake samples, ranging from both fake samples collapsing onto one real sample to a perfect match with the two real samples.
Evaluating the divergence function at these states, we have the following:
where the divergence is the one induced by the JS-GAN loss. This discrete representation can be extended to a continuous curve:
We can see that the landscape of JS-GAN has a local minimum at the state where both fake samples collapse onto one real sample and a global minimum at the state where the fake samples exactly match the real samples. In other words, once all fake samples sit in the cluster of one real sample, they are trapped in the local minimum, which corresponds to the previous illustration.
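As a quick numerical sanity check of this picture (a sketch only, under the assumption of an optimal function-space discriminator, in which case the JS-GAN objective reduces to the Jensen-Shannon divergence between the empirical real and fake distributions; the point locations and states below are illustrative, not taken from the paper):

```python
import numpy as np
from collections import Counter

def js_divergence(p_points, q_points):
    """Jensen-Shannon divergence between two empirical point distributions (in nats)."""
    p_counts, q_counts = Counter(p_points), Counter(q_points)
    support = sorted(set(p_counts) | set(q_counts))
    p = np.array([p_counts[s] / len(p_points) for s in support])
    q = np.array([q_counts[s] / len(q_points) for s in support])
    m = 0.5 * (p + q)

    def kl(a, b):
        return sum(ai * np.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

real = (0.0, 1.0)  # two real samples x1 = 0 and x2 = 1
states = {
    "collapse (y1 = y2 = x1)": (0.0, 0.0),
    "halfway  (y2 between x1 and x2)": (0.0, 0.5),
    "matched  (y1 = x1, y2 = x2)": (0.0, 1.0),
}
for name, fake in states.items():
    print(f"{name}: {js_divergence(real, fake):.4f}")
# collapse: 0.2158   halfway: 0.3466   matched: 0.0000
```

Moving one fake sample off $x_1$ toward $x_2$ first increases the divergence (0.22 to 0.35) before it can drop to zero at the perfect match, which is exactly why the collapsed state is a strict local minimum.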
Then, for the RS-GAN, we have:
and
Therefore, we can see that the RS-GAN only has a global minimum, at the state where the fake samples match the real samples, with no sub-optimal local minima.
Landscape Results in Function Space
Theorem 1
Suppose the real samples $x_1, \dots, x_n$ are distinct. Suppose $h_1, h_2$ satisfy Assumptions 4.1 and 4.3. Then for the separable-GAN loss defined in Eq. (5), we have: (i) The global minimal value is achieved iff the generated samples $y_1, \dots, y_n$ exactly match the real samples $x_1, \dots, x_n$. (ii) If every generated sample coincides with some real sample but at least one real sample is not covered by any generated sample, then this configuration is a sub-optimal strict local minimum. Therefore, the separable-GAN loss has sub-optimal strict local minima.
This theorem generalizes the results of the two-point case to any $n$ and shows two things: first, for the standard GAN, the global minimum is achieved when the two sets of points perfectly match; second, sub-optimal local minima exist whenever the assumptions and the condition in part (ii) of the theorem are satisfied. Since the condition in part (ii) is easy to satisfy, such local minima can hardly be avoided.
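As an illustrative reading of condition (ii) (this concrete configuration is my own example, not from the paper): with $n = 3$ distinct real samples, the assignment

$$(y_1, y_2, y_3) = (x_1, x_1, x_2)$$

places every generated sample on some real sample while leaving $x_3$ uncovered, so under the theorem's assumptions it is a sub-optimal strict local minimum: the generator has dropped the mode at $x_3$ and cannot recover it by any small perturbation.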
Definition (global-min-reachable)
We say a point $y$ is global-min-reachable for a function $f$ if there exists a continuous path from $y$ to one global minimum of $f$ along which the value of $f$ is non-increasing.
In other words, a point is global-min-reachable if it lies on a non-increasing path that ends at a global minimum. With this definition, we have Theorem 2.
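For intuition, two simple one-dimensional examples (mine, not from the paper): for $f(x) = (x^2 - 1)^2$, every point is global-min-reachable, since from any $x$ one can slide monotonically downhill to one of the two global minimizers $x = \pm 1$. For $g(x) = (x^2 - 1)^2 + x/2$, the tilt turns the minimum on the positive side into a strict sub-optimal local minimum; a point sitting there is not global-min-reachable, because any continuous path toward the lower minimum on the negative side must first go uphill.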
Theorem 2
Suppose the real samples $x_1, \dots, x_n$ are distinct. Suppose $h$ satisfies the corresponding assumptions. Then for the RpGAN loss defined in Eq. (6): (i) The global minimal value is achieved iff the generated samples exactly match the real samples. (ii) Any configuration of generated samples is global-min-reachable for the RpGAN loss.
The main point of this theorem is that from any configuration there exists a non-increasing path to a global minimum, which rules out sub-optimal strict local minima that could trap the optimization of the RpGAN loss.
Hence, these two theorems generalize the conclusion of the two-point case to the general case and characterize the global landscape: for standard (separable) GANs, sub-optimal local minima and their consequences, such as mode collapse, are hard to avoid, while RpGANs have no such bad local minima and therefore avoid mode collapse and related issues.
Results
This figure shows how the distribution of generated samples moves away from an initial state in which the fake samples are trapped around one cluster of real samples. With red points denoting generated samples and blue points denoting real samples, it is clear that for RS-GAN the generated samples escape the trap of the real-sample cluster, i.e. the local minimum, much faster. These results support the theoretical analysis.
The figure above shows how the discriminator loss changes as training progresses. Compared to the JS-GAN loss, the RS-GAN loss converges much faster. Another observation is that the JS-GAN loss gets stuck around 0.48 for a while, which is close to the theoretical loss value at one of the bad states discussed earlier, so, again, it supports the previous analysis.
References
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NeurIPS, 2014.
M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. In ICML, 2017.
D. Balduzzi, S. Racaniere, J. Martens, J. Foerster, K. Tuyls, and T. Graepel. The mechanics of n-player differentiable games. arXiv preprint arXiv:1802.05642, 2018.
L. Mescheder, A. Geiger, and S. Nowozin. Which training methods for GANs do actually converge? In ICML, 2018.
A. Jolicoeur-Martineau. The relativistic discriminator: a key element missing from standard GAN. In ICLR, 2019.