Mode collapse is a failure case for GANs in which the generator produces only a limited variety of samples, regardless of the input.

But what causes mode collapse? There are several reasons.

The objective of GANs

The generator produces new data, while the discriminator evaluates it for authenticity, not for the diversity of the generated samples.
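For reference, the standard minimax GAN objective scores each sample only for how real it looks:

$$ \min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right] $$

Nothing in this value function rewards the generator for covering all the modes of $p_{\mathrm{data}}$; a generator concentrated on a single very realistic mode can still drive the objective down.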

The generator can win by reproducing only a polynomial number of training examples, and a low-capacity discriminator cannot detect this, so it cannot guide the generator towards the target distribution. Even if a high-capacity discriminator identifies the collapsed portion and assigns it a low probability, the generator will simply move away from its collapsed output and focus on another fixed output point.

Generator

No matter what the objective function is, if it only considers individual samples (without looking forward or backward), then the generator is not directly incentivised to produce diverse examples.
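As a minimal illustration (a sketch, not code from any of the referenced papers), here is the common non-saturating generator loss in PyTorch: it is just a mean of per-sample terms, so no part of it ever compares two generated samples to each other.

```python
import torch
import torch.nn.functional as F

def generator_loss(fake_logits: torch.Tensor) -> torch.Tensor:
    """Non-saturating generator loss: -log D(G(z)) averaged over the batch.

    Each generated sample contributes its own independent term; there is
    no term that looks at a pair of generated samples at once, hence no
    direct incentive for diversity.
    """
    return F.softplus(-fake_logits).mean()
```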

From [1], standard GAN training corresponds exactly to updating the generator parameters using only the first term of the gradient below, because the discriminator is held fixed during the generator update. Therefore, in standard GAN training, each generator update step is a partial collapse towards a delta function.

$$ \frac{\mathrm{d} f_K(\theta_G, \theta_D)}{\mathrm{d}\theta_G} = \frac{\partial f\left(\theta_G, \theta_D^K(\theta_G, \theta_D)\right)}{\partial \theta_G} + \frac{\partial f\left(\theta_G, \theta_D^K(\theta_G, \theta_D)\right)}{\partial \theta_D^K(\theta_G, \theta_D)} \, \frac{\mathrm{d}\theta_D^K(\theta_G, \theta_D)}{\mathrm{d}\theta_G} $$
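Here, in the notation of [1], $\theta_D^K$ denotes the discriminator parameters after $K$ gradient-ascent steps on the GAN objective $f$ starting from the current $\theta_D$, and $f_K$ is $f$ evaluated at those unrolled parameters:

$$ \theta_D^0 = \theta_D, \qquad \theta_D^{k+1} = \theta_D^k + \eta^k \frac{\partial f\left(\theta_G, \theta_D^k\right)}{\partial \theta_D^k}, \qquad f_K(\theta_G, \theta_D) = f\left(\theta_G, \theta_D^K(\theta_G, \theta_D)\right) $$

The second term of the gradient above is exactly the part that accounts for how the discriminator would react to a change in the generator; standard training drops it.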

Several remedies have been proposed: multiple generators and weight-sharing generators have been developed to capture more modes of the distribution.

Discriminator

Mode collapse is often explained by exploding discriminator gradients, which come from the imbalance between the discriminator and the generator. For example, the two time-scale update rule (TTUR) can help the discriminator stay close to optimal. Some researchers consider a near-optimal discriminator a desirable goal, since a good discriminator gives good feedback, and overlook the fact that an overly strong discriminator can also produce exploding gradients and push the generator towards collapse.
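A minimal TTUR sketch in PyTorch (the 4:1 learning-rate ratio and the tiny placeholder networks are illustrative choices, not values from the referenced papers): the only ingredient is giving the discriminator a larger step size than the generator.

```python
from torch import nn, optim

# Placeholder networks just to make the sketch self-contained.
G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))   # noise -> sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))    # sample -> logit

# TTUR: the discriminator is updated with a larger learning rate than
# the generator, so it can stay close to optimal without extra inner loops.
opt_D = optim.Adam(D.parameters(), lr=4e-4, betas=(0.0, 0.9))
opt_G = optim.Adam(G.parameters(), lr=1e-4, betas=(0.0, 0.9))
```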

In addition, the discriminator processes each example independently, and the generator only receives feedback through the discriminator, so there is no mechanism to tell the outputs of the generator to become more dissimilar to each other.

The idea from [2] is that we could use minibatch discrimination, so the discriminator looks at multiple examples in combination and gives the generator better feedback.
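A sketch of such a minibatch discrimination layer in PyTorch, following the construction in [2] (the feature sizes below are arbitrary illustrative choices): each example's features are projected through a learned tensor, compared to every other example in the batch via L1 distances, and the summed similarities are appended to the features, so the discriminator can notice when a whole batch looks alike.

```python
import torch
from torch import nn

class MinibatchDiscrimination(nn.Module):
    """Minibatch discrimination layer in the spirit of Section 3.2 of [2]."""

    def __init__(self, in_features: int, out_features: int, kernel_dim: int):
        super().__init__()
        # T maps each in_features-dim feature vector to an (out_features x kernel_dim) matrix.
        self.T = nn.Parameter(torch.randn(in_features, out_features * kernel_dim) * 0.1)
        self.out_features = out_features
        self.kernel_dim = kernel_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n = x.size(0)
        # One matrix M_i per example: (n, out_features, kernel_dim).
        m = (x @ self.T).view(n, self.out_features, self.kernel_dim)
        # Pairwise L1 distances between M_i and M_j, per output feature.
        diff = m.unsqueeze(0) - m.unsqueeze(1)          # (n, n, out_features, kernel_dim)
        c = torch.exp(-diff.abs().sum(dim=3))           # (n, n, out_features)
        # Sum similarities over the batch, excluding each example's self-similarity of 1.
        o = c.sum(dim=1) - 1                            # (n, out_features)
        return torch.cat([x, o], dim=1)

# Usage: append the batch statistics before the discriminator's final layer.
layer = MinibatchDiscrimination(in_features=128, out_features=32, kernel_dim=16)
features = torch.randn(8, 128)        # intermediate discriminator features
print(layer(features).shape)          # torch.Size([8, 160])
```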

A straightforward approach to handling multimodality is to take random noise vectors along with the conditional contexts as inputs, where the contexts determine the main content and the noise vectors are responsible for variations. However, the noise vectors are often ignored or have only a minor effect, since cGANs pay more attention to learning from the high-dimensional, structured conditional contexts.
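Reference [3] targets exactly this failure with a mode-seeking regularization term that pushes generated images to move apart as their latent codes move apart. A minimal sketch of that term (the distance metrics and the weighting are simplifications of my own, not the paper's exact setup):

```python
import torch

def mode_seeking_loss(img1: torch.Tensor, img2: torch.Tensor,
                      z1: torch.Tensor, z2: torch.Tensor,
                      eps: float = 1e-5) -> torch.Tensor:
    """Mode-seeking regularizer in the spirit of [3].

    Two images generated from the same context but different noise vectors
    should differ roughly in proportion to how far apart the noise vectors
    are; minimizing the inverse ratio maximizes that ratio.
    """
    d_img = torch.mean(torch.abs(img1 - img2))
    d_z = torch.mean(torch.abs(z1 - z2))
    return 1.0 / (d_img / d_z + eps)
```

This term is added to the usual cGAN generator loss with a small weighting hyperparameter.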

Another question

Mode collapse may happen only partially. Since training is a stochastic process, the input to the generator network varies, and the samples drawn from the real distribution also vary.

But mode collapse is not always bad news. In style transfer with GANs, we are happy to convert an image into just one good output rather than finding every variant. Indeed, the specialization that comes with partial mode collapse sometimes produces higher-quality images.

**References**

[1] Section 2.4 of Unrolled Generative Adversarial Networks
[2] Section 3.2 of Improved Techniques for Training GANs
[3] Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis
[4] Improving Generalization and Stability of Generative Adversarial Networks