GAN2 Cycle-GAN
Nengyu Wang (nengyuw2@illinois.edu)
Generative Adversarial Network (GAN) (Goodfellow et al., 2014) is a prevalent method that is widely used to generate realistic images. However, the generating process of a GAN is uncontrollable, so, to apply GANs to image-to-image translation tasks, Cycle-GAN (Zhu et al., 2017) proposes to regularize the transformation with a Cycle Consistency Loss. Compared to other methods, Cycle-GAN has the following advantages:
- The Pix2Pix (Isola et al., 2017) network requires paired images, that is, highly related image pairs $(x_i, y_i)$, to perform translation, which restricts its versatility since the availability of paired images is limited. Cycle-GAN, instead, is more flexible and is able to leverage unpaired images from different domains.
- Unlike some Neural Style Transfer (Huang and Belongie, 2017) methods that take as input two specific images and transfer style from a style image to a target image, Cycle-GAN can capture high-level characteristics of domains and transfer this information between arbitrary images from different domains.
- The characteristics to be transferred are loosely defined, so we can control the way of transformation by choosing the domains.
Figure 1: Paired and unpaired image datasets
Paired images share most of their features and differ only to a small degree. Producing paired images usually requires manual effort.
Formulation of Cycle-GAN
Cycle-GAN performs transformation between two domains, so we have a transformation associated with domains $X$ and $Y$ along with data points $\{x_i\}$ and $\{y_j\}$, where $x_i \in X$ and $y_j \in Y$. To transfer images from $X$ to $Y$, we have a mapping $G: X \to Y$. Similarly, from $Y$ to $X$, we have $F: Y \to X$. The main structure of Cycle-GAN can be illustrated by the following figure:
Figure 2: Cycle consistency
Besides $G$ and $F$, we also have two discriminators, $D_X$ and $D_Y$. A pair of inverse Generative Adversarial Networks is thus integrated into one network in a cyclic manner. Considering each half of the cycle, we have a forward GAN from $X$ to $Y$ with the generator $G$ and the discriminator $D_Y$, and similarly a backward GAN from $Y$ to $X$ with the generator $F$ and the discriminator $D_X$.
Adversarial Loss
Because Cycle-GAN is essentially a combination of two GANs, it naturally borrows the Adversarial Loss as an objective. For the forward network we have:

$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}\left[\log D_Y(y)\right] + \mathbb{E}_{x \sim p_{data}(x)}\left[\log\left(1 - D_Y(G(x))\right)\right],$$

where maximizing the objective with respect to $D_Y$ encourages the discriminator to distinguish the generated images from the sampled real images, while minimizing the objective with respect to $G$ drives the generator to produce images that can fool the discriminator. Since we also have the backward network, there is a similar Adversarial Loss $\mathcal{L}_{GAN}(F, D_X, Y, X)$ of the same form.
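This min-max behavior can be sketched numerically. The snippet below is a toy illustration, not the actual Cycle-GAN implementation: scalar discriminator scores in $(0, 1)$ stand in for real image batches.

```python
import math

def adversarial_loss(d_real, d_fake):
    """Toy estimate of L_GAN(G, D_Y, X, Y) for one batch.

    d_real: discriminator scores D_Y(y) on real target-domain images.
    d_fake: discriminator scores D_Y(G(x)) on generated images.
    The discriminator maximizes this value; the generator minimizes it.
    """
    term_real = sum(math.log(p) for p in d_real) / len(d_real)
    term_fake = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return term_real + term_fake

# A confident discriminator (scores near 1 on real, near 0 on fake)
# yields a larger objective than one that is completely fooled.
confident = adversarial_loss([0.9, 0.95], [0.05, 0.1])
fooled = adversarial_loss([0.5, 0.5], [0.5, 0.5])
assert confident > fooled
```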
Consistency Loss
By merely minimizing the Adversarial Loss, the generated images are only guaranteed to be mapped into the target domain. However, many such mappings satisfy the Adversarial Loss; for instance, a generator could collapse all inputs onto a small subset of the target domain. To further constrain the mapping, a Cycle Consistency Loss is introduced:

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y \sim p_{data}(y)}\left[\lVert G(F(y)) - y \rVert_1\right],$$

where the first and second terms measure the distance between a point of $X$ (respectively $Y$) and the corresponding point that first goes through one mapping and then through the other. The consistency loss can be demonstrated as follows:
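The two reconstruction terms can be sketched in a few lines. The sketch below is a toy, assuming images are flat lists of floats and that `G` and `F` are hypothetical stand-ins for the learned generators:

```python
def l1(a, b):
    # Mean absolute difference between two flattened "images".
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)

def cycle_consistency_loss(xs, ys, G, F):
    """L_cyc(G, F) = E_x ||F(G(x)) - x||_1 + E_y ||G(F(y)) - y||_1."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward

# When G and F are exact inverses, the loss vanishes; otherwise it is positive.
G = lambda img: [2.0 * p for p in img]   # hypothetical X -> Y mapping
F = lambda img: [p / 2.0 for p in img]   # its inverse, Y -> X
xs = [[0.1, 0.4], [0.8, 0.2]]
ys = [[0.6, 0.3]]
assert cycle_consistency_loss(xs, ys, G, F) < 1e-9
```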
Figure 3: Intuition behind cycle consistency
The intuition here is to require the distance between an arbitrary point and the corresponding point that goes through both the forward and backward networks to be minimized, and the same holds for both domains. From another angle, we can consider the forward network as an autoencoder with an encoder $G$ and a decoder $F$; additionally, a discriminator $D_Y$ is used to regularize the representation at the bottleneck. Therefore, combining the Adversarial Loss with the Cycle Consistency Loss, we have the final objective:

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \, \mathcal{L}_{cyc}(G, F),$$

which is solved as $G^*, F^* = \arg\min_{G, F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y)$.
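Given per-term values, the combined objective is a weighted sum. The sketch below assumes the loss terms have already been computed as scalars; `lam` corresponds to the weight $\lambda$ on the consistency term (the Cycle-GAN paper uses $\lambda = 10$):

```python
def full_objective(adv_forward, adv_backward, cyc_loss, lam=10.0):
    """L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X)
    + lambda * L_cyc(G, F), combining both adversarial terms with the
    cycle-consistency term."""
    return adv_forward + adv_backward + lam * cyc_loss

# A larger lam penalizes the same cycle-consistency error more heavily.
loose = full_objective(-0.2, -0.3, 0.05, lam=1.0)
tight = full_objective(-0.2, -0.3, 0.05, lam=10.0)
assert tight > loose
```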
Quantitative Results
The first table shows the performance of different models on the human perceptual task, which asks real humans to distinguish real images from generated images, thereby measuring the quality of the generation; Cycle-GAN has the best performance. The second table shows the generation accuracy, measured by the pixel-wise difference between the generated images and the ground truth. Cycle-GAN performs slightly worse than the Pix2Pix model, but according to the third table, it outperforms all models including Pix2Pix on the classification task.
The ablation study results in the above tables show that the forward cycle alone already boosts performance in the FCN tests, while using both the forward and backward cycles yields the largest improvement in the classification tests.
Qualitative Results
Figure 4: Samples of images transformation in different styles
Figure 5: Transforming artistic paintings into realistic images
Figure 6: Image modification
Figure 7: Failure examples
Some failure cases are shown above. One limitation of the Cycle-GAN model is that it is incapable of performing geometric transformations, e.g., transferring a cat into a dog. Also, the target transformation cannot be learned well when the related data are missing from the training set.
References
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Nets. In NeurIPS, 2014.
J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In ICCV, 2017.
P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-Image Translation with Conditional Adversarial Networks. In CVPR, 2017.
X. Huang and S. Belongie. Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. In ICCV, 2017.