The Generative Adversarial Network (GAN) (Goodfellow et al., 2014) is a prevalent method widely used to generate realistic images. However, the generation process of a vanilla GAN is uncontrollable, so, to apply GANs to image-to-image translation tasks, Cycle-GAN (Zhu et al., 2017) proposes to regularize the transformation with a Cycle Consistency Loss. Compared to other methods, Cycle-GAN has the following advantages:

  • The Pix2Pix (Isola et al., 2017) network requires paired images, that is, a highly related image $x$ and image $y$, to perform translation, which restricts its versatility since the availability of paired images is limited. Cycle-GAN is more flexible and is able to leverage unpaired images from different domains.
  • Unlike some Neural Style Transfer (Huang and Belongie, 2017) methods that take two specific images as input and transfer style from a style image to a target image, Cycle-GAN captures high-level characteristics of whole domains and transfers them between arbitrary images from those domains.
  • The characteristics to be transferred are loosely defined, so we can control the nature of the transformation simply by choosing the domains.

Figure 1: Paired and unpaired image datasets

Paired images share most of their features and differ only to a small degree. Producing paired images usually requires manual effort.

Formulation of Cycle-GAN

Cycle-GAN performs transformations between domains, so we have two domains $\mathcal{X}$ and $\mathcal{Y}$ along with data points $\{x_i\}_{i=1}^{N}$ and $\{y_i\}_{i=1}^{N}$, where $x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$. To transfer images from $\mathcal{X}$ to $\mathcal{Y}$, we have a mapping $G: \mathcal{X} \rightarrow \mathcal{Y}$. Similarly, from $\mathcal{Y}$ to $\mathcal{X}$, we have $F: \mathcal{Y} \rightarrow \mathcal{X}$. The main structure of Cycle-GAN is illustrated by the following figure:
Figure 2: Cycle consistency

Besides $\mathcal{X}$ and $\mathcal{Y}$, we also have the discriminators $D_X$ and $D_Y$. A pair of inverse GANs is integrated into a single network in a cyclic manner. Therefore, considering each half of the cycle, we have a forward GAN from $\mathcal{X}$ to $\mathcal{Y}$ with generator $G$ and discriminator $D_Y$, and similarly for the backward GAN.
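As a toy illustration of this cycle structure (with scalars standing in for images, and hypothetical closed-form maps standing in for the learned networks $G$ and $F$):

```python
# Toy sketch of the CycleGAN cycle. G and F here are hypothetical
# invertible closed-form maps, not learned networks; they only
# illustrate the property the cycle encourages: F(G(x)) ≈ x.

def G(x):
    """Hypothetical forward mapping X -> Y."""
    return 2.0 * x + 1.0

def F(y):
    """Hypothetical backward mapping Y -> X."""
    return (y - 1.0) / 2.0

x, y = 3.0, 7.0
print(F(G(x)))  # forward cycle recovers x: 3.0
print(G(F(y)))  # backward cycle recovers y: 7.0
```

In the real model, $G$ and $F$ are convolutional networks, so this round-trip property has to be encouraged by a loss rather than holding exactly.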

Adversarial Loss

Because Cycle-GAN is essentially a combination of two GANs, it naturally borrows the Adversarial Loss as an objective. For the forward network we have:
$$\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{\text{data}}(y)}\left[\log D_Y(y)\right] + \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log\left(1 - D_Y(G(x))\right)\right]$$
Maximizing this objective with respect to $D_Y$ encourages the discriminator to distinguish generated images from sampled real images, while minimizing it with respect to $G$ drives the generator to produce images that fool the discriminator. Since we also have the backward network, there is a second Adversarial Loss of the same form, $\mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)$.
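A minimal NumPy sketch of this loss, assuming the discriminator outputs probabilities in $(0, 1)$; the function name and the `eps` stabilizer are my own additions:

```python
import numpy as np

def gan_loss(d_real, d_fake, eps=1e-12):
    """Adversarial loss: E[log D(real)] + E[log(1 - D(fake))].

    d_real: discriminator outputs on real samples (probabilities).
    d_fake: discriminator outputs on generated samples.
    eps avoids log(0) when the discriminator saturates.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.log(d_real + eps).mean() + np.log(1.0 - d_fake + eps).mean()

# An undecided discriminator (all outputs 0.5) gives 2*log(0.5) ≈ -1.386;
# a perfect discriminator (1 on real, 0 on fake) drives the loss toward 0.
print(gan_loss([0.5, 0.5], [0.5, 0.5]))
```

In practice the discriminator ascends this quantity and the generator descends it, with gradients supplied by a deep-learning framework rather than computed by hand.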

Consistency Loss

Merely minimizing the Adversarial Loss only guarantees that generated images land in the target domain; it does not constrain which point an input is mapped to. In particular, a mapping that collapses many inputs onto a small subset of the target domain still minimizes the Adversarial Loss. To further constrain the mapping, a Cycle Consistency Loss is introduced:
$$\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\|F(G(x)) - x\|_{1}\right] + \mathbb{E}_{y \sim p_{\text{data}}(y)}\left[\|G(F(y)) - y\|_{1}\right]$$
Here each term measures the $L_1$ distance between a point of $\mathcal{X}$ (respectively $\mathcal{Y}$) and its reconstruction after passing through one mapping ($G$ or $F$) and then the other. The consistency loss can be illustrated as follows:
Figure 3: Intuition behind cycle consistency
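In code, the cycle term reduces to an $L_1$ reconstruction penalty. A minimal NumPy sketch, using a per-element average (which matches the expectation up to a constant factor; the function name is my own):

```python
import numpy as np

def cycle_loss(x, x_cyc, y, y_cyc):
    """Cycle consistency: E[|F(G(x)) - x|] + E[|G(F(y)) - y|].

    x_cyc is the reconstruction F(G(x)); y_cyc is G(F(y)).
    """
    x, x_cyc = np.asarray(x, dtype=float), np.asarray(x_cyc, dtype=float)
    y, y_cyc = np.asarray(y, dtype=float), np.asarray(y_cyc, dtype=float)
    return np.abs(x_cyc - x).mean() + np.abs(y_cyc - y).mean()

# Perfect reconstructions incur zero loss:
print(cycle_loss([1.0, 2.0], [1.0, 2.0], [3.0], [3.0]))  # 0.0
```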

The intuition is to require that an arbitrary point, after passing through the forward and backward mappings, returns close to where it started, and the same holds for both domains. From another angle, we can view the forward network as an autoencoder with encoder $G$ and decoder $F$, where the discriminator $D_Y$ regularizes the representation at the bottleneck. Combining the Adversarial Loss with the Cycle Consistency Loss, we have the final objective:
$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X) + \lambda \mathcal{L}_{\mathrm{cyc}}(G, F)$$
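Putting the pieces together, a NumPy sketch of the full objective ($\lambda = 10$ is the weight used in the CycleGAN paper's experiments; the function and argument names are my own):

```python
import numpy as np

def total_loss(dy_real, dy_fake, dx_real, dx_fake,
               x, x_cyc, y, y_cyc, lam=10.0, eps=1e-12):
    """L = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + lam * L_cyc(G, F).

    dy_*: outputs of D_Y on real y and generated G(x);
    dx_*: outputs of D_X on real x and generated F(y);
    x_cyc = F(G(x)), y_cyc = G(F(y)) are the cycle reconstructions.
    """
    arrs = [np.asarray(v, dtype=float)
            for v in (dy_real, dy_fake, dx_real, dx_fake, x, x_cyc, y, y_cyc)]
    dy_real, dy_fake, dx_real, dx_fake, x, x_cyc, y, y_cyc = arrs
    l_gan_fwd = np.log(dy_real + eps).mean() + np.log(1.0 - dy_fake + eps).mean()
    l_gan_bwd = np.log(dx_real + eps).mean() + np.log(1.0 - dx_fake + eps).mean()
    l_cyc = np.abs(x_cyc - x).mean() + np.abs(y_cyc - y).mean()
    return l_gan_fwd + l_gan_bwd + lam * l_cyc
```

Training solves the minimax game on this quantity: $D_X$ and $D_Y$ ascend it while $G$ and $F$ descend it, alternating updates between the two sides.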

Quantitative Results

The first table shows the performance of different models on a human perceptual task, in which human observers are asked to distinguish real images from generated ones, thereby measuring generation quality; Cycle-GAN seems to perform best. The second table reports generation accuracy, measuring the pixel-wise difference between generated images and the ground truth. Here Cycle-GAN performs slightly worse than Pix2Pix, but according to the third table it outperforms all models, including Pix2Pix, on the classification tasks.
The ablation study results in the tables above show that the forward cycle by itself boosts performance on the FCN tests, while using both the forward and backward cycles yields the largest improvement on the classification tests.

Qualitative Results

Figure 4: Samples of image transformation in different styles

Figure 5: Transforming artistic paintings into realistic images

Figure 6: Image modification

Figure 7: Failure examples

Some failure cases are shown above. One problem with the Cycle-GAN model is that it is incapable of performing geometric transformations, e.g., turning a cat into a dog. Also, a target transformation cannot be learned well when related data is missing from the training set.

References

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Nets. In NeurIPS, 2014.

Jun-Yan Zhu*, Taesung Park*, Phillip Isola, and Alexei A. Efros. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In ICCV, 2017.

Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-Image Translation with Conditional Adversarial Networks. In CVPR, 2017.

Xun Huang and Serge Belongie. Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. In ICCV, 2017.