[Paper Review Study] Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving


by 이듄 2023. 5. 30. 09:48

Author: 이승은 (15th cohort)

1. Abstract

Image-to-image translation has advanced considerably with GAN-based models. However, these methods are limited in how well they preserve the identity of the source domain. The synthesized image can over-adapt to the reference domain, losing important structural characteristics and showing suboptimal visual quality.

To address these problems, the authors propose the FDIT (Frequency Domain Image Translation) framework. The key idea is to decompose an image into low-frequency and high-frequency components and to exploit the fact that the high-frequency components mainly reflect identity (structure). The training objective is to preserve frequency information in both pixel space and Fourier spectral space.

The authors evaluate the model on five large-scale datasets and on several tasks (image-to-image translation, GAN inversion) and report state-of-the-art performance.

2. Introduction

Existing image-to-image translation methods have two main problems.

(1) First, there is no explicit mechanism that allows preserving the identity, and as a result, the synthesized image can over-adapt to the reference domain and lose the original identity characteristics.

(2) Second, the generation process may lose important fine-grained details, leading to suboptimal visual quality.

=> How can we enable photo-realistic image translation while better preserving the identity?

"Frequency Domain Image Translation (FDIT)"

Our key idea is to decompose the image into low- and high-frequency components, and regulate the frequency consistency during image translation.

Formally, FDIT introduces novel frequency-based training objectives, which facilitate the preservation of frequency information during training.

(1) pixel space

Transform each image into its high-frequency and low-frequency components by applying a Gaussian kernel (i.e., a low-pass filter).

A loss term regulates the high-frequency components to be similar between the source image and the generated image.

(2) frequency space

Apply the Fast Fourier Transform (FFT).

A loss term encourages the original and translated images to share a similar high-frequency spectrum.
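
The two decompositions described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the kernel size, `sigma`, the circular cutoff `radius`, the use of an L1 distance, and every function name here are hypothetical choices made for the example, and the code works on a single 2-D grayscale array rather than a training batch.

```python
import numpy as np

def gaussian_kernel_1d(size=5, sigma=1.0):
    """Normalized 1-D Gaussian kernel (size must be odd; values are illustrative)."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, size=5, sigma=1.0):
    """Separable Gaussian low-pass filter for a 2-D image, with reflect padding."""
    k = gaussian_kernel_1d(size, sigma)
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    # Filter along rows, then along columns; "valid" removes the padding again.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def split_pixel_space(img, size=5, sigma=1.0):
    """Pixel space: low = blurred image, high = residual (img - low)."""
    low = gaussian_blur(img, size, sigma)
    return low, img - low

def high_freq_pixel_loss(source, generated, size=5, sigma=1.0):
    """Mean absolute difference of high-frequency residuals (L1 is an assumption)."""
    _, high_src = split_pixel_space(source, size, sigma)
    _, high_gen = split_pixel_space(generated, size, sigma)
    return float(np.mean(np.abs(high_src - high_gen)))

def high_freq_spectrum(img, radius=8):
    """Fourier space: FFT, center the zero frequency, zero out a low-frequency disk."""
    spec = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    return spec * (dist > radius)  # hard circular mask is an assumption

def high_freq_fourier_loss(source, generated, radius=8):
    """Mean magnitude of the difference between the two high-frequency spectra."""
    diff = high_freq_spectrum(source, radius) - high_freq_spectrum(generated, radius)
    return float(np.mean(np.abs(diff)))
```

Calling `split_pixel_space(img)` returns a `(low, high)` pair whose sum reconstructs `img`, and both loss functions return zero when the source and generated images are identical. In a real training loop these terms would be written with a differentiable framework and added to the usual GAN objectives.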

3. FDIT

3.1. Pixel Space Loss

3.2. Fourier Frequency Space Loss

3.3. Overall Loss
