Recovering Realistic Texture in Image Super-Resolution
This article covers image super-resolution with a focus on texture recovery through deep spatial feature transform. It reviews the limitations of current CNN-based methods, the role of adversarial loss in improving visual quality, and a novel approach for incorporating semantic categorical priors. The study addresses two key issues, how to represent categorical priors and how to use them effectively, and shows that a spatial feature transform layer can alter network behavior conditioned on the prior.
Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform Xintao Wang Ke Yu Chao Dong Chen Change Loy
Problem
Enlarge an image 4 times: recover a high-resolution image from its low-resolution counterpart. (Figure: low-resolution input and the ×4 high-resolution target.)
Previous work
Contemporary SR algorithms are mostly CNN-based methods [1]. Most CNN-based methods use a pixel-wise loss function (MSE-based models): they are good at recovering edges and smooth areas, but not good at texture recovery.
Adversarial loss is introduced in SRGAN [2] and EnhanceNet [3] (GAN-based models): it encourages the network to favor solutions that look more like natural images, and the visual quality of the reconstruction is significantly improved. (Figure: SRCNN vs. SRGAN vs. ground truth.)
[1] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In ECCV, 2014.
[2] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, et al. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, 2017.
[3] M. S. Sajjadi, B. Schölkopf, and M. Hirsch. EnhanceNet: Single image super-resolution through automated texture synthesis. In ICCV, 2017.
Motivation (Figure: a building patch upscaled ×4 with a building prior and a plant patch upscaled ×4 with a plant prior; swapping the priors changes the recovered textures.)
Semantic categorical prior: water, building, animal, sky, grass, plant, mountain.
Issues
1. How to represent the semantic categorical prior?
Our approach: explore semantic segmentation probability maps as a categorical prior at the pixel level.
2. How can the categorical prior be incorporated into the reconstruction process effectively?
Our approach: propose a novel Spatial Feature Transform that is capable of altering the network behavior conditioned on other information.
Represent the categorical prior
A contemporary CNN segmentation network [1] (ResNet-101 backbone), fine-tuned on LR images, outputs per-category probability maps that serve as the semantic categorical prior.
[1] Z. Liu, X. Li, P. Luo, C.-C. Loy, and X. Tang. Semantic image segmentation via deep parsing network. In ICCV, 2015.
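As a concrete sketch of what these probability maps look like, the per-pixel logits of a segmentation network can be turned into a K × H × W categorical prior with a softmax over the category axis. The shapes and random logits below are illustrative placeholders, not the paper's actual network:

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical per-pixel logits from a segmentation network:
# K = 8 categories (sky, grass, building, ...) over a 4x4 LR patch.
logits = np.random.randn(8, 4, 4)
prob_maps = softmax(logits, axis=0)  # the semantic categorical prior, K x H x W
```

Each spatial position thus carries a full probability distribution over categories, which is what lets the prior act at the pixel level rather than per image.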
Examples of segmentation (Figure: segments predicted on LR images vs. HR images, with the input LR images and ground truth; categories: sky, grass, building, mountain, plant, water, animal, background.)
Incorporate conditions
The categorical prior is a set of K probability maps Ψ = (ψ1, ψ2, …, ψK).
A plain CNN for SR computes the restored image as ŷ = Gθ(x) from the input LR image x; conditioning on the prior gives ŷ = Gθ(x | Ψ).
Spatial Feature Transform
By learning a mapping function M, the prior Ψ is modeled by a pair of affine transformation parameters (γ, β): M : Ψ → (γ, β).
The modulation is then carried out by an affine transformation on the feature maps F:
SFT(F | γ, β) = γ ⊙ F + β
so the conditioned generator becomes ŷ = Gθ(x | γ, β).
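The transform itself is a simple spatially varying affine modulation. A minimal NumPy sketch follows; the toy 1×1 condition mapping with random weights, and all shapes, are assumptions standing in for the learned condition network:

```python
import numpy as np

def sft(features, gamma, beta):
    """Spatial Feature Transform: SFT(F | gamma, beta) = gamma * F + beta.
    gamma and beta have the same spatial size as the feature maps, so the
    modulation can differ at every pixel."""
    return gamma * features + beta

# Toy condition "network": a 1x1 mapping from K probability maps to
# (gamma, beta) for C feature channels. Random weights stand in for the
# learned mapping M : Psi -> (gamma, beta).
K, C, H, W = 8, 16, 4, 4
prob_maps = np.random.rand(K, H, W)
W_gamma = np.random.randn(C, K)
W_beta = np.random.randn(C, K)
gamma = np.tensordot(W_gamma, prob_maps, axes=1)  # C x H x W
beta = np.tensordot(W_beta, prob_maps, axes=1)    # C x H x W

features = np.random.randn(C, H, W)
modulated = sft(features, gamma, beta)
```

Note that with γ = 1 and β = 0 the layer is an identity, so an SFT layer can fall back to unconditioned behavior where the prior is uninformative.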
Spatial Feature Transform: architecture (Figure: the SR branch stacks residual blocks, each modulated by an SFT layer, followed by upsampling and final convolutions; a condition network of shared conv layers maps the segmentation probability maps to shared SFT conditions, from which per-layer convolutions produce γ and β.)
Loss function
Adversarial loss [1]: the generator G and discriminator D compete,
min_G max_D E_{y∼p_HR}[log D(y)] + E_{x∼p_LR}[log(1 − D(G(x)))]
which encourages the network to generate images that reside on the manifold of natural images.
Perceptual loss [2]: using a pre-trained 19-layer VGG network φ (features before conv5_4), the super-resolution model is optimized in feature space:
‖φ(G(x)) − φ(y)‖²₂
[1] Goodfellow, Ian, et al. Generative adversarial nets. In NIPS, 2014.
[2] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, 2016.
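The two loss terms above can be sketched directly from their definitions. In this illustrative NumPy version, `feat_sr`/`feat_hr` stand in for VGG conv5_4 feature maps, and the adversarial terms assume discriminator outputs already squashed to (0, 1):

```python
import numpy as np

def adversarial_g_loss(d_fake):
    """Generator term: push D(G(x)) toward 1 (equivalently minimize -log D(G(x)))."""
    return -np.mean(np.log(d_fake + 1e-12))

def adversarial_d_loss(d_real, d_fake):
    """Discriminator objective log D(y) + log(1 - D(G(x))), negated for minimization."""
    return -np.mean(np.log(d_real + 1e-12) + np.log(1.0 - d_fake + 1e-12))

def perceptual_loss(feat_sr, feat_hr):
    """Squared L2 distance between feature maps of the SR output and the ground truth."""
    return np.sum((feat_sr - feat_hr) ** 2)
```

The small epsilon terms only guard the logarithms against exact 0/1 discriminator outputs in this sketch.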
Spatial condition
The modulation parameters (γ, β) have a close relationship with the probability maps Ψ and contain spatial information. (Figure: input LR patch, segmentation map, γ and β maps of intermediate layers, and the restored result.)
Delicate modulation (Figure: γ and β maps at several layers for an LR patch, together with the segmentation map and the restored result, showing fine-grained, spatially varying modulation.)
Results (Figure: SRCNN, SRGAN, EnhanceNet, SFT-Net (ours), and GT; reported PSNR values as listed: 22.71 dB, 22.90 dB, 24.83 dB, 23.36 dB.)
Results (Figure: GT compared with MSE-based methods (Bicubic, SRCNN, VDSR, LapSRN, DRRN, MemNet) and GAN-based methods (SRGAN, EnhanceNet), plus SFT-Net (ours).)
User study part I (Figure: pairwise preference bars for ours vs. EnhanceNet (15/85) and ours vs. SRGAN (67/33), and per-category preference percentages: 76.4 sky, 75 building, 68.7 grass, 68 animal, 65.7 plant, 56.4 water, 54.5 mountain.)
User study part II (Figure: stacked bars of Rank-1 to Rank-4 percentages (0–100%) for GT, ours, MemNet, and SRCNN; raw values include 18.4/80.4 for GT, 18.6/79.6 for ours, 61.3 for MemNet, and 36.3/37/62.4 for SRCNN.)
Impact of different priors (Figure: the same regions restored with building, sky, grass, mountain, water, plant, and animal priors, compared with bicubic; the chosen prior determines the synthesized texture.)
Impact of different priors, continued (Figure: a second set of examples restored under each of the category priors, compared with bicubic.)
Other conditioning methods (Figure: input vs. compositional mapping [1], FiLM [2], and concatenation.)
[1] S. Zhu, S. Fidler, R. Urtasun, D. Lin, and C. C. Loy. Be your own Prada: Fashion synthesis with structural coherence. In ICCV, 2017.
[2] E. Perez, F. Strub, H. de Vries, V. Dumoulin, and A. Courville. FiLM: Visual reasoning with a general conditioning layer. arXiv preprint arXiv:1709.07871, 2017.
Comparison with other conditioning methods (Figure: input, compositional mapping, FiLM, concatenation, and SFT-Net (ours).)
Robustness to out-of-category regions (Figure: SRGAN vs. ours on scenes containing categories outside the trained set.)
Conclusion
Explore semantic segmentation maps as a categorical prior for realistic texture recovery.
Propose a novel Spatial Feature Transform layer to efficiently incorporate categorical conditions into a CNN-based SR network.
Extensive comparisons and a user study demonstrate the capability of SFT-Net in generating realistic and visually pleasing textures.
Crafting a Toolchain for Image Restoration by Deep Reinforcement Learning Ke Yu Chao Dong Liang Lin Chen Change Loy
Image Restoration
There are many individual tasks: denoising, deblurring, JPEG deblocking, super-resolution.
Towards more complicated distortions: address multiple levels of degradation in one task [1, 2], or address multiple individual tasks [3].
Image Restoration: A New Setting
Consider multiple distortions simultaneously.
Real-world scenario: distortions introduced during image capture and storage.
Synthetic setting: Gaussian blur, Gaussian noise, and JPEG compression.
Our new task: restore images degraded by combinations of these distortions.
Motivation
Can we use a single CNN to address multiple distortions?
Inefficient: it would require a huge network to handle all the possibilities.
Inflexible: all kinds of distorted images are processed with the same structure.
We need a more efficient and flexible approach: process different distortions in different ways.
Method: Decision Making
Progressively restore the image quality; treat image restoration as a decision-making process.
Noisy? Try a denoising tool. Blurry? Try a deblurring tool. Artifacts? Try a deblocking tool. Good enough :) Stop.
Method: Overview
Our framework requires a toolbox and an agent.
Method: Toolbox
We design 12 tools, each of which addresses a simple task; the tools are lightweight CNNs, ranging from a 3-layer CNN [4] to an 8-layer CNN.
Method: Agent
Reinforcement learning addresses tool selection.
State: the current distorted image plus the action taken at the last step (one-hot encoded).
Reward: the PSNR gain at each step.
Structure: a feature extractor feeds an LSTM agent, whose action space covers the 12 tools plus a stopping action.
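The per-step reward can be sketched directly from its definition. A minimal version, assuming images normalized to [0, 1]:

```python
import numpy as np

def psnr(img, ref, peak=1.0):
    """Peak signal-to-noise ratio for images with values in [0, peak]."""
    mse = np.mean((img - ref) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def step_reward(prev_img, new_img, ref):
    """Reward at each step: the PSNR gain of the new image over the previous state."""
    return psnr(new_img, ref) - psnr(prev_img, ref)
```

Because the reward is a difference of PSNRs against the same reference, the cumulative reward over an episode telescopes to the total PSNR improvement of the final restored image.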
Method: Joint Training
Challenge of the middle state: after several steps of processing, the inputs are intermediate results that none of the individually trained tools has seen.
Joint training: toolchains are unrolled and trained end to end, with an MSE loss applied after each forward pass and gradients propagated backward through the chain.
Experimental Results
Dataset: DIV2K [5].
Comparison with generic models for image restoration: VDSR [1] and DnCNN [3].
Experimental Results
Quantitative results on DIV2K: competitive performance and better generality.
Runtime analyses: more efficient.
Experimental Results
Qualitative results on DIV2K (Figure: inputs at moderate, mild (unseen), and severe (unseen) distortion levels; our 1st, 2nd, and 3rd restoration steps vs. VDSR-s and VDSR [1].)
Experimental Results
Qualitative results on real-world images (Figure: input, our 1st, 2nd, and 3rd restoration steps vs. VDSR [1].)
Experimental Results
Ablation study: joint training and the stopping action.
Conclusion
Contributions:
Address image restoration in a reinforcement learning framework.
Propose joint training to cope with intermediate processing states.
The dynamically formed toolchain performs competitively against human-designed networks with less computational complexity.
Future work:
Incorporate more tools (e.g. trained with a GAN loss).
Handle spatially variant distortions.
Thanks! Q & A
References
[1] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016.
[2] Y. Tai, J. Yang, X. Liu, and C. Xu. MemNet: A persistent memory network for image restoration. In ICCV, 2017.
[3] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. TIP, 2017.
[4] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. TPAMI, 38(2):295–307, 2016.
[5] E. Agustsson and R. Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In CVPR Workshops, 2017.