Recovering Realistic Texture in Image Super-Resolution


This page collects two talks. The first addresses texture recovery in image super-resolution through deep spatial feature transform: it reviews the limitations of current CNN-based methods, the adversarial loss introduced for better visual quality, and a novel approach for incorporating semantic categorical priors effectively, covering both how to represent such priors and how a spatial feature transform can alter network behavior. The second presents a reinforcement-learning toolchain for restoring images corrupted by multiple distortions.

  • Image super-resolution
  • Texture recovery
  • Deep learning
  • Semantic segmentation
  • CNN




Presentation Transcript


  1. Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform. Xintao Wang, Ke Yu, Chao Dong, Chen Change Loy.

  2. Problem: enlarge a low-resolution image 4 times to recover the high-resolution image. [Figure: low-resolution input vs. high-resolution image]

  3. Previous work. Contemporary SR algorithms are mostly CNN-based [1]. Most CNN-based methods use a pixel-wise loss function (MSE-based models): good at recovering edges and smooth areas, but not good at texture recovery. Adversarial loss was introduced in SRGAN [2] and EnhanceNet [3] (GAN-based models): it encourages the network to favor solutions that look more like natural images, and the visual quality of the reconstruction is significantly improved. [Figure: SRCNN vs. SRGAN vs. ground truth]
[1] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In ECCV, 2014.
[2] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, et al. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, 2017.
[3] M. S. Sajjadi, B. Schölkopf, and M. Hirsch. EnhanceNet: Single image super-resolution through automated texture synthesis. In ICCV, 2017.

  4. Motivation. [Figure: a building patch and a plant patch upscaled x4 with the building prior, the plant prior, and with the priors swapped]

  5. Semantic categorical prior: water, building, animal, sky, grass, plant, mountain.

  6. Issues. 1. How to represent the semantic categorical prior? Our approach: explore semantic segmentation probability maps as the categorical prior, down to the pixel level. 2. How can the categorical prior be incorporated into the reconstruction process effectively? Our approach: propose a novel Spatial Feature Transform that is capable of altering the network behavior conditioned on other information.

  7. Representing the categorical prior. A contemporary CNN segmentation network [1] with a ResNet-101 backbone is fine-tuned on LR images; its per-category probability maps serve as the semantic categorical prior.
[1] Z. Liu, X. Li, P. Luo, C.-C. Loy, and X. Tang. Semantic image segmentation via deep parsing network. In ICCV, 2015.
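To make the prior concrete, here is a minimal PyTorch sketch of how per-pixel probability maps might be produced from a low-resolution input. The `seg_net` placeholder, the x4 bicubic upsampling step, and the call interface are illustrative assumptions, not the authors' exact pipeline.

```python
import torch
import torch.nn.functional as F

def categorical_prior(seg_net, lr_image, scale=4):
    """Return per-pixel probability maps P_1..P_K for an LR input.

    `seg_net` is assumed to be a segmentation network returning
    (N, K, H, W) class logits; this is a sketch, not the paper's code.
    """
    # Upsample so the segmentation network sees a full-size image.
    x = F.interpolate(lr_image, scale_factor=scale,
                      mode='bicubic', align_corners=False)
    logits = seg_net(x)                   # (N, K, H, W) class scores
    return torch.softmax(logits, dim=1)   # probabilities sum to 1 per pixel
```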

  8. Examples of segmentation. [Figure: segmentation results on LR images vs. HR images, with the input LR images and ground truth; categories: sky, grass, building, mountain, plant, water, animal, background]

  9. Incorporating conditions. The categorical prior is a set of probability maps, $\Psi = (P_1, P_2, \ldots, P_K)$, one per category. A plain CNN for SR maps the input LR image $x$ to a restored image $\hat{y} = G_\theta(x)$; with the prior, generation is conditioned on $\Psi$: $\hat{y} = G_\theta(x \mid \Psi)$.

  10. Spatial Feature Transform. A learned mapping function $\mathcal{M}$ models the prior $\Psi$ as a pair of affine transformation parameters: $\mathcal{M}: \Psi \mapsto (\gamma, \beta)$. The modulation is then carried out as a spatial affine transformation of the feature maps $F$: $\mathrm{SFT}(F \mid \gamma, \beta) = \gamma \odot F + \beta$, so the conditioned generator becomes $\hat{y} = G_\theta(x \mid \gamma, \beta)$.
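A minimal PyTorch sketch of an SFT layer along these lines follows; the channel widths and the small 1x1 conv stacks that produce $\gamma$ and $\beta$ are assumptions for illustration, not the paper's exact configuration.

```python
import torch.nn as nn

class SFTLayer(nn.Module):
    """Sketch of a Spatial Feature Transform layer: condition maps are
    mapped to per-pixel affine parameters (gamma, beta), which then
    modulate the feature maps."""
    def __init__(self, feat_ch=64, cond_ch=32):
        super().__init__()
        self.gamma = nn.Sequential(
            nn.Conv2d(cond_ch, feat_ch, 1), nn.LeakyReLU(0.1),
            nn.Conv2d(feat_ch, feat_ch, 1))
        self.beta = nn.Sequential(
            nn.Conv2d(cond_ch, feat_ch, 1), nn.LeakyReLU(0.1),
            nn.Conv2d(feat_ch, feat_ch, 1))

    def forward(self, feat, cond):
        # SFT(F | gamma, beta) = gamma * F + beta (elementwise)
        return self.gamma(cond) * feat + self.beta(cond)
```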

  11. Spatial Feature Transform network. [Architecture: a condition network of shared convolutional layers turns the segmentation probability maps into SFT conditions; the SR branch stacks residual blocks, each containing SFT layers and convolutions, followed by upsampling layers, and all SFT layers share the same conditions]
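Building on the `SFTLayer` sketched above, one such residual block could look like the following; the layout (SFT, conv, SFT, conv, identity skip) follows the slide, while kernel sizes and activations are assumptions.

```python
import torch.nn as nn

class SFTResBlock(nn.Module):
    """Sketch of a residual block with SFT modulation, reusing the
    SFTLayer from the previous sketch; the shared condition maps come
    from a separate condition network."""
    def __init__(self, feat_ch=64, cond_ch=32):
        super().__init__()
        self.sft1 = SFTLayer(feat_ch, cond_ch)
        self.conv1 = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)
        self.sft2 = SFTLayer(feat_ch, cond_ch)
        self.conv2 = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, feat, cond):
        out = self.act(self.conv1(self.sft1(feat, cond)))
        out = self.conv2(self.sft2(out, cond))
        return feat + out  # identity skip preserves the residual structure
```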

  12. Loss functions. Adversarial loss [1]: generator $G$ and discriminator $D$ compete, $\min_G \max_D \; \mathbb{E}_{y \sim p_{\mathrm{HR}}}[\log D(y)] + \mathbb{E}_{x \sim p_{\mathrm{LR}}}[\log(1 - D(G(x)))]$, which encourages the network to generate images that reside on the manifold of natural images. Perceptual loss [2]: a pre-trained 19-layer VGG network provides features $\phi$ (taken before conv5_4), and the super-resolution model is optimized in this feature space via $\lVert \phi(\hat{y}) - \phi(y) \rVert_2^2$.
[1] I. Goodfellow et al. Generative adversarial nets. In NIPS, 2014.
[2] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, 2016.
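A hedged sketch of the combined objective: `vgg_feat` stands in for the VGG-19 feature extractor (features before conv5_4), `disc` for the discriminator, and `adv_weight` is an assumed loss weighting, not a value from the paper.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()
bce = nn.BCEWithLogitsLoss()

def generator_loss(disc, vgg_feat, sr, hr, adv_weight=1e-3):
    # Perceptual term: distance between VGG features of SR output and HR target.
    perceptual = mse(vgg_feat(sr), vgg_feat(hr))
    # Adversarial term: the generator wants D to label its output as real.
    fake_logits = disc(sr)
    adv = bce(fake_logits, torch.ones_like(fake_logits))
    return perceptual + adv_weight * adv

def discriminator_loss(disc, sr, hr):
    # D learns to separate ground-truth HR images from SR outputs.
    real_logits = disc(hr)
    fake_logits = disc(sr.detach())
    return (bce(real_logits, torch.ones_like(real_logits)) +
            bce(fake_logits, torch.zeros_like(fake_logits)))
```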

  13. Spatial condition. The modulation parameters $(\gamma, \beta)$ have a close relationship with the probability maps $\Psi$ and contain spatial information. [Figure: input LR patch and restored output alongside the segmentation map and selected $\gamma$ and $\beta$ maps]

  14. Delicate modulation. [Figure: LR patch and restored output alongside the segmentation map and $\gamma$/$\beta$ maps from several channels, showing spatially fine-grained modulation]

  15. Results. [Figure: qualitative comparison of SRCNN, SRGAN, EnhanceNet, SFT-Net (ours), and ground truth; reported PSNR values: 22.71 dB, 22.90 dB, 24.83 dB, 23.36 dB]

  16. Results. [Figure: comparison with MSE-based methods (bicubic, SRCNN, VDSR, LapSRN, DRRN, MemNet) and GAN-based methods (SRGAN, EnhanceNet) against SFT-Net (ours) and ground truth]

  17. User study part I. [Chart: pairwise user preference between ours and EnhanceNet (15 / 85) and between ours and SRGAN (67 / 33); per-category values as extracted: 76.4, 75, 68.7, 68, 65.7, 56.4, 54.5 for sky, building, grass, animal, plant, water, mountain]

  18. User study part II. [Chart: rank distribution (Rank-1 through Rank-4) across GT, ours, MemNet, and SRCNN; values as extracted: GT 18.4 / 80.4, ours 18.6 / 79.6, MemNet 61.3 / 36.3, SRCNN 37 / 62.4]

  19. Impact of different priors. [Figure: patches (building, sky, grass, mountain, water, plant, animal) restored with each categorical prior, compared with bicubic upsampling]

  20. Impact of different priors (continued). [Figure: further patches restored with each categorical prior (building, sky, grass, mountain, water, plant, animal), compared with bicubic upsampling]

  21. Other conditioning methods: compositional mapping [1], FiLM [2], and direct concatenation.
[1] S. Zhu, S. Fidler, R. Urtasun, D. Lin, and C. C. Loy. Be your own Prada: Fashion synthesis with structural coherence. In ICCV, 2017.
[2] E. Perez, F. Strub, H. de Vries, V. Dumoulin, and A. Courville. FiLM: Visual reasoning with a general conditioning layer. arXiv preprint arXiv:1709.07871, 2017.

  22. Comparison with other conditioning methods. [Figure: input, compositional mapping, FiLM, concatenation, and SFT-Net (ours)]

  23. Robustness to out-of-category content. [Figure: SRGAN vs. ours on regions outside the trained categories]

  24. Conclusion. Explore semantic segmentation maps as a categorical prior for realistic texture recovery. Propose a novel Spatial Feature Transform layer to efficiently incorporate categorical conditions into a CNN-based SR network. Extensive comparisons and a user study demonstrate the capability of SFT-Net in generating realistic and visually pleasing textures.

  25. Crafting a Toolchain for Image Restoration by Deep Reinforcement Learning. Ke Yu, Chao Dong, Liang Lin, Chen Change Loy.

  26. Image restoration. There are many individual tasks: denoising, deblurring, JPEG deblocking, super-resolution. Moving towards more complicated distortions, prior work addresses multiple levels of degradation in one task [1, 2] or multiple individual tasks [3].

  27. Image restoration: a new setting. Consider multiple distortions simultaneously. Real-world scenario: mixed distortions arise during image capture and storage. Synthetic setting (our new task): Gaussian blur, Gaussian noise, and JPEG compression combined.

  28. Motivation. Can we use a single CNN to address multiple distortions? It would be inefficient, requiring a huge network to handle all the possibilities, and inflexible, since all kinds of distorted images are processed with the same structure. We need a more efficient and flexible approach: process different distortions in different ways.

  29. Method: decision making. Progressively restore the image quality by treating image restoration as a decision-making process: blurry? try a deblurring tool; noisy? try a denoising tool; artifacts? try a deblocking tool; good enough :) stop.

  30. Method overview. Our framework requires a toolbox and an agent.
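A minimal sketch of the resulting restoration loop; `agent.select`, the toolbox as a list of callables, and the convention that the extra action index means "stop" are interface assumptions made for illustration.

```python
def restore(agent, toolbox, image, max_steps=3):
    """Apply agent-selected tools step by step until the stopping action."""
    for _ in range(max_steps):
        action = agent.select(image)      # index into the toolbox
        if action == len(toolbox):        # the extra index is the stop action
            break
        image = toolbox[action](image)    # run the chosen restoration tool
    return image
```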

  31. Method: toolbox. We design 12 tools, each of which addresses a simple task; each tool is a small CNN (3 layers [4] or 8 layers).

  32. Method: agent. Reinforcement learning is used for tool selection. State: the current distorted image together with the action taken at the last step (one-hot encoded). Reward: the PSNR gain at each step. Structure: a feature extractor processes the input image, an LSTM maintains the step history, and the output is an action over the 12 tools plus a stopping action.
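Since the reward is the PSNR gain at each step, it can be computed as in this small sketch (images are assumed to be tensors with values in [0, 1]):

```python
import torch

def psnr(x, y, max_val=1.0):
    """Peak signal-to-noise ratio between two images in [0, max_val]."""
    mse = torch.mean((x - y) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)

def step_reward(prev_img, curr_img, gt_img):
    """Reward for one step: the PSNR gain achieved by the chosen tool."""
    return psnr(curr_img, gt_img) - psnr(prev_img, gt_img)
```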

  33. Method: joint training. Challenge of the middle state: after several steps of processing, intermediate results arise that none of the individually trained tools has seen. Joint training runs forward and backward passes through entire toolchains (toolchain 1, toolchain 2, ...), with an MSE loss at the end of each chain, so the tools adapt to these intermediate states.
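A rough sketch of one joint-training step under these assumptions: the agent unrolls a toolchain, and the MSE loss at the final output is backpropagated through every tool in the chain. The `agent.select` call and the optimizer over all tool parameters are illustrative interfaces, not the paper's code.

```python
import torch

def joint_training_step(agent, toolbox, optimizer, distorted, gt,
                        max_steps=3):
    """Unroll a toolchain and backpropagate an end-of-chain MSE loss
    through every tool, so the tools learn to handle middle states."""
    img = distorted
    for _ in range(max_steps):
        action = agent.select(img)     # tool choice (trained separately by RL)
        if action == len(toolbox):
            break
        img = toolbox[action](img)     # gradients flow through each tool
    loss = torch.mean((img - gt) ** 2) # MSE at the end of the toolchain
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```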

  34. Experimental results. Dataset: DIV2K [5]. Comparison with generic models for image restoration: VDSR [1] and DnCNN [3].

  35. Experimental results. Quantitative results on DIV2K show competitive performance and better generality; runtime analyses show our approach is more efficient.

  36. Experimental results. Qualitative results on DIV2K at moderate, mild (unseen), and severe (unseen) distortion levels. [Figure: input, outputs after the 1st, 2nd, and 3rd steps, compared with VDSR-s and VDSR [1]]

  37. Experimental results. Qualitative results on real-world images. [Figure: input and outputs after the 1st, 2nd, and 3rd steps, compared with VDSR [1]]

  38. Experimental results. Ablation study on the effect of joint training and of the stopping action.

  39. Conclusion. Contributions: address image restoration in a reinforcement learning framework; propose joint learning to cope with the middle processing states; show that dynamically formed toolchains perform competitively against human-designed networks with less computational complexity. Future work: incorporate more tools (e.g. trained with a GAN loss) and handle spatially variant distortions.

  40. Thanks! Q & A

  41. References
[1] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016.
[2] Y. Tai, J. Yang, X. Liu, and C. Xu. MemNet: A persistent memory network for image restoration. In ICCV, 2017.
[3] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. TIP, 2017.
[4] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. TPAMI, 38(2):295-307, 2016.
[5] E. Agustsson and R. Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In CVPR Workshops, 2017.
