SHREC'19 Track: Extended 2D Scene Sketch-Based 3D Scene Retrieval — Overview
The SHREC'19 track on Extended 2D Scene Sketch-Based 3D Scene Retrieval aims to retrieve relevant 3D scene models using 2D scene sketches as input. This challenging research direction must bridge the semantic gap between iconic 2D sketches and accurate 3D scene models, and has broad applications in 3D scene reconstruction, autonomous driving, 3D geometry video retrieval, and 3D AR/VR entertainment. The track's SceneSBR2019 benchmark substantially extends the dataset of previous work to enhance evaluation and promote further advances in sketch-based retrieval.
SHREC'19 Track: Extended 2D Scene Sketch-Based 3D Scene Retrieval
Juefei Yuan, Hameed Abdul-Rashid, Bo Li, Yijuan Lu, Tobias Schreck, Ngoc-Minh Bui, Trong-Le Do, Khac-Tuan Nguyen, Thanh-An Nguyen, Vinh-Tiep Nguyen, Minh-Triet Tran, Tianyang Wang
Outline
o Introduction
o Benchmark
o Methods
o Results
o Conclusions and Future Work
Introduction
2D Scene Sketch-Based 3D Scene Retrieval
o Focuses on retrieving relevant 3D scene models using scene sketches as input
Motivation
o Vast applications: 3D scene reconstruction, autonomous driving, 3D geometry video retrieval, and 3D AR/VR entertainment
Challenges
o 2D sketches lack 3D scene information
o Semantic gap between iconic 2D scene sketches and accurate 3D scene models
Introduction (Cont.)
2D Scene Sketch-Based 3D Scene Retrieval is a brand new research topic in sketch-based 3D object retrieval:
o A query sketch contains several objects
o Objects may overlap with each other
o Relative context configurations exist among the objects
Our previous work
o SHREC'18 track: 2D Scene Sketch-Based 3D Scene Retrieval
o Built the SceneSBR2018 benchmark [1]: 10 scene classes, each with 25 sketches and 100 3D models
o Good performance called for a more comprehensive dataset
We built the SceneSBR2019 benchmark
o To further promote this challenging research direction
o The largest and most comprehensive 2D scene sketch-based 3D scene retrieval benchmark
[1] J. Yuan et al. SHREC'18 track: 2D scene sketch-based 3D scene retrieval. In 3DOR, pages 1–8, 2018.
SceneSBR2019 Benchmark Overview
Overview
o We have substantially extended SceneSBR2018 with 20 additional classes
Building process
o Voting method among three individuals
o Scene labels chosen from Places88 [2]
o Data collected from Flickr, Google Images, and 3D Warehouse
[2] B. Zhou et al. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell., 40(6):1452–1464, 2018.
SceneSBR2019 Benchmark
2D Scene Sketch Query Dataset
o 750 2D scene sketches: 30 classes, each with 25 sketches
3D Scene Model Target Dataset
o 3,000 3D scene models: 30 classes, each with 100 models
o Split into training and testing subsets to evaluate learning-based 3D scene retrieval
Table 1 Training and testing dataset information of our SceneSBR2019 benchmark
2D Scene Sketch Query Dataset
Fig. 1 Example 2D scene query sketches (one per class)
3D Scene Model Target Dataset
Fig. 2 Example 3D target scene models (one per class)
Evaluation
Seven commonly adopted performance metrics in 3D model retrieval [3]:
o Precision-Recall plot (PR)
o Nearest Neighbor (NN)
o First Tier (FT)
o Second Tier (ST)
o E-Measure (E)
o Discounted Cumulated Gain (DCG)
o Average Precision (AP)
We have also developed code to compute them:
o http://orca.st.usm.edu/~bli/SceneSBR2019/data.html
[3] B. Li, Y. Lu, C. Li, A. Godil, T. Schreck et al. A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries. Computer Vision and Image Understanding, 131:1–27, 2015.
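To make the metrics concrete, the following is a minimal sketch (ours, not the track's official evaluation code linked above) of how NN, FT, ST, and AP can be computed for a single query from a ranked list of retrieved models:

```python
def evaluate_rank_list(ranked_labels, query_label, class_size):
    """Compute NN, FT, ST, and AP for one query.

    ranked_labels: class labels of the retrieved 3D models, best match first
    query_label:   ground-truth class of the query sketch
    class_size:    number of relevant models per class (100 in SceneSBR2019)
    """
    relevant = [lbl == query_label for lbl in ranked_labels]

    # Nearest Neighbor: is the top-ranked model relevant?
    nn = 1.0 if relevant[0] else 0.0

    # First/Second Tier: recall within the top C and top 2C results
    ft = sum(relevant[:class_size]) / class_size
    st = sum(relevant[:2 * class_size]) / class_size

    # Average Precision: mean of the precision values at each relevant rank
    hits, precisions = 0, []
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    ap = sum(precisions) / class_size

    return nn, ft, st, ap
```

PR and DCG follow the same pattern of scanning the ranked list; for the exact definitions used in the track, see the evaluation code at the URL above.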
Methods
o ResNet50-Based Sketch Recognition and Adapting Place Classification for 3D Models Using Adversarial Training (RNSRAP)
o View and Majority Vote Based 3D Scene Retrieval Algorithm (VMV)
RNSRAP: Sketch Recognition with ResNet50 Encoding and Adapting Place Classification for 3D Models Using Adversarial Training
Ngoc-Minh Bui1,2, Trong-Le Do1,2, Khac-Tuan Nguyen1, Minh-Triet Tran1, Van-Tu Ninh1, Tu-Khiem Le1, Vinh Ton-That1, Vinh-Tiep Nguyen2, Minh N. Do3, Anh-Duc Duong2
1Faculty of Information Technology, Vietnam National University - Ho Chi Minh City, Vietnam
2Software Engineering Lab, Vietnam National University - Ho Chi Minh City, Vietnam
3University of Information Technology, Vietnam National University - Ho Chi Minh City, Vietnam
Two-Step 3D Scene Classification
Fig. 3 Two-step process of the 3D scene classification method
Sketch Recognition with ResNet50 Encoding
(1) Use the ResNet50 output to encode a sketch image into a 2048-D feature vector
(2) Data augmentation:
  - Regular transformations: flipping, rotation, translation, and cropping
  - Saliency map based image synthesis
(3) Use two types of fully connected neural networks
(4) Use multiple classification networks with different initializations for the two types of neural networks
(5) Fuse the results of those models with a majority-vote scheme to determine the label of a sketch query image
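The majority-vote fusion in step (5) can be sketched as follows (an illustrative sketch of the scheme, not the authors' code; the function name is ours):

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse the class labels predicted by several independently
    initialized classification networks for one sketch query:
    the most frequently predicted label wins."""
    (label, _), = Counter(predictions).most_common(1)
    return label
```

For example, if four networks predict `["castle", "beach", "castle", "castle"]` for a query, the fused label is `"castle"`.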
Saliency-Based Selection of 2D Screenshots
o Use multiple views of a 3D object for classification
o Randomly capture multiple screenshots at 3 different levels of detail: (1) general views, (2) views focusing on a set of entities, and (3) detailed views of a specific entity
o Use DHSNet [4] to generate the saliency map of each screenshot
o Select promising screenshots of each 3D model for the place classification task
o A 3D model can be classified with high accuracy (>92%) with no more than 5 information-rich screenshots
[4] N. Liu et al. DHSNet: Deep hierarchical saliency network for salient object detection. In CVPR, pages 678–686, 2016.
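One plausible realization of this selection step is to rank candidate screenshots by the overall strength of their saliency maps and keep the top few. The slide does not state the exact selection criterion, so the mean-saliency ranking below is our assumption, shown only to illustrate the idea:

```python
def select_screenshots(screenshots, saliency_maps, k=5):
    """Keep the k most information-rich screenshots of a 3D model,
    ranked by the mean value of their saliency maps (assumed criterion).

    screenshots:   list of screenshot identifiers (e.g. file names)
    saliency_maps: one 2D grid of saliency values per screenshot,
                   as produced by a saliency network such as DHSNet
    """
    def mean_saliency(sal_map):
        values = [v for row in sal_map for v in row]
        return sum(values) / len(values)

    # Sort screenshots from most to least salient and keep the top k
    scored = sorted(zip(screenshots, saliency_maps),
                    key=lambda pair: mean_saliency(pair[1]), reverse=True)
    return [shot for shot, _ in scored[:k]]
```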
Rank List Generation
o Assign the one or two best labels to each sketch image, and retrieve all 3D models having those labels
o The similarity between a sketch and a 3D model is the product of the prediction score of the query sketch and that of the 3D model on the same label
o Other 3D models, considered irrelevant, are appended to the tail of the rank list with a distance of infinity
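The rank list construction above can be sketched as follows (our illustrative version; the function name and data layout are assumptions, and ties between the one or two candidate labels are resolved here by taking the larger product):

```python
import math

def build_rank_list(sketch_scores, model_scores, top_labels):
    """Build a rank list for one sketch query.

    sketch_scores: {label: prediction score of the query sketch}
    model_scores:  {model_id: {label: prediction score of that 3D model}}
    top_labels:    the one or two best labels assigned to the sketch

    Similarity = sketch score * model score on the shared label;
    models sharing no top label go to the tail at distance infinity.
    """
    ranked, tail = [], []
    for model_id, scores in model_scores.items():
        sims = [sketch_scores[lbl] * scores[lbl]
                for lbl in top_labels if lbl in scores]
        if sims:
            ranked.append((model_id, max(sims)))
        else:
            tail.append((model_id, -math.inf))  # "distance of infinity"

    # Most similar models first, irrelevant models appended at the tail
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked + tail
```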
VMV: View and Majority Vote Based 3D Scene Retrieval Algorithm
Juefei Yuan1, Hameed Abdul-Rashid1, Bo Li1, Yijuan Lu2, Tianyang Wang3
1School of Computing Sciences and Computer Engineering, University of Southern Mississippi, USA
2Department of Computer Science, Texas State University, USA
3Department of Computer Science & Information Technology, Austin Peay State University, USA
VMV Architecture
Fig. 4 VMV architecture
VMV Algorithm
VMV comprises six steps:
o (1) Scene view sampling (Qmacro script)
o (2) Data augmentation: random rotations, reflections, or translations
o (3) Pre-training and training on AlexNet1/VGG1 and AlexNet2/VGG2
o (4) Fine-tuning on scene sketches/views
o (5) Sketch/view classification
o (6) Majority vote-based label matching
Fig. 5 A set of 13 sample views of an apartment scene model
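Step (6) can be sketched as follows: each 3D scene model is labeled by a majority vote over the classifications of its sampled views, and models whose voted label matches the sketch's predicted label are retrieved first. This is our illustrative version; ordering matched models by their vote fraction is an assumption, since the slide does not specify the tie-breaking within matches:

```python
from collections import Counter

def vmv_label_match(sketch_label, model_views):
    """Majority vote-based label matching (step 6 of VMV).

    sketch_label: class label predicted for the query sketch
    model_views:  {model_id: list of per-view predicted labels,
                   e.g. 13 sample views per scene model}
    Returns (model_id, score) pairs, matched models first.
    """
    matches, rest = [], []
    for model_id, view_labels in model_views.items():
        # Label predicted for the most views wins the vote
        (voted_label, count), = Counter(view_labels).most_common(1)
        confidence = count / len(view_labels)
        if voted_label == sketch_label:
            matches.append((model_id, confidence))
        else:
            rest.append((model_id, 0.0))

    # Assumed ordering: stronger vote agreement ranks higher
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches + rest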
Precision-Recall
Fig. 6 Precision-Recall diagram performance comparisons on the testing dataset of our SceneSBR2019 benchmark for the two learning-based participating methods
Other Six Performance Metrics
Table 2 Performance metrics comparison on our SceneSBR2019 benchmark for the two learning-based participating methods
More details about the retrieval performance of each individual query of every participating method are available on the SceneSBR2019 track homepage [5]
[5] SceneSBR2019 track homepage: http://orca.st.usm.edu/~bli/SceneSBR2019/results.html
Discussions
o Both submitted approaches utilize CNN models
  - CNNs contribute substantially to the performance achieved by the two learning-based approaches
  - Bui's group utilized object-level semantic information for data augmentation and for refining retrieval results
o It is very promising to utilize both deep learning and scene semantic information to support large-scale scene retrieval
o The overall performance achieved on the SceneIBR2019 track is better than that on the SceneSBR2019 track
  - SceneIBR2019 benchmark: the query dataset is replaced with query images, 1,000 per class
  - A much larger 2D image query dataset allows better training
  - Query images carry more accurate 3D shape information
  - The semantic gap between images and models is much smaller
Conclusions
o Objective: foster this challenging and interesting research direction of scene sketch-based 3D scene retrieval
o Dataset: built the current largest 2D scene sketch based 3D scene retrieval benchmark
o Participation: though challenging, 2 groups successfully participated in the track and contributed 4 runs of 2 methods
o Evaluation: performed a comparative evaluation of retrieval accuracy
o Impact: provided the largest and most comprehensive common evaluation platform for sketch-based 3D scene retrieval
Future Work
o Build a larger 2D scene-based 3D scene retrieval benchmark in terms of the number of categories and the variations within each category
o Build or search for other, more realistic 3D scene models
o 2D scene sketch-based 3D scene retrieval incorporating semantic information
o Extend the feature vectors by incorporating geolocation estimation features
o 2D scene-based 3D scene retrieval related applications
o Deep learning models specifically designed for 3D scene retrieval
References
[1] J. Yuan et al. SHREC'18 track: 2D scene sketch-based 3D scene retrieval. In 3DOR, pages 1–8, 2018.
[2] B. Zhou et al. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell., 40(6):1452–1464, 2018.
[3] B. Li et al. A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries. Computer Vision and Image Understanding, 131:1–27, 2015.
[4] N. Liu et al. DHSNet: Deep hierarchical saliency network for salient object detection. In CVPR, pages 678–686, 2016.
[5] SceneSBR2019 track homepage: http://orca.st.usm.edu/~bli/SceneSBR2019/results.html
Thank you! Q&A?