RIDE: Reversal Invariant Descriptor Enhancement

ICCV 2015


Speaker: Lingxi Xie

Authors: Lingxi Xie, Jingdong Wang, Weiyao Lin, Bo Zhang, Qi Tian

State Key Laboratory of Intelligent Technology and Systems

Department of Computer Science and Technology

Tsinghua University

http://www.tsinghua.edu.cn

Outline

• Introduction
• The Bag-of-Feature Model
• The RIDE Algorithm
  – Motivation
  – Towards Reversal Invariance
• Experimental Results
• Conclusions

2/24/2025

ICCV 2015 - Presentation


Image Classification


The Bag-of-Feature Model

Pipeline: Raw Image → Image Descriptors → Visual Vocabulary → Compact Feature Codes → Image-level Vector

• Image Descriptors (Gradient-based Local Descriptors)
  – SIFT [Lowe, IJCV04]
  – HOG [Dalal, CVPR05]
  – LCS [Perronnin, ECCV10]
• Visual Vocabulary (Clustering Methods)
  – K-Means
  – Hierarchical K-Means [Nister, CVPR06]
  – Approximate K-Means [Philbin, CVPR07]
• Compact Feature Codes (Hard/Soft/Sparse Coding Methods)
  – Vector Quantization
  – LLC Encoding [Wang, CVPR10]
  – Fisher Vector Encoding [Perronnin, ECCV10]
• Image-level Vector (Spatial Pooling)
  – Sum Pooling / Max Pooling
  – Spatial Pyramid Matching [Lazebnik, CVPR06]
  – Geometric Phrase Pooling [Xie, ACMMM12]
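The simplest path through this pipeline (hard vector quantization followed by sum pooling) can be sketched as below. This is a toy illustration only: the 2-D "descriptors", the 3-word codebook, and the function names `quantize` and `bag_of_features` are hypothetical, and a real system would use 128-D SIFT descriptors with a vocabulary learned by clustering.

```python
from math import dist

def quantize(descriptor, codebook):
    """Return the index of the nearest codeword (hard vector quantization)."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))

def bag_of_features(descriptors, codebook):
    """Sum-pool the hard assignments into an L1-normalized histogram."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[quantize(d, codebook)] += 1.0
    total = sum(hist) or 1.0  # avoid dividing by zero for an empty image
    return [h / total for h in hist]

# Hypothetical toy data: four 2-D descriptors and a 3-word visual vocabulary.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
image_vector = bag_of_features(descriptors, codebook)
```

The resulting `image_vector` is the image-level representation that a classifier would consume; soft coding or Fisher vectors replace `quantize`, and spatial pyramids replace the single global sum.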


Image Matching: Reversal Copy

Figure panels: What We Want | What SIFT Does

Image Matching: Reversal Objects

Figure panels: What We Want | What SIFT Does

Image Retrieval: Settings

• Aircraft-100 Dataset
  – 100 Aircraft Models, 100 Samples per Model
  – ALL Images are Manually Oriented to the Right
• Why Aircraft?
  – The Orientation of an Aircraft is Easy to Judge!

Image Retrieval: Sample Images

Image Retrieval: Original Image

QUERY
Model: BAE-125
Mean AP: 0.4143
Mean Dist.: 0.83
Mean TP Dist.: 0.34
Self-Ranking: #1
First FP: #18

RESULT
#1: BAE-125 (dist. 0.00)
#2: BAE-125 (dist. 0.22)
#3: BAE-125 (dist. 0.23)
#4: BAE-125 (dist. 0.23)
#5: BAE-125 (dist. 0.24)
#6: BAE-125 (dist. 0.25)

Image Retrieval: Reversed Image

QUERY
Model: BAE-125
Mean AP: 0.0025
Mean Dist.: 1.09
Mean TP Dist.: 1.06
Self-Ranking: #514
First TP: #388

RESULT
#1: 707-320 (dist. 0.81)
#2: DC-3 (dist. 0.83)
#3: Cessna-560 (dist. 0.84)
#4: MD-80 (dist. 0.84)
#5: 737-400 (dist. 0.84)
#6: 747-100 (dist. 0.85)

Image Retrieval: Comparison

Metric           Original Query   Reversed Query
Model            BAE-125          BAE-125
Mean AP          0.4143           0.0025
Mean Dist.       0.83             1.09
Mean TP Dist.    0.34             1.06
Self-Ranking     #1               #514
First FP / TP    First FP: #18    First TP: #388
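For readers unfamiliar with the retrieval metrics on these slides, the sketch below shows one common way to compute average precision and the rank of the first true or false positive from a ranked result list. It is a generic illustration, not the authors' evaluation code: the function names are hypothetical, the toy ranking is made up, and it assumes the ranked list contains every relevant item.

```python
def average_precision(relevant):
    """relevant: list of booleans, one per ranked result (best match first)."""
    hits, precisions = 0, []
    for rank, is_hit in enumerate(relevant, start=1):
        if is_hit:
            hits += 1
            precisions.append(hits / rank)  # precision at each true positive
    return sum(precisions) / len(precisions) if precisions else 0.0

def first_rank(relevant, value):
    """1-based rank of the first result whose relevance equals `value`, else None."""
    for rank, is_hit in enumerate(relevant, start=1):
        if is_hit == value:
            return rank
    return None

# Hypothetical ranking: True marks a result of the same aircraft model as the query.
ranking = [True, True, False, True, False]
ap = average_precision(ranking)        # (1/1 + 2/2 + 3/4) / 3
first_fp = first_rank(ranking, False)  # rank of the first false positive
first_tp = first_rank(ranking, True)   # rank of the first true positive
```

Averaging `average_precision` over all queries gives a mean AP; a first true positive far down the list, as for the reversed query, drives that value toward zero.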

What is Observed?

• After an Image is Reversed ...
  – Handcrafted Descriptors cannot be Matched
  – Feature Representations are Completely Different
• Classifying a Dataset with Reversed Objects
  – Reversed Objects are Different Prototypes
  – Fewer Training Samples for each Prototype
  – Inferior Recognition Accuracy


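What Happened after Reversal?

Spatial cells of the descriptor (4 × 4 grid):

Original SIFT      Reversed SIFT
 0  1  2  3         3  2  1  0
 4  5  6  7         7  6  5  4
 8  9 10 11        11 10  9  8
12 13 14 15        15 14 13 12

Orientation bins within each cell (8 directions around the cell center):

Original           Reversed
3  2  1            1  2  3
4     0            0     4
5  6  7            7  6  5

Original Index: 14 × 8 + 5 = 117
Reversed Index: 13 × 8 + 7 = 111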
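The index change caused by reversal can be written as a fixed permutation of the 128 descriptor entries. The sketch below assumes the layout index = cell × 8 + bin, cells numbered row by row on a 4 × 4 grid, and bin k pointing at k × 45°, so that a left-right flip mirrors the cell columns and sends bin k to (4 − k) mod 8. These conventions, and the names `reversed_index` and `reverse_descriptor`, are assumptions made for illustration; actual SIFT implementations may order the entries differently.

```python
GRID = 4   # 4 x 4 spatial cells
BINS = 8   # orientation bins per cell

def reversed_index(index):
    """Map an index of the original descriptor to its index after a left-right flip."""
    cell, bin_ = divmod(index, BINS)
    row, col = divmod(cell, GRID)
    flipped_cell = row * GRID + (GRID - 1 - col)  # columns are mirrored
    flipped_bin = (BINS // 2 - bin_) % BINS       # angle t becomes 180 degrees - t
    return flipped_cell * BINS + flipped_bin

def reverse_descriptor(d):
    """Descriptor of the mirrored patch, obtained by permuting the entries of d."""
    out = [0] * len(d)
    for i, value in enumerate(d):
        out[reversed_index(i)] = value
    return out

# Cell 14, bin 5 (index 117) moves to cell 13, bin 7 (index 111).
moved_to = reversed_index(14 * BINS + 5)
```

Applying the permutation twice returns the original descriptor, which matches the intuition that reversing an image twice changes nothing.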

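Reversal Invariance: Formulation

• Descriptors
  – Original Descriptor: $d$
  – Reversed Descriptor: $d^{R}$
• What is Reversal Invariance?
  – A Function: $f(d)$
  – Which Holds: $f(d) = f(d^{R})$ for ANY $d$

How to Find $f(d)$?

• A Specific Definition
  – Define: $f(d) \doteq g(d, d^{R})$
  – In Which, $g(\cdot, \cdot)$ must be Symmetric
  – So, $f(d) = g(d, d^{R}) = g(d^{R}, d) = f(d^{R})$
• NOTE: $(d^{R})^{R} = d$
  – Reversing an Image Twice is NO Change!

How to Find $g(d, d^{R})$?

• Another Specific Definition
  – Define: $g(d, d^{R})$ is either $d$ or $d^{R}$
  – Maximally Preserving the Description Power of SIFT
  – There can be Many Other Solutions
• An Orientation Function $q(d)$
  – The Extent that $d$ is Oriented to the Right
• Compare $q(d)$ with $q(d^{R})$
  – The One with the Larger Value is Selected
  – If Equal, Select the One with the Larger Alphabetical Order

How to Find $q(d)$?

[Figure: a descriptor with 16 cells (0 to 15), each holding 8 orientation bins (0 to 7); the orientation is accumulated by sums over the bins of each cell and over all cells.]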
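The selection rule described here (compute an orientation score for a descriptor and for its reversed version, keep the one with the larger score, break ties by sequence order) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the descriptor layout (index = cell × 8 + bin, bin k at k × 45°) is assumed, and the `orientation` function, which takes the horizontal component of the summed bin vectors, is a hypothetical stand-in for the orientation function on the slides.

```python
import math

GRID, BINS = 4, 8  # assumed layout: 4 x 4 cells, 8 bins per cell

def reverse_descriptor(d):
    """Permute a 128-D descriptor the way a left-right flip would."""
    out = [0.0] * len(d)
    for i, value in enumerate(d):
        cell, b = divmod(i, BINS)
        row, col = divmod(cell, GRID)
        out[(row * GRID + GRID - 1 - col) * BINS + (BINS // 2 - b) % BINS] = value
    return out

def orientation(d):
    """Hypothetical score: x-component of the summed bin vectors (positive = right)."""
    return sum(v * math.cos(2 * math.pi * (i % BINS) / BINS) for i, v in enumerate(d))

def ride(d):
    """Return d or its reversed version, whichever is more right-oriented."""
    d = [float(v) for v in d]
    r = reverse_descriptor(d)
    qd, qr = orientation(d), orientation(r)
    if qd != qr:
        return d if qd > qr else r
    return max(d, r)  # tie: keep the lexicographically larger sequence

# A deterministic toy descriptor; its reversed copy must map to the same output.
toy = [float((i * 37) % 11) for i in range(128)]
same = ride(toy) == ride(reverse_descriptor(toy))
```

Because the rule only picks between two candidates, the output is always one of the two original descriptors, so no descriptive power is lost and the extra cost is one permutation plus two scores per descriptor.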

Summary

• Why Reversal Invariance?
  – Reversal Generates Different Prototypes
• How to Obtain Reversal Invariance?
  – Compute Orientation, Get the Maximum
• Extra Computational Costs
  – Cheap: Little Time, No Memory
• RIDE can be Applied to Other Descriptors!


Datasets

• Fine-Grained Object Recognition
  – Oxford Pet-37 (37 Cats & Dogs, 7390 Images)
  – Aircraft-100 (100 Aircraft Models, 10000 Images)
  – Stanford Dog-120 (120 Dogs, 20780 Images)
  – Caltech-UCSD Bird-200 (200 Birds, 11788 Images)
• Scene Classification
  – LandUse-21 (21 Land Uses, 2100 Images)
  – MIT Indoor-67 (67 Indoor Scenes, 15620 Images)
  – SUN-397 (397 In/Out-door Scenes, 108754 Images)

Settings

• The BoVW Model
  – Images are Resized, 300 Pixels on Longer Axis
  – Various Descriptors, Step = 6, Window Size = 12
  – PCA Reduced to 64 (Color-SIFT to 128)
  – GMM with 32 Components
  – Fisher Vector Encoding
  – Spatial Pyramid with 3 Horizontal Stripes
  – Linear SVM with C = 10
• Stronger Features can be Used

Models

• Four Different Models
  – Original Descriptors (ORIG)
  – RIDE on Original Descriptors (RIDE)
  – Original Descriptors with Augmentation (AUGM)
  – RIDE with Doubled Codebook Size (RIDEx2)
• Why Use RIDEx2?
  – Comparable Computational Costs with AUGM
  – Fair Comparison

Pet-37 Performance

Classification accuracy (%):

Model     SIFT    LCS     Fused   RGB-SIFT   OPP-SIFT
ORIG      37.92   43.25   52.06   44.90      46.53
RIDE      42.28   44.27   54.69   47.35      49.01
AUGM      42.24   45.12   54.67   46.98      48.72
RIDEx2    45.61   46.83   57.51   49.53      51.19

Aircraft-100 Performance

Classification accuracy (%):

Model     SIFT    LCS     Fused   RGB-SIFT   OPP-SIFT
ORIG      53.13   41.82   57.36   57.89      47.06
RIDE      57.82   42.86   61.27   63.09      53.12
AUGM      57.16   43.13   60.59   62.48      51.39
RIDEx2    60.14   44.81   63.62   65.11      55.79

Flower-102 Performance

Classification accuracy (%):

Model     SIFT    LCS     Fused   RGB-SIFT   OPP-SIFT
ORIG      53.68   73.47   76.96   71.52      76.12
RIDE      59.12   75.30   80.51   74.97      79.68
AUGM      58.01   75.88   79.49   74.18      78.83
RIDEx2    61.09   77.40   82.14   77.10      81.69

Bird-200 Performance

Classification accuracy (%):

Model     SIFT    LCS     Fused   RGB-SIFT   OPP-SIFT
ORIG      25.77   36.18   38.11   31.36      35.40
RIDE      32.14   38.50   44.73   39.16      42.18
AUGM      31.60   38.97   43.98   39.79      41.72
RIDEx2    34.07   40.16   46.38   41.73      44.30
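To make the Bird-200 numbers easier to read, the short sketch below recomputes the gain of RIDE over ORIG per descriptor and lists the descriptors on which RIDE alone already beats augmentation. The values are copied from the slide; the variable names are only illustrative.

```python
descriptors = ["SIFT", "LCS", "Fused", "RGB-SIFT", "OPP-SIFT"]
orig = [25.77, 36.18, 38.11, 31.36, 35.40]
ride = [32.14, 38.50, 44.73, 39.16, 42.18]
augm = [31.60, 38.97, 43.98, 39.79, 41.72]

# Accuracy gain (percentage points) of RIDE over the original descriptors.
gain_over_orig = {n: round(r - o, 2) for n, o, r in zip(descriptors, orig, ride)}

# Descriptors for which RIDE (same codebook size) beats data augmentation.
ride_beats_augm = [n for n, r, a in zip(descriptors, ride, augm) if r > a]
```

On this dataset RIDE improves every descriptor over ORIG, while the comparison with AUGM is mixed until the codebook is doubled (RIDEx2).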

Time & Memory Costs (on Bird-200)

Model     Desc.     Codeb.    Encod.    Classifi.   RAM
ORIG      2.27Hrs   0.13Hrs   0.78Hrs   1.21Hrs     3.71GB
RIDE      2.29Hrs   0.13Hrs   0.78Hrs   1.21Hrs     3.71GB
AUGM      2.30Hrs   0.13Hrs   1.56Hrs   2.46Hrs     7.52GB
RIDEx2    2.29Hrs   0.27Hrs   1.28Hrs   2.42Hrs     7.51GB
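Summing the four stages gives a quick check of the claim that RIDEx2 and AUGM have comparable costs. The sketch below only adds up the hours reported on the slide; the dictionary layout is an illustrative choice.

```python
stages = ["Desc.", "Codeb.", "Encod.", "Classifi."]
hours = {
    "ORIG":   [2.27, 0.13, 0.78, 1.21],
    "RIDE":   [2.29, 0.13, 0.78, 1.21],
    "AUGM":   [2.30, 0.13, 1.56, 2.46],
    "RIDEx2": [2.29, 0.27, 1.28, 2.42],
}

# Total pipeline time per model, in hours.
total = {model: round(sum(h), 2) for model, h in hours.items()}

# Extra time that plain RIDE adds on top of ORIG, in hours.
ride_overhead = round(total["RIDE"] - total["ORIG"], 2)
```

The totals come to roughly 4.39 h (ORIG), 4.41 h (RIDE), 6.45 h (AUGM) and 6.26 h (RIDEx2), so plain RIDE adds about 0.02 h while RIDEx2 stays slightly below AUGM.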

Comparison to the State-of-the-Art

• Compared with
  – [DB]: the Paper which Proposed the Dataset
  – [MAX]: Max-SIFT, Another Reversal Invariant Descriptor for Image Classification, ICASSP 2015
• Fine-Grained Object Recognition
  – [GMP]: Generalized Max Pooling, CVPR 2014
• Scene Classification
  – [DIR]: Dirichlet-based Features, CVPR 2014

Comparison: Fine-Grained

Classification accuracy (%):

Dataset   ORIG    RIDE    [DB]    [MAX]   [GMP]
P-37      60.24   63.49   59.21   60.65   56.8
A-100     74.61   78.92   48.69   74.39   N/A
F-102     83.53   86.45   72.8    83.13   84.6
B-200     47.61   50.81   17.0    47.20   33.3

Comparison: Scene Classification

Classification accuracy (%):

Dataset     ORIG    RIDE    [DB]    [MAX]   [DIR]
LandU-21    93.64   94.71   81.19   92.91   92.8
Indoor-67   63.17   64.93   26.1    62.45   63.4
SUN-397     48.35   50.12   38.0    47.69   46.1

Bird-200 with Detected Parts

Classification accuracy (%):

Method           Paper   Ours    +RIDE   +RIDEx2
Chai, ICCV13     56.6    57.7    60.7    61.9
Gavves, ICCV13   62.7    62.9    65.2    66.1

Summary

• Reversal Invariance is Useful for Recognition
  – RIDE Produces Consistent Accuracy Gain
  – For Every Single Case (Dataset with Descriptor)
  – RIDE Cooperates well with Part Detectors
• RIDE is Cheap and Efficient
  – Extra Time and Memory Costs are Negligible


Conclusions

• Reversal Invariance is Important
  – Reversal Causes Significant Difference in Images
  – More Prototypes, Fewer Training Samples
  – A Popular Solution: Data Augmentation
• Our Solution: RIDE
  – Reversal Invariant Descriptor Enhancement
  – Aligning an Image with Its Orientation
  – Accuracy Gain, Little Extra Computation

Future Proposals

• Other Image Applications?
  – Tell the Orientation of an Image
• Cooperation with Deep CNN?
  – Convolution is NOT Reversal Invariant!
  – Can we Apply Similar Techniques?
  – Reversal Invariance vs. Data Augmentation

Thank you!

Questions please?




