Comprehensive Overview of Computer Vision: Topics, Techniques, and Applications

 
Course review
 
CS5670: Computer Vision
 
Noah Snavely
 
Topics – image processing
 
Filtering
Edge detection
Image resampling / aliasing / interpolation
Feature detection
Harris corners
SIFT
Invariant features
Feature matching
 
Topics – 2D geometry
 
Image transformations
Image alignment / least squares
RANSAC
Panoramas
 
Topics – 3D geometry
 
Cameras
Perspective projection
Single-view modeling (points, lines, vanishing points, etc.)
Stereo
Two-view geometry (F-matrices, E-matrices)
Structure from motion
Multi-view stereo
 
Topics – geometry, continued
 
Light, color, perception
Lambertian reflectance
Photometric stereo
 
Topics – Recognition
 
Different kinds of recognition problems
Classification, detection, segmentation, etc.
Machine learning basics
Nearest neighbors
Linear classifiers
Hyperparameters
Training, test, validation datasets
Loss functions for classification
 
Topics – Recognition, continued
 
Regularization
Neural networks
Stochastic gradient descent
Backpropagation
Convolutional neural networks
Architectural components: convolutional layers, pooling layers, fully connected layers
Generative methods
 
Questions?
 
 
Image Processing
Linear filtering
One simple function on images: linear filtering (cross-correlation, convolution)
Replace each pixel by a linear combination of its neighbors
The prescription for the linear combination is called the “kernel” (or “mask”, “filter”)

[Figure: local image data, the kernel, and the resulting modified image data]
Source: L. Zhang
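A minimal sketch of cross-correlation as described above, assuming a NumPy grayscale array, an odd-sized kernel, and zero padding at the borders (the border policy is an assumption; the slides do not specify one). The name `cross_correlate` is illustrative, not from the course code.

```python
import numpy as np

def cross_correlate(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Replace each pixel by a kernel-weighted sum of its neighbors.

    Zero padding at the borders is an assumption; clamp, wrap, or
    reflect padding are equally common choices.
    """
    kh, kw = kernel.shape
    assert kh % 2 == 1 and kw % 2 == 1, "kernel must have odd size"
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)))  # zeros outside
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + kh, j:j + kw]   # neighborhood centered on (i, j)
            out[i, j] = np.sum(window * kernel)   # linear combination of neighbors
    return out
```

With a kernel that is 1 at the center and 0 elsewhere, the output equals the input; a 3x3 kernel of 1/9 replaces each pixel with the mean of its neighborhood.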
 
Convolution

Same as cross-correlation, except that the kernel is “flipped” (horizontally and vertically)

This is called a convolution operation:
$G = H * F$, where $G[i,j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} H[u,v]\, F[i-u, j-v]$

Convolution is commutative and associative
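The flipped-kernel definition and its algebraic properties can be checked numerically. This sketch (hypothetical helper `convolve_full`, NumPy assumed) uses 0-based kernel indices and returns the "full" output of size (m+p-1) x (n+q-1) rather than a same-size image, which is what makes commutativity exact.

```python
import numpy as np

def convolve_full(f: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Full 2D convolution: g[i, j] = sum_{u, v} h[u, v] * f[i - u, j - v]."""
    fh, fw = f.shape
    hh, hw = h.shape
    g = np.zeros((fh + hh - 1, fw + hw - 1))
    # Each kernel tap h[u, v] adds a copy of f shifted by (+u, +v); shifting
    # the image forward (instead of indexing f[i + u, j + v]) is what makes
    # this convolution, i.e. cross-correlation with a flipped kernel.
    for u in range(hh):
        for v in range(hw):
            g[u:u + fh, v:v + fw] += h[u, v] * f
    return g
```

Swapping the arguments, or regrouping three convolutions, gives the same array up to floating-point error, and in 1D the result agrees with `np.convolve`.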
 
Gaussian Kernel
 
Source: C. Rasmussen
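A sketch of building a normalized Gaussian kernel, assuming NumPy. The default truncation radius of ceil(3σ) is a common rule of thumb, not something the slides specify, and `gaussian_kernel` is a hypothetical name. Because the 2D Gaussian is separable, it is built as the outer product of two 1D Gaussians.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Normalized 2D Gaussian kernel of size (2*radius+1) x (2*radius+1)."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))  # rule-of-thumb truncation at ~3 sigma
    ax = np.arange(-radius, radius + 1, dtype=float)
    g1 = np.exp(-ax**2 / (2 * sigma**2))
    # exp(-(x^2 + y^2) / (2 sigma^2)) = exp(-x^2 / (2 sigma^2)) * exp(-y^2 / (2 sigma^2))
    kernel = np.outer(g1, g1)
    return kernel / kernel.sum()  # weights sum to 1, so average brightness is preserved
```

Normalizing by the sum (instead of the analytic 1/(2πσ²)) keeps the weights summing to exactly 1 after truncation.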
 
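Image gradient

The gradient of an image: $\nabla f = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right]$

The gradient points in the direction of most rapid increase in intensity

The edge strength is given by the gradient magnitude: $\|\nabla f\| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}$

The gradient direction is given by: $\theta = \tan^{-1}\left(\frac{\partial f / \partial y}{\partial f / \partial x}\right)$

How does this relate to the direction of the edge?
Source: Steve Seitz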
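A sketch of computing the gradient, its magnitude, and its direction, assuming NumPy's central differences (`np.gradient`) as the derivative estimate; Sobel or derivative-of-Gaussian filters are common alternatives. Here x is the column index and y is the row index (pointing down), and `arctan2` is used so the angle lands in the correct quadrant.

```python
import numpy as np

def image_gradient(f):
    """Return (df/dx, df/dy, gradient magnitude, gradient direction in radians)."""
    # np.gradient returns derivatives along axis 0 (rows, y) and then axis 1 (columns, x).
    fy, fx = np.gradient(f.astype(float))
    magnitude = np.hypot(fx, fy)       # edge strength
    direction = np.arctan2(fy, fx)     # direction of most rapid increase
    return fx, fy, magnitude, direction
```

On a horizontal intensity ramp the gradient points along +x (angle 0); on a vertical ramp it points along +y (angle π/2 with y pointing down).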
 
Finding edges

gradient magnitude

Finding edges

thinning (non-maximum suppression)
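A simplified sketch of thinning by non-maximum suppression, assuming the gradient magnitude and direction are already computed (for example with `np.gradient` and `arctan2`, y pointing down). The direction is quantized to four bins, a common simplification of interpolating between neighbors; border pixels are simply set to zero, and ties are kept.

```python
import numpy as np

def non_max_suppression(mag, angle):
    """Keep a pixel only if its magnitude is a local max along the gradient direction."""
    h, w = mag.shape
    out = np.zeros_like(mag)
    deg = np.rad2deg(angle) % 180   # opposite directions share a bin
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            d = deg[i, j]
            if d < 22.5 or d >= 157.5:      # gradient roughly along x
                n1, n2 = mag[i, j - 1], mag[i, j + 1]
            elif d < 67.5:                  # down-right / up-left diagonal
                n1, n2 = mag[i - 1, j - 1], mag[i + 1, j + 1]
            elif d < 112.5:                 # gradient roughly along y
                n1, n2 = mag[i - 1, j], mag[i + 1, j]
            else:                           # down-left / up-right diagonal
                n1, n2 = mag[i - 1, j + 1], mag[i + 1, j - 1]
            if mag[i, j] >= n1 and mag[i, j] >= n2:
                out[i, j] = mag[i, j]
    return out
```

A blurry vertical edge whose magnitude profile across columns is [0, 1, 3, 1, 0] is thinned to the single center column.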
 
Image sub-sampling

[Figure: an image repeatedly subsampled to 1/2, 1/4 (2x zoom), and 1/8 (4x zoom)]

Why does this look so crufty?

Source: S. Seitz
 
Subsampling with Gaussian pre-filtering

[Figure: Gaussian 1/2, G 1/4, G 1/8]

Solution: filter the image, then subsample

Source: S. Seitz
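A sketch contrasting naive subsampling with Gaussian pre-filtering, assuming NumPy, σ = 1 by default, and images at least as wide and tall as the kernel (7 pixels for σ = 1). All function names are hypothetical. A checkerboard is the worst case: taking every other pixel turns it into a flat black image, while blurring first gives the correct average gray.

```python
import numpy as np

def gaussian_1d(sigma):
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def blur(image, sigma):
    g = gaussian_1d(sigma)
    # The 2D Gaussian is separable: filter every row, then every column.
    rows = np.apply_along_axis(np.convolve, 1, image, g, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, g, mode="same")

def downsample_half(image, sigma=1.0):
    """Blur first to remove frequencies that would alias, then keep every other pixel."""
    return blur(image.astype(float), sigma)[::2, ::2]
```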
 
Image interpolation

[Figure: “Ideal” reconstruction, nearest-neighbor interpolation, linear interpolation, and Gaussian reconstruction]

Source: B. Curless
 
Image interpolation

[Figure: original image (magnified x 10) resampled with nearest-neighbor, bilinear, and bicubic interpolation]
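A sketch of bilinear interpolation at a real-valued location, assuming a NumPy image with at least 2 rows and columns, x as the column and y as the row, and clamping to the last full pixel cell at the border (the border handling is an assumption). `bilinear` is a hypothetical name.

```python
import numpy as np

def bilinear(image, x, y):
    """Sample image at real-valued (x, y), with x = column and y = row."""
    h, w = image.shape
    x0 = int(np.clip(np.floor(x), 0, w - 2))   # top-left corner of the enclosing cell
    y0 = int(np.clip(np.floor(y), 0, h - 2))
    a, b = x - x0, y - y0                      # fractional offsets inside the cell
    # Interpolate linearly along x on the top and bottom rows, then along y.
    top = (1 - a) * image[y0, x0] + a * image[y0, x0 + 1]
    bottom = (1 - a) * image[y0 + 1, x0] + a * image[y0 + 1, x0 + 1]
    return (1 - b) * top + b * bottom
```

Sampling a 2x2 image [[0, 10], [20, 30]] at its center gives 15, and any linear ramp is reproduced exactly.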
 
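The second moment matrix

The surface $E(u,v)$ is locally approximated by a quadratic form:
$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} H \begin{bmatrix} u \\ v \end{bmatrix}, \quad H = \sum_{(x,y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$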
 
The Harris operator

$\lambda_{\min}$ is a variant of the “Harris operator” for feature detection:
$f = \frac{\lambda_1 \lambda_2}{\lambda_1 + \lambda_2} = \frac{\det(H)}{\operatorname{trace}(H)}$

The trace is the sum of the diagonals, i.e., $\operatorname{trace}(H) = h_{11} + h_{22}$
Very similar to $\lambda_{\min}$ but less expensive (no square root)
Called the “Harris Corner Detector” or “Harris Operator”
Lots of other detectors; this is one of the most popular
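A sketch of the Harris response f = det(H) / trace(H), assuming NumPy, `np.gradient` derivatives, and an unweighted (2r+1)x(2r+1) box window for W (Gaussian weighting is also common). The small `eps` guards against division by zero in flat regions; all names are hypothetical.

```python
import numpy as np

def box_sum(a, r):
    """Sum of a over a (2r+1)x(2r+1) window around each pixel (zero padded)."""
    p = np.pad(a, r)
    out = np.zeros_like(a)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(image, r=1, eps=1e-12):
    fy, fx = np.gradient(image.astype(float))
    # Entries of the second moment matrix H, summed over the window W.
    h11 = box_sum(fx * fx, r)
    h12 = box_sum(fx * fy, r)
    h22 = box_sum(fy * fy, r)
    det = h11 * h22 - h12 * h12
    trace = h11 + h22
    return det / (trace + eps)   # f = det(H) / trace(H)
```

For a bright square in the lower-right of a dark image, the response is large at the square's corner and zero both along its straight edges (where H has rank 1) and in flat regions.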
 
Laplacian of Gaussian

“Blob” detector

[Figure: an image convolved (*) with the LoG kernel; the response (=) has a maximum and minima at blobs]

Find maxima and minima of LoG operator in space and scale

Scale-space blob detector: Example
 
 
Feature distance

How to define the difference between two features f1, f2?
Better approach: ratio distance = ||f1 - f2|| / ||f1 - f2'||
f2 is best SSD match to f1 in I2
f2' is 2nd best SSD match to f1 in I2
gives large values (close to 1) for ambiguous matches

[Figure: feature f1 in image I1 and its candidate matches f2 and f2' in image I2]
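A sketch of the ratio distance, assuming features are NumPy vectors and at least two candidates exist in I2 with a nonzero second-best distance. `ratio_distance` is a hypothetical name. Matches whose ratio exceeds a threshold would be rejected; a value around 0.8 is a common choice, but it is an assumption here.

```python
import numpy as np

def ratio_distance(f1, candidates):
    """Return (index of best SSD match, ||f1 - f2|| / ||f1 - f2'||)."""
    ssd = np.sum((candidates - f1) ** 2, axis=1)   # SSD to every candidate in I2
    order = np.argsort(ssd)
    best, second = order[0], order[1]              # f2 and f2'
    # Near 1 when the two best matches are about equally good (ambiguous).
    ratio = np.sqrt(ssd[best]) / np.sqrt(ssd[second])
    return int(best), float(ratio)
```

A distinctive match (distance 1 against a runner-up at distance 5) gives 0.2; two equally close candidates give 1.0.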
 
2D Geometry
 
Parametric (global) warping

Transformation T is a coordinate-changing machine:
p’ = T(p), where p = (x,y) and p’ = (x’,y’)
What does it mean that T is global?
It is the same for any point p
It can be described by just a few numbers (parameters)
Let’s consider linear xforms, which can be represented by a 2x2 matrix M: p’ = Mp
 
2D image transformations
 
These transformations are a nested set of groups
 Closed under composition, and the inverse of each transformation is a member
 
Projective Transformations aka Homographies aka Planar Perspective Maps

Called a homography (or planar perspective map)
 
Inverse Warping

Get each pixel g(x’,y’) from its corresponding location (x,y) = T^-1(x’,y’) in f(x,y)
Requires taking the inverse of the transform

[Figure: T^-1 maps each pixel (x’,y’) of the output image g back to a location (x,y) in the input image f]
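A sketch of inverse warping, assuming T^-1 is given as a 3x3 matrix acting on homogeneous (x, y, 1) coordinates (which covers affine maps and homographies), x as the column, y as the row, and nearest-neighbor sampling for brevity; bilinear sampling is the usual refinement. `inverse_warp` is a hypothetical name. Because the loop runs over output pixels, every pixel of g gets a value and no holes appear.

```python
import numpy as np

def inverse_warp(f, T_inv, out_shape):
    """g(x', y') = f(T^-1(x', y')), sampled with nearest neighbor; outside f gives 0."""
    h, w = out_shape
    g = np.zeros(out_shape, dtype=float)
    for yp in range(h):
        for xp in range(w):
            x, y, s = T_inv @ np.array([xp, yp, 1.0])
            x, y = x / s, y / s                     # back to inhomogeneous coordinates
            xi, yi = int(round(x)), int(round(y))   # nearest-neighbor sampling
            if 0 <= yi < f.shape[0] and 0 <= xi < f.shape[1]:
                g[yp, xp] = f[yi, xi]
    return g
```

For a forward translation x’ = x + 2, y’ = y + 1, the inverse matrix subtracts the same offsets, and the output is the input shifted right by 2 and down by 1.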
Affine transformations
 
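Solving for affine transformations
Matrix form: $A\mathbf{x} = \mathbf{b}$, with $A$ of size $2n \times 6$, the six affine parameters $\mathbf{x}$ of size $6 \times 1$, and $\mathbf{b}$ of size $2n \times 1$ (each of the $n$ matches contributes two rows)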
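A sketch of the least-squares affine fit, assuming NumPy, n >= 3 non-collinear matches given as n x 2 arrays, and the parameter ordering (a, b, c, d, e, f) with x’ = ax + by + c and y’ = dx + ey + f. The ordering and the name `fit_affine` are assumptions for illustration.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src points (n x 2) to dst points (n x 2)."""
    n = src.shape[0]
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src     # rows for x': a*x + b*y + c
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = src     # rows for y': d*x + e*y + f
    A[1::2, 5] = 1.0
    rhs = dst.reshape(-1)  # [x'_1, y'_1, x'_2, y'_2, ...], length 2n
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)  # minimizes ||A params - rhs||^2
    return params.reshape(2, 3)                       # [[a, b, c], [d, e, f]]
```

With noise-free matches the true parameters are recovered exactly; with noisy matches the result is the least-squares compromise.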
 
RANSAC
 
General version:
1. Randomly choose s samples
   Typically s = minimum sample size that lets you fit a model
2. Fit a model (e.g., line) to those samples
3. Count the number of inliers that approximately fit the model
4. Repeat N times
5. Choose the model that has the largest set of inliers
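The five steps above, specialized to 2D line fitting (s = 2), can be sketched in pure Python. The threshold, iteration count, and seed are hypothetical values, and a practical version would usually refit the final model to all of its inliers.

```python
import math
import random

def fit_line(p, q):
    """Line through two points as (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)

def ransac_line(points, n_iters=200, threshold=0.1, seed=0):
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):                     # 4. repeat N times
        p, q = rng.sample(points, 2)             # 1. s = 2 samples determine a line
        if p == q:
            continue
        a, b, c = fit_line(p, q)                 # 2. fit a model to the samples
        inliers = [pt for pt in points           # 3. count points within threshold
                   if abs(a * pt[0] + b * pt[1] + c) < threshold]
        if len(inliers) > len(best_inliers):     # 5. keep the largest inlier set
            best_model, best_inliers = (a, b, c), inliers
    return best_model, best_inliers
```

Because (a, b) is normalized, |ax + by + c| is the perpendicular distance from a point to the line, so the threshold is in pixel units.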
 
Projecting images onto a common plane

Each image is warped with a homography onto a common mosaic projection plane (PP)

Can’t create a 360 panorama this way…
 
3D Geometry
 
Pinhole camera
 
Add a barrier to block off most of the rays
This reduces blurring
The opening is known as the aperture
How does this transform the image?
 
Perspective Projection
 
Projection is a matrix multiply using homogeneous coordinates, followed by a divide by the third coordinate

This is known as perspective projection
The matrix is the projection matrix
 
Projection matrix
 
[Figure: the projection matrix written as a product of intrinsics, projection, rotation, and translation (t in book’s notation)]
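A sketch of perspective projection with a projection matrix of the form P = K [R | t], assuming a camera that looks down the +z axis and an upper-triangular intrinsics matrix K (the course slides may use a different sign convention). `project` is a hypothetical name.

```python
import numpy as np

def project(K, R, t, X):
    """Project 3D points X (n x 3) to pixel coordinates (n x 2) with P = K [R | t]."""
    P = K @ np.hstack([R, t.reshape(3, 1)])        # 3x4 projection matrix
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous 3D points (n x 4)
    x = (P @ Xh.T).T                               # homogeneous image points (n x 3)
    return x[:, :2] / x[:, 2:3]                    # divide by the third coordinate
```

With focal length 100 and principal point (50, 40), the point (1, 2, 10) projects to (60, 60); doubling the point's distance would halve its offset from the principal point.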
 
Point and line duality
 
A line l is a homogeneous 3-vector
It is ⊥ to every point (ray) p on the line: l · p = 0

What is the intersection of two lines l1 and l2?
p is ⊥ to l1 and l2, so p = l1 × l2
Points and lines are dual in projective space

What is the line l spanned by rays p1 and p2?
l is ⊥ to p1 and p2, so l = p1 × p2
l can be interpreted as a plane normal
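The duality relations reduce to cross products, sketched here with NumPy (`line_through` and `intersect` are hypothetical names). Intersecting two parallel image lines yields a point whose third coordinate is 0, a point at infinity.

```python
import numpy as np

def line_through(p1, p2):
    """Homogeneous line l = p1 x p2 through two homogeneous points."""
    return np.cross(p1, p2)

def intersect(l1, l2):
    """Homogeneous point p = l1 x l2 where two lines meet."""
    return np.cross(l1, l2)
```

For example, the points (0, 0) and (1, 1) give the line (-1, 1, 0), i.e. y = x, which meets the line x = 2, written (1, 0, -2), at (2, 2). The lines x = 0 and x = 2 meet at (0, 2, 0), which lies at infinity.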
 
Vanishing points
 
Properties
Any two parallel lines (in 3D) have the same vanishing point v
The ray from C through v is parallel to the lines
An image may have more than one vanishing point
In fact, every image point is a potential vanishing point

[Figure: camera center C, the image plane, a line on the ground plane, and its vanishing point v]
 
Measuring height
 
Your basic stereo algorithm

For each epipolar line
  For each pixel in the left image
    compare with every pixel on same epipolar line in right image
    pick pixel with minimum match cost
Improvement: match windows
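A sketch of the basic algorithm with match windows on one epipolar line, assuming rectified images (so epipolar lines are rows), a point at column x in the left image appearing at column x - d in the right image, and SSD over a 1D window as the match cost. The names and default window size are hypothetical.

```python
import numpy as np

def scanline_disparity(left_row, right_row, max_disp, half_win=1):
    """For each pixel on a rectified epipolar line, pick the disparity with minimum SSD."""
    n = len(left_row)
    disp = np.zeros(n, dtype=int)
    for x in range(half_win, n - half_win):
        patch = left_row[x - half_win:x + half_win + 1]
        best_cost, best_d = np.inf, 0
        for d in range(max_disp + 1):
            xr = x - d                          # candidate match in the right image
            if xr - half_win < 0:
                break
            cand = right_row[xr - half_win:xr + half_win + 1]
            cost = np.sum((patch - cand) ** 2)  # SSD match cost
            if cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp
```

If the left row is the right row shifted by 3 pixels, every pixel far enough from the border is assigned disparity 3.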
 
Stereo as energy minimization
 
Better objective function
$E(d) = E_d(d) + \lambda E_s(d)$

match cost $E_d(d)$: Want each pixel to find a good match in the other image

smoothness cost $E_s(d)$: Adjacent pixels should (usually) move about the same amount
 
Fundamental matrix
 
This epipolar geometry of two views is described by a Very Special 3x3 matrix $F$, called the fundamental matrix
$F$ maps (homogeneous) points in image 1 to lines in image 2!
The epipolar line (in image 2) of point $p$ is: $Fp$
Epipolar constraint on corresponding points: $q^\top F p = 0$, where $q$ in image 2 corresponds to $p$ in image 1

[Figure: Image 1 and Image 2 with the epipolar plane and the epipolar line in each image (projection of ray)]
 
Epipolar geometry demo
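8-point algorithm

Each correspondence between $(u_i, v_i)$ in image 1 and $(u_i', v_i')$ in image 2 gives one linear equation in the entries of $F$:
$$
\begin{bmatrix}
u_1 u_1' & v_1 u_1' & u_1' & u_1 v_1' & v_1 v_1' & v_1' & u_1 & v_1 & 1 \\
u_2 u_2' & v_2 u_2' & u_2' & u_2 v_2' & v_2 v_2' & v_2' & u_2 & v_2 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u_n u_n' & v_n u_n' & u_n' & u_n v_n' & v_n v_n' & v_n' & u_n & v_n & 1
\end{bmatrix}
\begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \\ f_{33} \end{bmatrix} = 0
$$
In reality, instead of solving $Af = 0$, we seek $f$ to minimize $\|Af\|$ subject to $\|f\| = 1$: the least eigenvector of $A^\top A$.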
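A sketch of the (unnormalized) 8-point algorithm, assuming NumPy, n >= 8 correspondences given as n x 2 arrays, and the convention [q, 1] F [p, 1]^T = 0 with p in image 1 and q in image 2. The smallest right singular vector of A is used, which equals the least eigenvector of A^T A. In practice the normalized variant (shifting and scaling the coordinates first) is preferred for numerical stability; it is omitted here for brevity, and `eight_point` is a hypothetical name.

```python
import numpy as np

def eight_point(p, q):
    """Estimate F from correspondences p (image 1) <-> q (image 2), both n x 2."""
    u, v = p[:, 0], p[:, 1]
    up, vp = q[:, 0], q[:, 1]
    ones = np.ones_like(u)
    # One row per correspondence, columns ordered f11, f12, ..., f33.
    A = np.stack([u * up, v * up, up, u * vp, v * vp, vp, u, v, ones], axis=1)
    # Minimize ||A f|| subject to ||f|| = 1: the right singular vector of A
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # A true fundamental matrix is singular, so zero the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```

On synthetic, noise-free correspondences from two calibrated cameras, the estimate satisfies the epipolar constraint to numerical precision and matches the essential matrix [t]x R up to scale and sign.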
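Structure from motion

[Figure: Cameras 1, 2, and 3 with poses (R1,t1), (R2,t2), (R3,t3) observing 3D points X1 … X7; p1,1, p1,2, p1,3 are the image observations of point X1]

minimize f(R, T, P) using non-linear least squares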
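A sketch of the objective that structure from motion minimizes: the total squared reprojection error of all observed points over all cameras. It assumes a shared, known intrinsics matrix K, poses (R_j, t_j), and observations listed as (camera index, point index, observed pixel); all names are hypothetical. The non-linear optimizer itself (for example Levenberg-Marquardt, as used in bundle adjustment) is not shown.

```python
import numpy as np

def reprojection_error(Rs, ts, X, observations, K):
    """Sum of squared distances between observed and predicted image points.

    observations: iterable of (camera index j, point index i, observed pixel of shape (2,)).
    """
    total = 0.0
    for j, i, p_obs in observations:
        x = K @ (Rs[j] @ X[i] + ts[j])   # project point i into camera j
        p_pred = x[:2] / x[2]
        total += float(np.sum((p_pred - p_obs) ** 2))
    return total
```

The error is zero when cameras and points explain the observations exactly, and moving one observation by one pixel raises it to 1.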
Stereo: another view

[Figure: matching error plotted against depth]
 
 
Light, reflectance, cameras
Radiometry
What determines the brightness of an image pixel?

Light source properties
Surface shape
Surface reflectance properties
Optics
Sensor characteristics
Exposure
Slide by L. Fei-Fei
 
Classic reflection behavior

[Figure: classic reflection behaviors, including ideal specular]

from Steve Marschner
 
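Photometric stereo

[Figure: surface normal N and viewing direction V]

For a Lambertian surface lit from direction $L_i$, the intensity is $I_i = k_d\, N \cdot L_i$. With three lights, we can write this as a matrix equation:
$\begin{bmatrix} I_1 & I_2 & I_3 \end{bmatrix} = k_d N^\top \begin{bmatrix} L_1 & L_2 & L_3 \end{bmatrix}$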
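A sketch of solving photometric stereo at one pixel, assuming the Lambertian model I_i = k_d (N · L_i), known unit light directions, at least three lights, and no shadowed measurements. Solving the linear system for G = k_d N by least squares, the albedo is |G| and the normal is G / |G|. `photometric_stereo` is a hypothetical name.

```python
import numpy as np

def photometric_stereo(I, L):
    """Recover albedo k_d and unit normal N at one pixel.

    I: intensities under each light, shape (m,); L: unit light directions as rows, (m, 3).
    """
    G, *_ = np.linalg.lstsq(L, I, rcond=None)   # solve I = L G with G = k_d * N
    kd = np.linalg.norm(G)                      # the normal has unit length
    return kd, G / kd
```

With more than three lights the same call gives the least-squares solution, which helps when measurements are noisy.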
 
Example
 
 
Recognition
 
Image Classification
 
Slides from Andrej Karpathy and Fei-Fei Li
http://vision.stanford.edu/teaching/cs231n/
 
Object detection
 
k-nearest neighbor

Find the k closest points from training data
Take majority vote from the k closest points
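A pure-Python sketch of k-nearest-neighbor classification, assuming L2 distance between feature tuples (the distance choice is itself a hyperparameter). Vote ties are broken by whichever label appears first among the nearest points; `knn_predict` is a hypothetical name.

```python
from collections import Counter
import math

def knn_predict(train_x, train_y, query, k=3):
    """Label query by majority vote among its k nearest training points (L2 distance)."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]
```

With a single mislabeled-looking "b" point near a cluster of "a" points, k = 1 follows that point while k = 3 outvotes it, which is why k matters.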
Hyperparameters

What is the best distance to use?
What is the best value of k to use?

These are hyperparameters: choices about the algorithm that we set rather than learn

How do we set them?
One option: try them all and see what works best
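A sketch of "try them all and see what works best" for k, assuming a held-out validation set (the test set should not be used for this choice) and an illustrative candidate list. The small k-NN helper is repeated so the block runs on its own; all names are hypothetical.

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """train: list of (point, label); majority vote among the k nearest points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def choose_k(train, val, candidates=(1, 3, 5)):
    """Pick the k with the best accuracy on the validation data."""
    def accuracy(k):
        return sum(knn_predict(train, x, k) == y for x, y in val) / len(val)
    return max(candidates, key=accuracy)
```

On the toy data in the check below, k = 1 is fooled by a nearby outlier, k = 5 is dominated by the larger class, and k = 3 gets every validation point right.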
 
Parametric approach: Linear classifier
 
Loss function, cost/objective function
 
Given ground truth labels (y_i) and scores f(x_i, W), how unhappy are we with the scores?

Loss function or objective/cost function measures unhappiness

During training, we want to find the parameters W that minimize the loss function
 
Support vector machines
 
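Find the hyperplane that maximizes the margin between the positive and negative examples

[Figure: separating hyperplane, margin, and support vectors]

Distance between point $x_i$ and hyperplane: $\frac{|x_i \cdot w + b|}{\|w\|}$

For support vectors, $x_i \cdot w + b = \pm 1$

Therefore, the margin is $2 / \|w\|$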
 
Multi-class SVM loss
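A sketch of the multi-class SVM (hinge) loss for one example, assuming class scores s = f(x_i, W) are already computed (for a linear classifier, s = W x_i + b) and a margin of 1: L_i = sum over j ≠ y_i of max(0, s_j - s_{y_i} + 1). The function name is hypothetical.

```python
import numpy as np

def multiclass_svm_loss(scores, y, margin=1.0):
    """L_i = sum_{j != y} max(0, s_j - s_y + margin) for one example."""
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0                  # the correct class contributes nothing
    return float(margins.sum())
```

For scores [3.2, 5.1, -1.7] with correct class 0, only class 1 violates the margin and the loss is 5.1 - 3.2 + 1 = 2.9; when the correct score beats every other by at least 1, the loss is 0.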
 
Softmax classifier
 
Interpretation: squashes values into range 0 to 1
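A sketch of the softmax function and the corresponding cross-entropy loss -log P(y_i | x_i), assuming NumPy score vectors. Subtracting the maximum score before exponentiating leaves the probabilities unchanged but avoids overflow for large scores; the names are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Map scores to probabilities in (0, 1) that sum to 1."""
    shifted = scores - np.max(scores)   # same result, no overflow in exp
    e = np.exp(shifted)
    return e / e.sum()

def cross_entropy_loss(scores, y):
    """Softmax loss L_i = -log P(y | x) for one example."""
    return float(-np.log(softmax(scores)[y]))
```

Equal scores over three classes give probability 1/3 each and a loss of log 3.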
 
Optimizing weights to minimize loss
 
Stochastic gradient descent
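A pure-Python sketch of minibatch stochastic gradient descent, using a one-variable linear regression with mean squared error as a stand-in for a classifier loss. The learning rate, batch size, epoch count, and seed are hypothetical values. Each step estimates the gradient from a small random batch instead of the full dataset.

```python
import random

def sgd_linear_regression(data, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Fit y ~ w*x + b by minibatch SGD on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)                                # new random batches each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of (1/|B|) * sum (w*x + b - y)^2 with respect to w and b.
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * gw                                 # step against the gradient
            b -= lr * gb
    return w, b
```

On noise-free points from y = 3x - 1, the estimate converges to w = 3, b = -1; with noisy data it would instead hover near the least-squares solution unless the learning rate is decayed.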
 
Neural networks
 
Convolutional neural networks
 
Best practices for training networks
 
 
Transfer learning
 
 
Questions?
 
Good luck!

This content provides an extensive review of various topics in computer vision, ranging from image processing and 2D/3D geometry to recognition problems and machine learning basics. It covers key concepts such as filtering, edge detection, feature matching, geometric transformations, camera perspective, stereo vision, and more. Additionally, it explores topics like light perception, color, recognition techniques, and neural networks. The provided images illustrate different aspects of computer vision, serving as visual aids for better understanding.

  • Computer Vision
  • Image Processing
  • Machine Learning
  • Recognition
  • 2D Geometry

Uploaded on Sep 30, 2024


