Predicting Animation Skeletons for 3D Articulated Models via Volumetric Nets

 
Zhan Xu, Yang Zhou, Evangelos Kalogerakis, Karan Singh
University of Massachusetts Amherst, University of Toronto

Date of Conference: 16-19 Sept. 2019
Published in: 2019 International Conference on 3D Vision (3DV)
 
Outline
 
Introduction
Related Work
Overview
Architecture
Training
Results
Conclusion
 
Introduction
 
Skeleton-based representation
Handles various input models (polygon meshes): humanoids, quadrupeds, birds, fish, robots, and so on.
The method does not require input textual descriptions (labels) of joints.
 
Contributions
 
A deep architecture that incorporates volumetric and
geometric shape features to predict animation skeletons
tailored for input 3D models of articulated characters.
A method to control the level-of-detail of the output skeleton
via a single, optional input parameter.
A dataset of rigged 3D computer character models mined
from the web for training and testing learning methods for
animation skeleton prediction.
 
Related Work
 
Early algorithms for skeleton extraction from 2D images were
based on gradients of intensity maps or distance maps.
Their extracted joints often do not lie near locations where rigid
parts are connected.
Geometric skeletons may produce segments for non-
articulating parts (i.e., parts that lack their own motion).
 
-Geometric skeletons
 
Related Work

-3D Pose Estimation

Methods that try to recover 3D locations of joints from 2D images or directly from 3D point cloud and volumetric data.
These approaches aim to predict a pre-defined set of joints for a particular class of objects.
In contrast, we do not assume any prior skeletal structure.
 
Related Work

-Automatic Character Rigging

A popular method for automatically extracting an animation skeleton for an input 3D model is Pinocchio.
The method can evaluate the fitting cost for different templates.
But our method aims to learn a generic model of skeleton prediction without requiring any particular input templates.
 
Overview
 
Pipeline of the method and its deep architecture
 
SDF (signed distance function)
LVD (local vertex density)
LSD (local shape diameter)
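As an illustration of one of these geometric features, below is a minimal numpy sketch of a local-vertex-density-style measure on a voxel grid: each cell accumulates a Gaussian-weighted count of the mesh vertices near it. The grid resolution, bandwidth, and function name are illustrative assumptions, not the paper's exact LVD formulation.

```python
import numpy as np

def local_vertex_density(vertices, grid_res=8, sigma=0.25):
    """Gaussian-weighted count of mesh vertices near each voxel center.

    vertices: (N, 3) array of points assumed to lie in [0, 1]^3.
    Returns a (grid_res,)*3 volume; denser surface regions score higher.
    (Illustrative sketch -- not the paper's exact LVD definition.)
    """
    # Voxel-center coordinates along one axis.
    axis = (np.arange(grid_res) + 0.5) / grid_res
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    centers = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)  # (R^3, 3)

    # Squared distances from every voxel center to every vertex.
    d2 = ((centers[:, None, :] - vertices[None, :, :]) ** 2).sum(-1)
    density = np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=1)
    return density.reshape(grid_res, grid_res, grid_res)

# Cluster of vertices near one corner of the unit cube.
verts = np.full((50, 3), 0.1) + 0.02 * np.random.default_rng(0).standard_normal((50, 3))
vol = local_vertex_density(verts)
```

The voxel nearest the vertex cluster should come out densest.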
 
 
Overview
 
-Simultaneous joint and bone prediction
 
In general, input characters can vary significantly in terms of
structure.
Since joint and bone predictions are not independent of each
other, our method simultaneously learns to extract both
through a shared stack of encoder-decoder modules.
 
Overview
 
-Input shape representation
 
Input 3D models are in the form of polygon mesh soups with
varying topology.
A volumetric network is well suited for this task due to its ability
to make predictions away from the 3D model surface.
We use an implicit shape representation, namely the Signed Distance Function (SDF), as input to our volumetric network.
 
Overview
 
-User Control
 
The reason for allowing user control is that the choice of animation skeleton often depends on the task.
The animation of small parts (such as fingers, ears, and so on) would not be noticeable and would also cause additional computational overhead.
 
Architecture
 
-Input Shape Representation
 
Our input 3D models are in the form of polygon mesh soups.
We first extract an implicit representation of the shape in the form of the Signed Distance Function (SDF), computed through a fast marching method, so that the models can be processed by 3D networks.
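To illustrate what this volumetric input looks like, here is a minimal sketch that fills a grid with the analytic SDF of a sphere (negative inside, positive outside). The paper computes the SDF of arbitrary mesh soups via fast marching; the sphere, the grid resolution, and the function name are simplifying assumptions that keep the sketch dependency-free.

```python
import numpy as np

def sphere_sdf_grid(grid_res=32, center=(0.5, 0.5, 0.5), radius=0.3):
    """Sample the analytic signed distance of a sphere on a voxel grid.

    Convention: negative inside the shape, positive outside, zero on
    the surface. (The paper derives the SDF of arbitrary meshes via
    fast marching; a sphere keeps this sketch self-contained.)
    """
    axis = (np.arange(grid_res) + 0.5) / grid_res  # voxel centers in [0, 1]
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    dist = np.sqrt((gx - center[0]) ** 2
                   + (gy - center[1]) ** 2
                   + (gz - center[2]) ** 2)
    return dist - radius  # signed distance to the sphere surface

sdf = sphere_sdf_grid()
```

Voxels inside the sphere get negative values, voxels near the grid corners positive ones.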
 
Architecture
 
-Hourglass module
 
The convolution layer has a 3D kernel of size 5 × 5 × 5, and the residual block contains two convolutional layers with kernels 3 × 3 × 3 and stride 1.
The output of this residual block is a new shape feature map S(1) of size 88 × 88 × 88 × 8.
The decoder is made out of 3 residual blocks that are symmetric to the encoder.
The decoder outputs a feature map with the same resolution as the input (size 88 × 88 × 88 × 8).
 
Architecture
 
-Stacked hourglass network
 
The predictions of joints and bones are inter-dependent, i.e., the location of joints should affect the location of bones and vice versa.
To avoid multiple near-duplicate joint predictions, we apply
non-maximum suppression as a postprocessing step to obtain
the joints of the animation skeleton.
We use a Minimum Spanning Tree (MST) algorithm that
minimizes a cost function over edges between extracted joints
representing candidate skeleton bones.
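The two post-processing steps above can be sketched on scored 3D joint candidates: greedy non-maximum suppression merges near-duplicate detections, then Prim's algorithm connects the surviving joints into a tree. The suppression radius, the plain Euclidean edge cost, and the function names are illustrative assumptions; the paper's actual bone cost is derived from the predicted bone probability map.

```python
import numpy as np

def nms_joints(points, scores, radius=0.1):
    """Greedy non-maximum suppression: keep the highest-scoring
    candidate, drop all candidates within `radius` of it, repeat."""
    order = np.argsort(scores)[::-1]
    kept = []
    for i in order:
        if all(np.linalg.norm(points[i] - points[j]) > radius for j in kept):
            kept.append(i)
    return points[kept]

def mst_edges(joints):
    """Prim's minimum spanning tree over all joint pairs.
    Edge cost here is plain Euclidean distance (an illustrative
    stand-in for a cost from the bone probability map)."""
    n = len(joints)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = min(
            ((i, j, np.linalg.norm(joints[i] - joints[j]))
             for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: e[2],
        )
        edges.append((best[0], best[1]))
        in_tree.add(best[1])
    return edges

# Three true joints, plus one near-duplicate of the first.
cands = np.array([[0, 0, 0], [0.01, 0, 0], [1, 0, 0], [2, 0, 0]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.6])
joints = nms_joints(cands, scores)
bones = mst_edges(joints)
```

The near-duplicate is suppressed, and the remaining three joints are chained into two bones.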
 
Training
 
-Dataset
 
We first collected a dataset of 3277 rigged characters from an
online repository, called Models Resource.
The average number of joints per character in our dataset was
26.4.
In total, we generated up to 5 variations of each model in our training split, resulting in 15,526 training models.
 
Training
 
-Training objective
 
Then for each training model m, we generate a target map for joints P̂_{v,m} and bones P̂_{b,m} based on its animation skeleton.
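Such target maps can be sketched as volumes with a Gaussian bump around each annotated joint, a standard heatmap-regression setup. The grid resolution, bandwidth, and max-merging of overlapping bumps are assumptions, since the slides do not give the exact construction.

```python
import numpy as np

def joint_target_map(joints, grid_res=16, sigma=1.5):
    """Build a target volume: a Gaussian bump (peak 1.0) at each joint.

    joints: (K, 3) voxel-space coordinates. Overlapping bumps are
    merged with max. (Illustrative assumption -- the slides do not
    specify the target-map formula.)
    """
    idx = np.arange(grid_res)
    gx, gy, gz = np.meshgrid(idx, idx, idx, indexing="ij")
    target = np.zeros((grid_res,) * 3)
    for jx, jy, jz in joints:
        d2 = (gx - jx) ** 2 + (gy - jy) ** 2 + (gz - jz) ** 2
        target = np.maximum(target, np.exp(-d2 / (2.0 * sigma ** 2)))
    return target

tmap = joint_target_map(np.array([[4.0, 4.0, 4.0], [12.0, 8.0, 8.0]]))
```

The network can then be trained to regress this volume from the SDF input.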
 
 
Results
 
-Comparisons
 
Our CD-joint2bone measure is also lower than CD-joint, indicating that our predicted skeletons tend to overlap more with the reference ones.
MR: if a predicted joint is located closer to a reference joint than a given tolerance, it counts as a correct prediction.
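A Chamfer-style joint distance in the spirit of CD-joint can be sketched as follows: average, over each predicted joint, the distance to its nearest reference joint, do the same in the reverse direction, and sum. The paper's exact CD-joint/CD-joint2bone definitions may differ; this is an illustrative assumption.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 3D point sets.

    For each point in `a`, take the distance to its nearest neighbor
    in `b`; average; repeat from `b` to `a`; sum the two averages.
    (Illustrative stand-in for the CD-joint metric in the slides.)
    """
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (|a|, |b|)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
ref = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0]])
cd = chamfer_distance(pred, ref)
```

Identical point sets score zero; the small 0.1 offset above contributes in both directions.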
 
Conclusion
 
We presented a method for learning animation skeletons for 3D
computer characters.
Our method represents a first step towards learning a generic,
cross-category model for producing animation skeletons of 3D
models.
The method is based on a volumetric network with limited
resolution, which can result in missing joints for small parts, such
as fingers, or misplacing other joints, such as knees and elbows.