Deep Reinforcement Learning for Human Dressing Motion Synthesis


This research uses deep reinforcement learning to synthesize human dressing motions by breaking the dressing sequence into subtasks and learning a control policy for each. The resulting policies manipulate clothing dexterously, handle variations in the initial cloth position and character pose, and avoid damaging the garment.





Presentation Transcript


  1. Learning to dress: Synthesizing human dressing motion via deep reinforcement learning

  2. INTRODUCTION

  3. INTRODUCTION Two main purposes: traverse the inside of the garment, and prevent damage to the garment.

  4. INTRODUCTION Learning a single control policy that achieves all of these distinct motor skills and executes them sequentially is impractical. Instead, break the full dressing sequence into subtasks and learn a control policy for each subtask: grasping the T-shirt, tucking a hand into the T-shirt, and pushing a hand through a sleeve. A policy sequencing algorithm handles the transition at each policy switch (see the sketch below).
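A minimal sketch of executing the subtask policies back to back, as the slide describes. The environment interface, the policy objects, and the per-subtask termination check are hypothetical placeholders for illustration, not the paper's implementation.

```python
def run_dressing_sequence(env, policies, max_steps_per_subtask=500):
    """Run one trained policy per subtask (grasp, tuck, push through sleeve) in order."""
    obs = env.reset()
    for policy in policies:                      # one control policy per subtask
        for _ in range(max_steps_per_subtask):
            action = policy.act(obs)             # policy maps observation -> action
            obs, done = env.step(action)
            if done:                             # subtask-specific success condition
                break
    return obs
```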

  5. INTRODUCTION Producing a successful policy for a single subtask requires hours of simulation and optimization. Benefit: the end result is not a single animation but a character control policy capable of handling variations in the initial cloth position and character pose.

  6. RELATED WORK Dexterous Manipulation of Cloth [Bai et al. 2016]

  7. REINFORCEMENT LEARNING BACKGROUND A Markov Decision Process (MDP) is a tuple (S, A, r, ρ0, P_sas', γ): S is the state space, A the action space, r the reward function, ρ0 the distribution of the initial state s0, P_sas' the transition probability, and γ the discount factor.
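As a concrete reminder of the notation, here is a minimal sketch of the MDP tuple and the discounted return it defines; the container and field names are illustrative, not from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class MDP:
    states: Any            # S: state space
    actions: Any           # A: action space
    reward: Callable       # r(s, a): reward function
    init_dist: Callable    # rho_0: distribution of the initial state s0
    transition: Callable   # P(s' | s, a): transition probability
    gamma: float           # discount factor

def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t, the quantity the policy is trained to maximize."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```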

  8. REINFORCEMENT LEARNING BACKGROUND A Partially Observable Markov Decision Process (POMDP) is used because humans do not have direct perception of the full state of the world and of themselves; in the case of dressing, perception of the state of the garment is limited to haptic and visual observations. The observation space O is a subspace of the state space S. Goal: optimize the policy, represented as a neural network, such that the expected accumulated reward is maximized. All subtasks share the same action space.
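The "policy represented as a neural network" could look like the following sketch; the use of PyTorch, the hidden layer sizes, and the exact input/output dimensions (the 163-dimensional observation and 22 actuated degrees of freedom from the later slides) are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DressingPolicy(nn.Module):
    """Maps the dressing observation vector to an action for the actuated joints."""
    def __init__(self, obs_dim=163, act_dim=22, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)
```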

  9. SEQUENCING CONTROL POLICIES

  10. OBSERVATION SPACE The full state space of dressing tasks is typically high-dimensional, so a compact observation space tailored for dressing tasks is formulated: O = [Op, Of, Oh, Os, Ot]. With carefully chosen components, the observation is a 163-dimensional vector.
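A sketch of assembling the composite observation by concatenation. The individual component dimensions are not all given on the slides, so only the 163-dimensional total is taken from the source.

```python
import numpy as np

def build_observation(o_p, o_f, o_h, o_s, o_t):
    """Concatenate the dressing observation O = [Op, Of, Oh, Os, Ot]."""
    obs = np.concatenate([o_p, o_f, o_h, o_s, o_t])
    assert obs.shape == (163,), "components should sum to the 163-dim observation"
    return obs
```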

  11. OBSERVATION SPACE Op: proprioception. q(s) is the vector of joint angles describing the human pose at state s. The human model in this work contains 22 degrees of freedom, all of which are actuated.

  12. OBSERVATION SPACE Of: garment feature location, i.e., the current location of a garment feature (e.g., a sleeve opening). c: the world position of its centroid; p: the world position of the garment feature polygon.

  13. OBSERVATION SPACE Oh: haptics. Humans rely on haptic sensing during dressing to avoid damage to clothes and to minimize discomfort. fi: a 3-dimensional vector per haptic sensor; n = 21 sensors (22 nodes).
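A sketch of flattening the per-sensor haptic readings into the Oh component. Treating each f_i as a 3-D contact-force vector across 21 sensors is an assumption consistent with the slide, not a confirmed detail.

```python
import numpy as np

def haptic_observation(sensor_forces):
    """sensor_forces: array of shape (21, 3), one 3-D vector f_i per haptic sensor."""
    forces = np.asarray(sensor_forces, dtype=float)
    assert forces.shape == (21, 3)
    return forces.reshape(-1)   # flattened 63-dimensional Oh component
```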

  14. OBSERVATION SPACE Os: signed surface. Provides the policy with a surface sign for each haptic sensor i that differentiates contact with the inner surface of the garment from contact with the outer surface. If the sum of the assigned values for sensor i is positive, the sensor is considered to be in contact with the garment from the inside.
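A sketch of the sign test the slide describes: sum the values assigned to each sensor's cloth contacts and call the contact "inside" when the sum is positive. The specific value convention (+1 inner face, -1 outer face) is an assumption.

```python
import numpy as np

def surface_signs(contact_values_per_sensor):
    """contact_values_per_sensor: one array per haptic sensor, holding the
    +1/-1 values assigned to its cloth contacts (assumed convention)."""
    signs = []
    for values in contact_values_per_sensor:
        total = np.sum(values)
        signs.append(1.0 if total > 0 else -1.0)  # positive sum -> contact from inside
    return np.array(signs)                        # Os component, one sign per sensor
```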

  15. OBSERVATION SPACE Ot: task vector. The task vector depends on geodesic information when the limb is in contact with the garment but has not yet entered the garment feature.

  16. REWARD FUNCTION A good reward function is important to the success of reinforcement learning. rp: progress reward; rd: deformation penalty; rg: geodesic reward; rt: end-effector motion in the direction of the task vector; rr: attracts the character to a target position.
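A sketch of combining the five terms into a single scalar reward. The slides do not give the weights, so they are illustrative placeholders.

```python
def total_reward(r_p, r_d, r_g, r_t, r_r, weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of progress, deformation, geodesic, task-vector, and target terms."""
    w_p, w_d, w_g, w_t, w_r = weights
    return w_p * r_p + w_d * r_d + w_g * r_g + w_t * r_t + w_r * r_r
```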

  17. REWARD FUNCTION rp: progress reward. pi, i = 0, ..., m are the joints of the dressed limb; a containment flag ci is checked for each bone until the first bone with ci = 1 is encountered. The reward is computed from the intersection of bone bk with the garment feature polygon P, the bone lengths ||bi||, and c, the centroid of P.
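The slide's formula is only partially recoverable, so the following is one plausible, hedged reading: measure progress as the total length of limb that has passed through the garment feature polygon P, using the per-bone containment flags.

```python
def limb_progress(bone_lengths, containment_flags):
    """bone_lengths: ||b_i|| for each bone of the limb.
    containment_flags: c_i = 1 if bone i has passed through the feature polygon P.
    Returns the total limb length that is through the feature (assumed reading)."""
    return sum(length for length, c in zip(bone_lengths, containment_flags) if c == 1)
```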

  18. REWARD FUNCTION rd: deformation penalty. wmid (= 25): midpoint of the deformation penalty range. wscale (= 0.14): scales the slope and the upper/lower limits of the deformation penalty. This formulation results in little to no penalty for small deformations, in order to encourage the use of contact for dressing.
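A sketch of a deformation penalty with the stated midpoint and scale. The logistic (sigmoid) shape is an assumption consistent with the slide's mention of a midpoint, a slope, and upper/lower limits; it is near zero for small deformation and saturates for large deformation.

```python
import math

def deformation_penalty(w, w_mid=25.0, w_scale=0.14):
    """Penalty for cloth deformation w: ~0 for small w, approaching -1 for large w
    (assumed logistic form)."""
    return -1.0 / (1.0 + math.exp(-w_scale * (w - w_mid)))
```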

  19. REWARD FUNCTION rg: geodesic contact

  20. REWARD FUNCTION rt: task vector displacement. Ot: the task vector observation; t: the current simulation step.
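A sketch of rewarding end-effector motion in the direction of the task vector: project the end effector's displacement over one simulation step onto Ot. Treating the term as this dot product is an assumption.

```python
import numpy as np

def task_vector_reward(ee_pos_t, ee_pos_prev, task_vector):
    """Displacement of the end effector over one simulation step, projected onto
    the (unit-normalized) task vector observation Ot."""
    displacement = np.asarray(ee_pos_t) - np.asarray(ee_pos_prev)
    direction = np.asarray(task_vector)
    direction = direction / (np.linalg.norm(direction) + 1e-8)
    return float(np.dot(displacement, direction))
```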

  21. REWARD FUNCTION rr: target position. q(s) is the current pose of the character; the reward attracts q(s) toward a specified goal pose.
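A sketch of the pose-attraction term: penalize the distance between the current pose q(s) and a goal pose. Using the Euclidean norm over joint angles is an assumption.

```python
import numpy as np

def target_pose_reward(q_current, q_goal):
    """Larger (less negative) as the character's joint angles approach the goal pose."""
    return -float(np.linalg.norm(np.asarray(q_current) - np.asarray(q_goal)))
```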

  22. RESULTS

  23. RESULTS
