Avatar interoperability
J. Yoon and A. Bottino, 31 March 2023, 08:00 & 18:40 UTC
2-Mar-25
1. Importance of Avatar
2. Types of Avatar
Avatars are used to express oneself in virtual spaces such as the metaverse. Beyond representing the user, they are also used to embody external entities, such as a virtual assistant.
3. Avatar concept in MPAI
4. What is Avatar Representation and Animation?
Objective 1: to enable a user to reproduce a virtual environment.
Objective 2: to enable a user to reproduce an avatar of a third party and its animation as intended by the sender.
Objective 3: to estimate the personal status of a human or avatar.
Objective 4: to display an avatar with a selected personal status.
Definition: Personal Status is the ensemble of information internal to a person, including Emotion, Cognitive State, and Attitude.
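The Personal Status defined above can be pictured as a small data structure. A minimal sketch in Python; the field names and example labels are illustrative, not part of any MPAI specification:

```python
from dataclasses import dataclass, field

@dataclass
class PersonalStatus:
    """Personal Status per the definition above: Emotion, Cognitive State,
    and Attitude, here held as per-modality label maps (illustrative)."""
    emotion: dict = field(default_factory=dict)
    cognitive_state: dict = field(default_factory=dict)
    attitude: dict = field(default_factory=dict)

# the same Emotion may be manifested in several modalities
ps = PersonalStatus(emotion={"speech": "happy", "face": "happy"},
                    attitude={"gesture": "open"})
```

Keeping one entry per modality matches the later extraction stage, where Text, Speech, Face, and Gesture are analysed separately before fusion.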
5. Examples of avatar implementation in MPAI
Target use case: avatar-based videoconference (ABV)
[Diagram: ABV system overview. A Transmitting Client sends Language Preferences, Speech and Text, an Avatar Model, Avatar Descriptors, Speech and Face Descriptors, and a Participant ID to the Server; the Server (hosting the Virtual Secretary and the Environment Description) forwards Speech and Text, Avatar Models, Avatar Descriptors, Participant IDs, and Summaries (xN+1) to each Receiving Client, which renders Output Audio & Text and Output Visual from the user's Point of View.]
ABV walkthrough/1
1. A user selects:
   a. The virtual space (Environment) where the avatars operate.
   b. The avatar model to be animated.
   c. The position occupied by an avatar in the Virtual Environment.
   d. The user's Point of View when watching and hearing the 3D Audio-Visual Scene in the Virtual Environment.
2. A machine:
   a. Perceives the Audio-Visual components of the Real Environment and creates:
      i. An Audio Scene Description composed of independent audio objects and their locations.
      ii. A Visual Scene Description composed of independent visual objects and their locations.
      iii. A complete Audio-Visual Scene Description.
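The scene descriptions in step 2.a can be sketched as plain data: independent objects, each with a location. The class and field names below are illustrative, not the MPAI data formats:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SceneObject:
    """One independent audio or visual object and its location in the scene."""
    object_id: str
    location: Tuple[float, float, float]

@dataclass
class AVSceneDescription:
    """Audio-Visual Scene Description: independent objects with locations,
    kept in separate audio and visual lists (illustrative structure)."""
    audio: List[SceneObject] = field(default_factory=list)
    visual: List[SceneObject] = field(default_factory=list)

scene = AVSceneDescription()
scene.audio.append(SceneObject("speaker1-voice", (1.0, 0.0, 1.5)))
scene.visual.append(SceneObject("speaker1-body", (1.0, 0.2, 1.5)))
```

Keeping objects independent is what later lets a receiving client re-spatialise them from an arbitrary Point of View.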
ABV walkthrough/2
   b. Identifies a human belonging to a group composed of a limited number of humans (closed-set identification) using:
      i. Speech Descriptors.
      ii. Face Descriptors.
   c. Decodes a human's Personal Status by:
      i. Extracting the Descriptors of the Manifestations of a human's Personal Status (Text, Speech, Face, and Gesture).
      ii. Extracting a human's Personal Status from Text, Speech, Face, and Gesture Descriptors.
      iii. Fusing the different Personal Statuses in Text, Speech, Face, and Gesture into Personal Status.
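Closed-set identification (step b) means picking the best match among a known, enrolled set. A minimal sketch using cosine similarity over descriptor vectors; the enrolled IDs and vector values are invented for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def identify(descriptor, enrolled):
    """Closed-set identification: return the enrolled participant whose
    reference descriptor is most similar to the observed one."""
    return max(enrolled, key=lambda pid: cosine(descriptor, enrolled[pid]))

# toy enrolled set (Speech or Face Descriptors would be real feature vectors)
enrolled = {"alice": [0.9, 0.1, 0.0], "bob": [0.1, 0.8, 0.3]}
who = identify([0.85, 0.2, 0.05], enrolled)
```

Because the set is closed, the maximum-similarity match is always returned; an open-set variant would additionally threshold the similarity.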
ABV walkthrough/3
   d. Animates a speaking avatar by:
      i. Synthesising speech using Text, Speech, Personal Status of Text (PS-Text), and Personal Status of Speech (PS-Speech).
      ii. Animating the Face using Text, Synthesised Speech, and Personal Status of Face (PS-Face).
      iii. Animating the Gesture of an avatar using Text and Personal Status of Gesture (PS-Gesture).
   e. Converses with another party (human or avatar) by:
      i. Decoding the other party's Personal Status.
      ii. Animating an avatar representing it (see 5.).
   f. Summarises and refines the speech of other parties by analysing and interpreting their Text, Speech, and Personal Status Manifestations.
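The data flow of step d can be sketched as three functions, one per sub-step; the function names and record shapes are illustrative stand-ins, not the MPAI AIM interfaces:

```python
def synthesise_speech(text, ps_text, ps_speech):
    """Step d.i: Text plus PS-Text and PS-Speech drive speech synthesis
    (a stub returning a tagged record instead of audio)."""
    return {"kind": "speech", "text": text, "colour": (ps_text, ps_speech)}

def animate_face(text, speech, ps_face):
    """Step d.ii: Face animation from Text, synthesised Speech, and PS-Face."""
    return {"kind": "face", "drivers": (text, speech["kind"], ps_face)}

def animate_gesture(text, ps_gesture):
    """Step d.iii: Gesture animation from Text and PS-Gesture."""
    return {"kind": "gesture", "drivers": (text, ps_gesture)}

speech = synthesise_speech("hello", "neutral", "cheerful")
frame = [speech,
         animate_face("hello", speech, "smile"),
         animate_gesture("hello", "open")]
```

Note the dependency order: the synthesised speech of d.i is itself an input to the face animation of d.ii.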
MPAI approach to standardisation
MPAI-AIF enables independently sourced AI Modules (AIMs) having standardised interfaces to be executed in an environment with standardised APIs.
The result is an open market of components with standardised functions and interfaces, competing in performance.
[Diagram: an AI Workflow (AIW) connecting the inputs and outputs of AIMs, orchestrated by a Controller together with User Agent, AIM Storage, Global Storage, Communication, Access, and the MPAI Store.]
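The idea of independently sourced AIMs behind standardised interfaces can be sketched as a common base class; the class and method names below are illustrative, not the actual MPAI-AIF API:

```python
from abc import ABC, abstractmethod

class AIModule(ABC):
    """An AIM: any vendor's implementation is interchangeable as long as the
    input/output interface is respected (illustrative, not the AIF API)."""
    @abstractmethod
    def process(self, inputs: dict) -> dict: ...

class SpeechRecognitionAIM(AIModule):
    def process(self, inputs):
        # a real AIM would run speech recognition on inputs["speech"]
        return {"recognised_text": f"<asr:{inputs['speech']}>"}

class Workflow:
    """An AIW chains AIMs through their standardised interfaces."""
    def __init__(self, modules):
        self.modules = modules

    def run(self, data):
        for module in self.modules:
            data.update(module.process(data))
        return data

out = Workflow([SpeechRecognitionAIM()]).run({"speech": "hi"})
```

Because only the interface is standardised, a competing `SpeechRecognitionAIM` can be dropped into the same `Workflow` without touching the rest of the chain.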
ABV Client: transmitting side
[Diagram: AI Workflow of the transmitting client. Input Audio is parsed into an Audio Scene Description; Speech Recognition, Language Understanding, and Personal Status Extraction derive Recognised Text, Meaning, and Personal Status; Face & Body Description of the Input Video yields Face and Body Descriptors; Avatar Description combines these into Compressed Avatar Descriptors, transmitted with Speech, Text, Language Preference, and Avatar Model. A Controller links the AIMs to Communication, Global Storage, and the MPAI Store.]
ABV Server
[Diagram: the Server performs Participant Authentication using Speech Descriptors (A) and Face Descriptors (A) (xN) against Participant Identities, and Text and Speech Translation according to each participant's Language Preferences; it relays the Environment Model, Avatar Models, Avatar Descriptors, Spatial Attitudes, Speech, Text, and Summaries (xN+1) between clients under a Controller with Communication, Global Storage, and the MPAI Store.]
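The server's translation step can be sketched as per-participant routing driven by Language Preferences; `route_speech` and `translate` are hypothetical names, and the translation itself is a stand-in:

```python
def route_speech(utterance, speaker_lang, preferences, translate):
    """Sketch of the server's Text and Speech Translation step: each
    participant receives the utterance in their preferred language.
    translate() is a stand-in for a real translation AIM."""
    delivered = {}
    for participant, lang in preferences.items():
        if lang == speaker_lang:
            delivered[participant] = utterance  # no translation needed
        else:
            delivered[participant] = translate(utterance, speaker_lang, lang)
    return delivered

# toy translator that just tags the language pair
fake_translate = lambda text, src, dst: f"[{src}->{dst}] {text}"
delivered = route_speech("hello", "en", {"p1": "en", "p2": "it"},
                         fake_translate)
```

This is why Language Preferences travel from each client to the server in the diagrams above: routing is decided per receiver, not per sender.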
Special ABV Server: Virtual Secretary
[Diagram: the Virtual Secretary applies Speech Recognition, Language Understanding, Personal Status Extraction, Summarisation, and Dialogue Processing to the participants' Speech and Text (xN) and their decompressed Avatar Descriptors; a Personal Status Display produces the Secretary's own (VS) Text, Speech, Personal Status, and Compressed Avatar Descriptors from its Avatar Model, including the Edited Summary.]
ABV Client: Receiving Side
[Diagram: Audio Scene Creation combines Speech (xN+1), Participant IDs, and Spatial Attitudes into the Audio Scene; Visual Scene Creation combines Compressed Avatar Descriptors (xN+1), Avatar Models, Spatial Attitudes, and the Environment Model into the Visual Scene; the AV Scene Viewer renders Output Audio and Output Visual from the user's Point of View, under a Controller with Communication, Global Storage, and the MPAI Store.]
Personal Status Extraction
[Diagram: Text, Speech, Face Object, and Body Object pass through PS-Text, PS-Speech, PS-Face, and PS-Gesture Description modules (or pre-computed Face and Body Descriptors are routed in via Selection); PS-Text, PS-Speech, PS-Face, and PS-Gesture Interpretation modules estimate the status carried by each modality, and Personal Status Fusion combines them into the Personal Status.]
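The per-modality interpretation and fusion stages can be sketched as follows; the confidence-based fusion rule is one possible choice, invented for illustration:

```python
def interpret(modality, descriptors):
    """Per-modality Interpretation: turn PS-<modality> Descriptors into an
    estimated status label with a confidence (illustrative logic)."""
    label, confidence = descriptors
    return {"modality": modality, "label": label, "confidence": confidence}

def fuse(estimates):
    """Personal Status Fusion: combine the PS-Text/Speech/Face/Gesture
    estimates, here by keeping the most confident label (one of many
    possible fusion rules)."""
    best = max(estimates, key=lambda e: e["confidence"])
    return {"personal_status": best["label"], "from": best["modality"]}

estimates = [interpret("text", ("neutral", 0.5)),
             interpret("speech", ("happy", 0.8)),
             interpret("face", ("happy", 0.9)),
             interpret("gesture", ("neutral", 0.4))]
fused = fuse(estimates)
```

Running interpretation per modality before fusing matches the diagram: each Interpretation AIM can be sourced independently, and only the fusion stage sees all four.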
Personal Status Display
[Diagram: Text and PS-Speech drive Speech Synthesis, producing Machine Speech; Text, Machine Speech, PS-Face, and PS-Gesture drive Face Description and Body Description; Avatar Synthesis combines the Face and Body Descriptors with the Avatar Model into Avatar Descriptors, optionally passed through Descriptor Compression to yield Compressed Descriptors, and renders the Avatar.]
Implementation…
glTF is a specification providing information on model geometry and materials.
To animate an avatar, we use skeletal animation.
H-Anim is a standard providing a body model.
To capture real body movement, capture devices must be used, e.g., Kinect, which uses its own skeleton model, slightly different from H-Anim.
To animate 3D models we need a real-time environment; e.g., Unity is an engine providing complete management of virtual environments (rendering, physics, spatialised audio, avatar animation). Unity also uses its own skeleton model, slightly different from H-Anim.
The expressions of the face of an avatar model are represented by blend shapes according to FACS. For the body, we use the evolution in time of joint positions and angles between joints.
MPAI intends to specify an Avatar Representation and Animation format that is generated (e.g., captured) and transmitted to a receiving end to reproduce the avatar as intended by the transmitting end.
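The blend-shape representation mentioned above can be made concrete: a face pose is the neutral mesh plus a weighted sum of per-expression offsets, V = V0 + Σᵢ wᵢ (Bᵢ − V0). A minimal sketch; the two-vertex "mesh" and the single "smile" shape are invented for illustration:

```python
def apply_blend_shapes(neutral, shapes, weights):
    """Blend-shape evaluation: V = V0 + sum_i w_i * (B_i - V0), coordinate-wise.
    neutral: list of (x, y, z) vertices; shapes: name -> fully-deformed vertex
    list; weights: name -> weight in [0, 1] (e.g. one per FACS-style unit)."""
    posed = [list(v) for v in neutral]
    for name, w in weights.items():
        for i, target in enumerate(shapes[name]):
            for c in range(3):
                posed[i][c] += w * (target[c] - neutral[i][c])
    return [tuple(v) for v in posed]

neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]             # toy 2-vertex "mesh"
shapes = {"smile": [(0.0, 1.0, 0.0), (1.0, -1.0, 0.0)]}  # full-smile targets
posed = apply_blend_shapes(neutral, shapes, {"smile": 0.5})
# posed == [(0.0, 0.5, 0.0), (1.0, -0.5, 0.0)]
```

glTF stores the same idea as morph targets (per-vertex offsets) with animated weights, which is one reason it is a natural carrier for such a format.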
Join MPAI
Share the fun
Build the future!
https://mpai.community/
We look forward to your participation in this exciting project!