Avatar interoperability
J. Yoon and A. Bottino, 31 March 2023, 08:00 & 18:40 UTC
2-Mar-25
1. Importance of Avatar
2. Types of Avatar
Avatars are used to express oneself in virtual spaces such as the metaverse. Beyond representing the user, they are also used to embody external entities, such as a virtual assistant.
3. Avatar concept in MPAI
4. What is Avatar Representation and Animation?
Objective 1: to enable a user to reproduce a virtual environment.
Objective 2: to enable a user to reproduce an avatar of a third party and its animation as intended by the sender.
Objective 3: to estimate the personal status of a human or avatar.
Objective 4: to display an avatar with a selected personal status.
Definition: Personal Status is the ensemble of information internal to a person, including Emotion, Cognitive State, and Attitude.
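The Personal Status defined above can be pictured as a small data structure. A minimal sketch in Python; the field names and example labels are illustrative, not part of any MPAI specification:

```python
from dataclasses import dataclass, field

@dataclass
class PersonalStatus:
    """Personal Status per the definition above: Emotion, Cognitive State,
    and Attitude, here held as per-modality label maps (illustrative)."""
    emotion: dict = field(default_factory=dict)
    cognitive_state: dict = field(default_factory=dict)
    attitude: dict = field(default_factory=dict)

# the same Emotion may be manifested in several modalities
ps = PersonalStatus(emotion={"speech": "happy", "face": "happy"},
                    attitude={"gesture": "open"})
```

Keeping one entry per modality matches the later extraction stage, where Text, Speech, Face, and Gesture are analysed separately before fusion.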
5. Examples of avatar implementation in MPAI
Target use case: avatar-based videoconference (ABV)
[Diagram: ABV system overview. A Transmitting Client sends Language Preferences, Speech and Text, an Avatar Model, Avatar Descriptors, Speech and Face Descriptors, and a Participant ID to the Server; the Server (hosting the Virtual Secretary and the Environment Description) forwards Speech and Text, Avatar Models, Avatar Descriptors, Participant IDs, and Summaries (xN+1) to each Receiving Client, which renders Output Audio & Text and Output Visual from the user's Point of View.]
ABV walkthrough/1
1. A user selects:
   a. The virtual space (Environment) where the avatars operate.
   b. The avatar model to be animated.
   c. The position occupied by an avatar in the Virtual Environment.
   d. The user's Point of View when watching and hearing the 3D Audio-Visual Scene in the Virtual Environment.
2. A machine:
   a. Perceives the Audio-Visual components of the Real Environment and creates:
      i. An Audio Scene Description composed of independent audio objects and their locations.
      ii. A Visual Scene Description composed of independent visual objects and their locations.
      iii. A complete Audio-Visual Scene Description.
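The scene descriptions in step 2.a can be sketched as plain data: independent objects, each with a location. The class and field names below are illustrative, not the MPAI data formats:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SceneObject:
    """One independent audio or visual object and its location in the scene."""
    object_id: str
    location: Tuple[float, float, float]

@dataclass
class AVSceneDescription:
    """Audio-Visual Scene Description: independent objects with locations,
    kept in separate audio and visual lists (illustrative structure)."""
    audio: List[SceneObject] = field(default_factory=list)
    visual: List[SceneObject] = field(default_factory=list)

scene = AVSceneDescription()
scene.audio.append(SceneObject("speaker1-voice", (1.0, 0.0, 1.5)))
scene.visual.append(SceneObject("speaker1-body", (1.0, 0.2, 1.5)))
```

Keeping objects independent is what later lets a receiving client re-spatialise them from an arbitrary Point of View.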
ABV walkthrough/2
   b. Identifies a human belonging to a group composed of a limited number of humans (closed-set identification) using:
      i. Speech Descriptors.
      ii. Face Descriptors.
   c. Decodes a human's Personal Status by:
      i. Extracting the Descriptors of the Manifestations of a human's Personal Status (Text, Speech, Face, and Gesture).
      ii. Extracting a human's Personal Status from Text, Speech, Face, and Gesture Descriptors.
      iii. Fusing the different Personal Statuses in Text, Speech, Face, and Gesture into Personal Status.
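Closed-set identification (step b) means picking the best match among a known, enrolled set. A minimal sketch using cosine similarity over descriptor vectors; the enrolled IDs and vector values are invented for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def identify(descriptor, enrolled):
    """Closed-set identification: return the enrolled participant whose
    reference descriptor is most similar to the observed one."""
    return max(enrolled, key=lambda pid: cosine(descriptor, enrolled[pid]))

# toy enrolled set (Speech or Face Descriptors would be real feature vectors)
enrolled = {"alice": [0.9, 0.1, 0.0], "bob": [0.1, 0.8, 0.3]}
who = identify([0.85, 0.2, 0.05], enrolled)
```

Because the set is closed, the maximum-similarity match is always returned; an open-set variant would additionally threshold the similarity.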
ABV walkthrough/3
   d. Animates a speaking avatar by:
      i. Synthesising speech using Text, Speech, Personal Status of Text (PS-Text), and Personal Status of Speech (PS-Speech).
      ii. Animating the Face using Text, Synthesised Speech, and Personal Status of Face (PS-Face).
      iii. Animating the Gesture of an avatar using Text and Personal Status of Gesture (PS-Gesture).
   e. Converses with another party (human or avatar) by:
      i. Decoding the other party's Personal Status.
      ii. Animating an avatar representing it (see 5.).
   f. Summarises and refines the speech of other parties by analysing and interpreting their Text, Speech, and Personal Status Manifestations.
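The data flow of step d can be sketched as three functions, one per sub-step; the function names and record shapes are illustrative stand-ins, not the MPAI AIM interfaces:

```python
def synthesise_speech(text, ps_text, ps_speech):
    """Step d.i: Text plus PS-Text and PS-Speech drive speech synthesis
    (a stub returning a tagged record instead of audio)."""
    return {"kind": "speech", "text": text, "colour": (ps_text, ps_speech)}

def animate_face(text, speech, ps_face):
    """Step d.ii: Face animation from Text, synthesised Speech, and PS-Face."""
    return {"kind": "face", "drivers": (text, speech["kind"], ps_face)}

def animate_gesture(text, ps_gesture):
    """Step d.iii: Gesture animation from Text and PS-Gesture."""
    return {"kind": "gesture", "drivers": (text, ps_gesture)}

speech = synthesise_speech("hello", "neutral", "cheerful")
frame = [speech,
         animate_face("hello", speech, "smile"),
         animate_gesture("hello", "open")]
```

Note the dependency order: the synthesised speech of d.i is itself an input to the face animation of d.ii.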
MPAI approach to standardisation
MPAI-AIF enables independently sourced AI Modules (AIMs) having standardised interfaces to be executed in an environment with standardised APIs.
The result is an open market of components with standardised functions and interfaces, competing in performance.
[Diagram: an AI Workflow (AIW) connecting the inputs and outputs of AIMs, orchestrated by a Controller together with User Agent, AIM Storage, Global Storage, Communication, Access, and the MPAI Store.]
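The idea of independently sourced AIMs behind standardised interfaces can be sketched as a common base class; the class and method names below are illustrative, not the actual MPAI-AIF API:

```python
from abc import ABC, abstractmethod

class AIModule(ABC):
    """An AIM: any vendor's implementation is interchangeable as long as the
    input/output interface is respected (illustrative, not the AIF API)."""
    @abstractmethod
    def process(self, inputs: dict) -> dict: ...

class SpeechRecognitionAIM(AIModule):
    def process(self, inputs):
        # a real AIM would run speech recognition on inputs["speech"]
        return {"recognised_text": f"<asr:{inputs['speech']}>"}

class Workflow:
    """An AIW chains AIMs through their standardised interfaces."""
    def __init__(self, modules):
        self.modules = modules

    def run(self, data):
        for module in self.modules:
            data.update(module.process(data))
        return data

out = Workflow([SpeechRecognitionAIM()]).run({"speech": "hi"})
```

Because only the interface is standardised, a competing `SpeechRecognitionAIM` can be dropped into the same `Workflow` without touching the rest of the chain.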
ABV Client: transmitting side
[Diagram: AI Workflow of the transmitting client. Input Audio is parsed into an Audio Scene Description; Speech Recognition, Language Understanding, and Personal Status Extraction derive Recognised Text, Meaning, and Personal Status; Face & Body Description of the Input Video yields Face and Body Descriptors; Avatar Description combines these into Compressed Avatar Descriptors, transmitted with Speech, Text, Language Preference, and Avatar Model. A Controller links the AIMs to Communication, Global Storage, and the MPAI Store.]
ABV Server
[Diagram: the Server performs Participant Authentication using Speech Descriptors (A) and Face Descriptors (A) (xN) against Participant Identities, and Text and Speech Translation according to each participant's Language Preferences; it relays the Environment Model, Avatar Models, Avatar Descriptors, Spatial Attitudes, Speech, Text, and Summaries (xN+1) between clients under a Controller with Communication, Global Storage, and the MPAI Store.]
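The server's translation step can be sketched as per-participant routing driven by Language Preferences; `route_speech` and `translate` are hypothetical names, and the translation itself is a stand-in:

```python
def route_speech(utterance, speaker_lang, preferences, translate):
    """Sketch of the server's Text and Speech Translation step: each
    participant receives the utterance in their preferred language.
    translate() is a stand-in for a real translation AIM."""
    delivered = {}
    for participant, lang in preferences.items():
        if lang == speaker_lang:
            delivered[participant] = utterance  # no translation needed
        else:
            delivered[participant] = translate(utterance, speaker_lang, lang)
    return delivered

# toy translator that just tags the language pair
fake_translate = lambda text, src, dst: f"[{src}->{dst}] {text}"
delivered = route_speech("hello", "en", {"p1": "en", "p2": "it"},
                         fake_translate)
```

This is why Language Preferences travel from each client to the server in the diagrams above: routing is decided per receiver, not per sender.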
Special ABV Server: Virtual Secretary
[Diagram: the Virtual Secretary applies Speech Recognition, Language Understanding, Personal Status Extraction, Summarisation, and Dialogue Processing to the participants' Speech and Text (xN) and their decompressed Avatar Descriptors; a Personal Status Display produces the Secretary's own (VS) Text, Speech, Personal Status, and Compressed Avatar Descriptors from its Avatar Model, including the Edited Summary.]
ABV Client: Receiving Side
[Diagram: Audio Scene Creation combines Speech (xN+1), Participant IDs, and Spatial Attitudes into the Audio Scene; Visual Scene Creation combines Compressed Avatar Descriptors (xN+1), Avatar Models, Spatial Attitudes, and the Environment Model into the Visual Scene; the AV Scene Viewer renders Output Audio and Output Visual from the user's Point of View, under a Controller with Communication, Global Storage, and the MPAI Store.]
Personal Status Extraction
[Diagram: Text, Speech, Face Object, and Body Object pass through PS-Text, PS-Speech, PS-Face, and PS-Gesture Description modules (or pre-computed Face and Body Descriptors are routed in via Selection); PS-Text, PS-Speech, PS-Face, and PS-Gesture Interpretation modules estimate the status carried by each modality, and Personal Status Fusion combines them into the Personal Status.]
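The per-modality interpretation and fusion stages can be sketched as follows; the confidence-based fusion rule is one possible choice, invented for illustration:

```python
def interpret(modality, descriptors):
    """Per-modality Interpretation: turn PS-<modality> Descriptors into an
    estimated status label with a confidence (illustrative logic)."""
    label, confidence = descriptors
    return {"modality": modality, "label": label, "confidence": confidence}

def fuse(estimates):
    """Personal Status Fusion: combine the PS-Text/Speech/Face/Gesture
    estimates, here by keeping the most confident label (one of many
    possible fusion rules)."""
    best = max(estimates, key=lambda e: e["confidence"])
    return {"personal_status": best["label"], "from": best["modality"]}

estimates = [interpret("text", ("neutral", 0.5)),
             interpret("speech", ("happy", 0.8)),
             interpret("face", ("happy", 0.9)),
             interpret("gesture", ("neutral", 0.4))]
fused = fuse(estimates)
```

Running interpretation per modality before fusing matches the diagram: each Interpretation AIM can be sourced independently, and only the fusion stage sees all four.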
Personal Status Display
[Diagram: Text and PS-Speech drive Speech Synthesis, producing Machine Speech; Text, Machine Speech, PS-Face, and PS-Gesture drive Face Description and Body Description; Avatar Synthesis combines the Face and Body Descriptors with the Avatar Model into Avatar Descriptors, optionally passed through Descriptor Compression to yield Compressed Descriptors, and renders the Avatar.]
Implementation…
glTF is a specification providing information on model geometry and materials.
To animate an avatar, we use skeletal animation.
H-Anim is a standard providing a body model.
To capture real body movement, capture devices must be used, e.g., Kinect, which uses its own skeleton model, slightly different from H-Anim.
To animate 3D models we need a real-time environment; e.g., Unity is an engine providing complete management of virtual environments (rendering, physics, spatialised audio, avatar animation). Unity also uses its own skeleton model, slightly different from H-Anim.
The expressions of the face of an avatar model are represented by blend shapes according to FACS. For the body, we use the evolution in time of joint positions and angles between joints.
MPAI intends to specify an Avatar Representation and Animation format that is generated (e.g., captured) and transmitted to a receiving end to reproduce the avatar as intended by the transmitting end.
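The blend-shape representation mentioned above can be made concrete: a face pose is the neutral mesh plus a weighted sum of per-expression offsets, V = V0 + Σᵢ wᵢ (Bᵢ − V0). A minimal sketch; the two-vertex "mesh" and the single "smile" shape are invented for illustration:

```python
def apply_blend_shapes(neutral, shapes, weights):
    """Blend-shape evaluation: V = V0 + sum_i w_i * (B_i - V0), coordinate-wise.
    neutral: list of (x, y, z) vertices; shapes: name -> fully-deformed vertex
    list; weights: name -> weight in [0, 1] (e.g. one per FACS-style unit)."""
    posed = [list(v) for v in neutral]
    for name, w in weights.items():
        for i, target in enumerate(shapes[name]):
            for c in range(3):
                posed[i][c] += w * (target[c] - neutral[i][c])
    return [tuple(v) for v in posed]

neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]             # toy 2-vertex "mesh"
shapes = {"smile": [(0.0, 1.0, 0.0), (1.0, -1.0, 0.0)]}  # full-smile targets
posed = apply_blend_shapes(neutral, shapes, {"smile": 0.5})
# posed == [(0.0, 0.5, 0.0), (1.0, -0.5, 0.0)]
```

glTF stores the same idea as morph targets (per-vertex offsets) with animated weights, which is one reason it is a natural carrier for such a format.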
Join MPAI
Share the fun
Build the future!
https://mpai.community/
We look forward to your participation in this exciting project!