Low Latency Multi-viewpoint 360 Interactive Video System with Deep Reinforcement Learning

 
Towards Low Latency Multi-viewpoint 360°
Interactive Video: A Multimodal Deep
Reinforcement Learning Approach
 
Haitian Pang, Cong Zhang, Fangxin Wang, Jiangchuan Liu, Lifeng Sun
 
1
Problem
-
contain 6DoF interaction
-
=> need to support high dimensional interactive space
-
need great deal of throughput and computation resources on viewer side
-
=> to mitigate the download latency and rendering latency
-
various viewpoint may occur at the same FoV
-
=> introduces extra switch latency and rebuffering latency
T
h
e
y
 
w
a
n
t
 
t
o
 
d
o
:
Find the solution which can achieve low latency and high quality in multi-
viewpoint (MVP) 360 iteractive video.
2
 
Contribution
 
-
p
r
o
p
o
s
e
 
a
 
l
o
w
 
l
a
t
e
n
c
y
 
M
V
P
 
3
6
0
 
i
n
t
e
r
a
c
t
i
v
e
 
v
i
d
e
o
 
s
y
s
t
e
m
 
(
i
V
i
e
w
)
-
use multimodal learning to integrate the feature related to QoE.
-
use actor-critic-based DRL module to integrates all features and learn how to
usher the tile selection
-
lower latency and higher video quality compare to other solutions.
 
3
 
Basic Design
 
V
i
e
w
i
n
g
-
R
e
l
a
t
e
d
P
r
e
d
i
c
t
i
o
n
 
M
o
d
u
l
e
T
i
l
e
 
S
e
l
e
c
t
i
o
n
O
p
t
i
m
i
z
a
t
i
o
n
1. Prediction
2. Send result of prediction
3. Tile selection
and optimization
 
4
Problem formulation
 
Get the maximum QoE
If viewer select new viewport, x = 1
otherwise, x=0
the tile has one quality version be
requested at least
amount of data can’t overflow the buffer
The quality of tile in FoV have to better
than other
There is no overlap between the tiles and
viewpoint in FoV and other regions
5
 
MCK problem (constraints(3)~(5))
 
 
6
 
[20] K. Poularakis, G. Iosifidis, A. Argyriou, I. Koutsopoulos and L. Tassiulas, "Caching and operator cooperation policies for
layered video content delivery," 
IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer
Communications
, San Francisco, CA, 2016, pp. 1-9.
 
Bitrate function redefine (constraints(6)~(8))
 
7
 
Limitation
 
1.
The TSO module cannot take the visual feature of the MVP 360° interactive
video into account due to the limitation of the model-based solution.
2.
The target of the VRP module is to improve the prediction accuracy (not
latency), which is not related to the optimization objective directly.
3.
Current decoupled design has a two-step delay, that is, the optimization step
can only be started after receiving the prediction results from the prediction
module.
 
8
 
Multimodal Learning and Deep RL
 
Multimodal learning:
Enables fully exploration for the different types of data in multimodal dataset.
 
Deep Reinforcement Learning (DRL):
 
9
 
Advantage actor-critic (A3C) algorithm
 
H
a
v
e
 
t
w
o
 
D
N
N
 
s
t
r
u
c
t
u
r
e
,
 
o
n
e
 
i
s
a
c
t
o
r
,
 
t
h
e
 
o
t
h
e
r
 
i
s
 
c
r
i
t
i
c
A
c
t
o
r
 
(
p
o
l
i
c
y
-
b
a
s
e
d
)
 
o
u
t
p
u
t
s
 
t
h
e
p
r
o
b
a
b
i
l
i
t
y
 
o
f
 
e
a
c
h
 
a
c
t
i
o
n
Critic (value-based) provides
feedback about probability to
action part
 
10
 
System architecture
 
Input feature processing:
Use CNN (grey trapezoid)
  reduce dimension
Use LSTM (yellow circle)
  capture temporal information
DRL-based training algo.
Actor
Cirtic
 
11
 
DRL-based training algo.
 
12
 
DRL-based training algo.
 
Reward function:
Reward: Quality
Penalty: Latency
 
Function of bitrate
 
13
Basic Design (BD)
FoV-buffer-based (FBB)
FoV-Rate-based (FRB)
iView
Note that the first two datasets don’t
have any viewpoint information.
They add viewpoint switching
decision according to the
characteristics of third dataset.
Evaluation
The reason is the tile setting is
3X2, which reduce the benefit
from tile selection
14
[28] D. Pisinger, “A minimal algorithm for the multiple-choice knapsack problem,” European
Journal of Operational Research, vol. 83, no. 2, pp. 394–410, 1995.
Evaluation
Different tile setting show
significant impacts on
performance of each method
15
Evaluation
iView reduces 18%-48% of
latency part and increases
14%-15% of quality part.
16
 
Conclusion
 
Propose basic design of iView, optimize the tile selection problem by
prediction module and optimization module.
Use multimodal learning and Deep reinforcement learning to combine the
feature and achieve high quality and low latency.
Achieve at least 27.2%, 15.4%, 2.8% improvement
 
17
Slide Note
Embed
Share

This research focuses on addressing the challenges of achieving low latency and high quality in multi-viewpoint (MVP) 360 interactive videos. The proposed iView system utilizes multimodal learning and a Deep Reinforcement Learning (DRL) module to optimize tile selection, aiming to reduce latency and enhance video quality. The system integrates features related to Quality of Experience (QoE) and leverages predictive tile selection techniques for efficiency.

  • Low Latency
  • Multi-viewpoint
  • 360 Video
  • Deep Reinforcement Learning
  • Interactive Video

Uploaded on Oct 08, 2024 | 2 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Towards Low Latency Multi-viewpoint 360 Interactive Video: A Multimodal Deep Reinforcement Learning Approach Haitian Pang, Cong Zhang, Fangxin Wang, Jiangchuan Liu, Lifeng Sun 1

  2. Problem - contain 6DoF interaction - => need to support high dimensional interactive space need great deal of throughput and computation resources on viewer side - => to mitigate the download latency and rendering latency various viewpoint may occur at the same FoV - => introduces extra switch latency and rebuffering latency - - They want to do: Find the solution which can achieve low latency and high quality in multi- viewpoint (MVP) 360 iteractive video. 2

  3. Contribution - - - propose a low latency MVP 360 interactive video system (iView) use multimodal learning to integrate the feature related to QoE. use actor-critic-based DRL module to integrates all features and learn how to usher the tile selection lower latency and higher video quality compare to other solutions. - 3

  4. Basic Design Viewing-Related Prediction Module Tile Selection Optimization 3. Tile selection and optimization 1. Prediction 2. Send result of prediction 4

  5. Problem formulation the tile has one quality version be requested at least Get the maximum QoE amount of data can t overflow the buffer If viewer select new viewport, x = 1 otherwise, x=0 The quality of tile in FoV have to better than other There is no overlap between the tiles and viewpoint in FoV and other regions 5

  6. MCK problem (constraints(3)~(5)) [20] K. Poularakis, G. Iosifidis, A. Argyriou, I. Koutsopoulos and L. Tassiulas, "Caching and operator cooperation policies for layered video content delivery," IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, San Francisco, CA, 2016, pp. 1-9. 6

  7. Bitrate function redefine (constraints(6)~(8)) 7

  8. Limitation 1. The TSO module cannot take the visual feature of the MVP 360 interactive video into account due to the limitation of the model-based solution. 2. The target of the VRP module is to improve the prediction accuracy (not latency), which is not related to the optimization objective directly. 3. Current decoupled design has a two-step delay, that is, the optimization step can only be started after receiving the prediction results from the prediction module. 8

  9. Multimodal Learning and Deep RL Multimodal learning: Enables fully exploration for the different types of data in multimodal dataset. Deep Reinforcement Learning (DRL): 9

  10. Advantage actor-critic (A3C) algorithm Have two DNN structure, one is actor, the other is critic Actor (policy-based) outputs the probability of each action Critic (value-based) provides feedback about probability to action part 10

  11. System architecture Input feature processing: Use CNN (grey trapezoid) reduce dimension Use LSTM (yellow circle) capture temporal information DRL-based training algo. Actor Cirtic 11

  12. DRL-based training algo. 12

  13. DRL-based training algo. Reward function: Reward: Quality Penalty: Latency Function of bitrate 13

  14. Evaluation Basic Design (BD) FoV-buffer-based (FBB) FoV-Rate-based (FRB) iView The reason is the tile setting is 3X2, which reduce the benefit from tile selection Note that the first two datasets don t have any viewpoint information. They add viewpoint switching decision according to the characteristics of third dataset. [28] D. Pisinger, A minimal algorithm for the multiple-choice knapsack problem, European Journal of Operational Research, vol. 83, no. 2, pp. 394 410, 1995. 14

  15. Evaluation Different tile setting show significant impacts on performance of each method 15

  16. Evaluation iView reduces 18%-48% of latency part and increases 14%-15% of quality part. 16

  17. Conclusion Propose basic design of iView, optimize the tile selection problem by prediction module and optimization module. Use multimodal learning and Deep reinforcement learning to combine the feature and achieve high quality and low latency. Achieve at least 27.2%, 15.4%, 2.8% improvement 17

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#