Live Migration of Virtual Machines - Overview and Challenges

 

The paper discusses the concept of live migration of virtual machines, its motivations, benefits, and challenges in implementation. It covers the reasons for choosing OS-level migration over process-level migration, related works in the field, design challenges, and strategies for minimizing service downtime and migration duration. The authors delve into memory migration options and emphasize the importance of maintaining service quality during the migration process.

  • Virtual Machines
  • Live Migration
  • OS-level Migration
  • Memory Migration
  • Design Challenges

Uploaded on Oct 11, 2024 | 1 Views



Presentation Transcript


  1. Live Migration of Virtual Machines. Authors: Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, Andrew Warfield (University of Cambridge Computer Laboratory; University of Copenhagen, Denmark). Presenter: Juncheng Gu, EECS 582 – W16.

  2. Outline: Motivation, Design, Implementation, Evaluation, Conclusion, Future Work.

  3. Motivation: What is VM live migration?
     - Move VM instances across distinct physical hosts with little or no downtime for running services.
     - Services are unaware of the migration.
     - Network connections of the guest OS are maintained.
     - The VM is treated as a black box.

  4. Motivation: VM live migration can be an extremely powerful tool for cluster administrators.
     - Hardware and software maintenance and upgrades
     - Load balancing and resource management
     - Distributed power management

  5. Motivation: Why OS-level migration instead of process-level?
     - Avoids residual dependencies: the original host can be powered off or put to sleep once migration has completed.
     - In-memory state can be transferred in a consistent and efficient fashion, e.g. a media streaming application does not need to reconnect.
     - Allows a separation of concerns between the users and the operator of a cluster: users keep full control of the software and services within their VM, and operators do not need to care about what is occurring within the VM.

  6. Motivation: Related work (approach: feature)
     - Collective project: stop-and-copy
     - Zap: stop-and-copy
     - VMotion: similar to live migration
     - Process migration: residual dependencies

  7. Design: challenges
     - Minimize service downtime.
     - Minimize migration duration.
     - Avoid disrupting running services.
     [Figure: source host, destination host, and storage]

  8. Design: memory migration
     Options:

     Phase             Service downtime   Migration duration
     push              -                  -
     stop-and-copy     longest            shortest
     pull (demand)     shortest           longest

     Pre-copy: a bounded iterative push phase followed by a very short stop-and-copy phase. Care is needed to avoid service degradation.
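The pre-copy idea on this slide can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation: the function name `precopy_rounds`, the page-based units, the fixed bandwidth, and the `max_rounds` and `stop_threshold` parameters are all assumptions made for illustration.

```python
def precopy_rounds(total_pages, dirty_rate, bandwidth,
                   max_rounds=30, stop_threshold=50):
    """Simulate pre-copy migration (hypothetical model).

    dirty_rate and bandwidth are integers in pages per second.
    Returns (push_rounds, pages_left_for_stop_and_copy, downtime_seconds).
    """
    to_send = total_pages      # the first push round sends every page
    rounds = 0
    # Bounded iterative push phase: keep resending pages that were
    # dirtied while the previous round was in flight.
    while rounds < max_rounds and to_send > stop_threshold:
        rounds += 1
        # Pages dirtied during this round = dirty_rate * (to_send / bandwidth),
        # computed with integer arithmetic and capped at the VM size.
        to_send = min(total_pages, dirty_rate * to_send // bandwidth)
    # Stop-and-copy phase: the VM is paused while the remainder is sent.
    downtime = to_send / bandwidth
    return rounds, to_send, downtime


if __name__ == "__main__":
    print(precopy_rounds(100_000, 1_000, 10_000))
    print(precopy_rounds(100_000, 20_000, 10_000))
```

Under these assumptions, a workload that dirties pages much more slowly than the link can send them converges in a few rounds and leaves little for the stop-and-copy phase. A workload that dirties pages faster than the link never converges, so the round bound is what ends the push phase.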

  9. Design: local resources
     - Open network connections
       - The migrating VM keeps its IP and MAC address.
       - It broadcasts an ARP message carrying the new routing information.
       - Some routers might ignore this message to prevent spoofing.
       - A guest OS that is aware of the migration can avoid this problem.
     - Local storage
       - Network Attached Storage
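The ARP broadcast mentioned on this slide can be sketched as a gratuitous ARP reply frame. The slide does not specify the exact packet that is sent, so the choice of a broadcast ARP reply, the helper name `build_gratuitous_arp`, and the example addresses are assumptions. The sketch only builds the bytes; sending them would need a raw socket and suitable privileges.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame holding a gratuitous ARP reply (illustrative)."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    eth_header = broadcast + mac_bytes + struct.pack("!H", 0x0806)

    # ARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # address lengths 6 and 4, operation 2 (reply).
    arp_header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)

    # Sender and target IP are both the migrated VM's own address.
    arp_body = mac_bytes + ip_bytes + broadcast + ip_bytes
    return eth_header + arp_header + arp_body


if __name__ == "__main__":
    frame = build_gratuitous_arp("00:16:3e:12:34:56", "192.0.2.10")
    print(len(frame), frame.hex())
```

Because the VM keeps its IP and MAC address, the sender fields are unchanged after migration. What changes is the switch port on which peers and switches see this frame arrive.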

  10. Design: local resources [Figure: virtual machine on the source host and on the destination host]

  11. Design: overview

  12. Implementation: writable working sets
     - Significant overhead comes from transferring memory pages that are subsequently modified.
     - Good candidates for the push phase: pages that are seldom or never modified.
     - Writable working set (WWS): pages that are written often and are best transferred during stop-and-copy.
     - WWS behavior
       - The WWS varies significantly between the different sub-benchmarks.
       - Migration results depend on the workload and on the precise moment when migration begins.

  13. Implementation: managed and self migration
     - Managed migration: performed by a migration daemon running in the management VM.
     - Self migration: performed within the migratee OS, with a small stub required on the destination host.
     - Differences
       - Tracking the WWS: managed migration uses a shadow page table plus a bitmap; self migration uses a bitmap plus a spare bit in each PTE.
       - Stop-and-copy: managed migration suspends the OS to obtain a consistent checkpoint; self migration uses a two-stage stop-and-copy and ignores page updates during the last transfer.

  14. Implementation: tracking the WWS (managed)
     Shadow page tables are used to track dirty pages in each push round:
     1. Xen inserts shadow page tables underneath the guest OS, populated from the guest OS's page tables.
     2. The shadow pages are marked read-only.
     3. If the OS tries to write to a page, the resulting page fault is trapped by Xen.
     4. Xen checks the OS's original page table and forwards the appropriate write permission.
     5. At the same time, Xen marks the page as dirty in the bitmap.
     At the beginning of the next push round:
     - The last round's bitmap is copied to the control software, and Xen's bitmap is cleared.
     - The shadow page tables are destroyed and recreated, so all write permissions are lost.
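The tracking steps above can be modelled in a few lines. This is a toy simulation, not Xen code: the class `ShadowTracker` and its methods are hypothetical, and it assumes the guest's own page table always permits the write, so step 4 always grants permission.

```python
class ShadowTracker:
    """Toy model of dirty-page tracking via write-protected shadow entries."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.writable = [False] * num_pages  # shadow entries start read-only
        self.dirty = [False] * num_pages     # the dirty bitmap
        self.faults = 0                      # number of trapped write faults

    def guest_write(self, page):
        """Model a guest write to the given page number."""
        if not self.writable[page]:
            # Steps 3-5: the write faults, the hypervisor grants write
            # permission (assumed allowed by the guest) and marks it dirty.
            self.faults += 1
            self.writable[page] = True
            self.dirty[page] = True
        # Later writes to the same page in this round do not trap.

    def start_round(self):
        """Hand the bitmap to the control software and reset tracking."""
        snapshot = [p for p, d in enumerate(self.dirty) if d]
        self.dirty = [False] * self.num_pages
        # Recreating the shadow tables drops all write permissions.
        self.writable = [False] * self.num_pages
        return snapshot


if __name__ == "__main__":
    tracker = ShadowTracker(8)
    for page in (3, 3, 5):
        tracker.guest_write(page)
    print(tracker.start_round(), tracker.faults)
```

In this model each page costs at most one trapped fault per round, however many times it is written. This is why resetting permissions at the start of every round is enough to rebuild the dirty set.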

  15. Implementation: dynamic rate limiting
     - More network bandwidth means less service downtime.
     - Less network bandwidth means less impact on the running service.
     - The bandwidth limit is adapted dynamically in each round:
       - Set a minimum and a maximum bandwidth limit, and begin with the minimum limit.
       - $\text{transfer rate}_{\text{next}} = \text{dirty rate}_{\text{current}} + \text{constant increment}$
       - $\text{dirty rate}_{\text{current}} = \text{pages dirtied in the round} / \text{duration of the round}$
     - When to terminate the push phase and switch to stop-and-copy:
       - the transfer rate computed for the next round exceeds the maximum limit, or
       - the amount of dirty memory left to transfer falls below a small threshold.
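The rate rule on this slide can be written as a small function. This is a sketch under stated assumptions: the name `plan_next_round`, the page-per-second units, and every numeric value in the usage example are hypothetical, and the slide's own thresholds are not reproduced here.

```python
def plan_next_round(pages_dirtied, round_seconds, remaining_pages, *,
                    increment, min_rate, max_rate, min_remaining):
    """Return (rate_for_next_round, switch_to_stop_and_copy).

    Rates are in pages per second; all parameters are illustrative.
    """
    # Dirtying rate observed during the round that just finished.
    dirty_rate = pages_dirtied / round_seconds
    # Next round must outpace dirtying by a constant increment.
    proposed = dirty_rate + increment
    # Stop pushing if the link cannot keep up, or if so little is
    # left that a final stop-and-copy will be short anyway.
    stop = proposed > max_rate or remaining_pages < min_remaining
    # Clamp the rate actually used to the configured limits.
    rate = min(max(proposed, min_rate), max_rate)
    return rate, stop


if __name__ == "__main__":
    limits = dict(increment=500, min_rate=1000, max_rate=5000, min_remaining=64)
    print(plan_next_round(2000, 2.0, 2000, **limits))
    print(plan_next_round(10000, 2.0, 9000, **limits))
```

Starting at the minimum limit keeps the early rounds, which move most of the memory, from competing with the running service. The rate rises only as far as the observed dirtying rate requires.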

  16. Implementation: paravirtualized optimizations
     - Stunning rogue processes
       - A rogue process generates dirty pages at a very high rate (for example, by writing one word in every page).
       - A forked monitor process tracks the WWS of individual processes.
       - If a process exceeds the write-fault limit, it is moved to a wait queue.
     - Freeing page cache pages
       - An OS typically has a number of free pages.
       - The ballooning mechanism is used to return free pages to the VMM.
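The stunning policy can be sketched as a simple partition of processes by write-fault count. This is only an illustration of the rule stated on the slide: the function `stun_rogue_processes`, the per-interval fault counts, and the limit of 40 are assumptions, and a real monitor would run inside the guest kernel.

```python
def stun_rogue_processes(write_faults, fault_limit):
    """Split processes into (run_queue, wait_queue) by write-fault count.

    write_faults maps a process id to the number of write faults it
    caused in the current monitoring interval (hypothetical input).
    """
    run_queue, wait_queue = [], []
    for pid, faults in sorted(write_faults.items()):
        if faults > fault_limit:
            wait_queue.append(pid)   # stunned: dirties pages too quickly
        else:
            run_queue.append(pid)    # allowed to keep running
    return run_queue, wait_queue


if __name__ == "__main__":
    observed = {101: 5, 102: 400, 103: 40}
    print(stun_rogue_processes(observed, fault_limit=40))
```

Pausing only the processes that exceed the limit slows the growth of the dirty set while the rest of the guest keeps serving requests.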

  17. Evaluation: simple web server
     - A highly loaded server with a relatively small WWS
     - Controlled impact on live services
     - Short downtime
     [Figure annotation: migration starts]

  18. Evaluation: rapid page dirtying
     - In the third round, the transfer rate is scaled up to 500 Mbit/s (the maximum).
     - Migration then switches to stop-and-copy, resulting in 3.5 s of downtime.
     - A diabolical workload may suffer considerable service downtime.
     [Figure annotation: stop-and-copy]

  19. Conclusion
     - OS-level live migration
     - Pre-copy: iterative push and a short stop-and-copy
     - Dynamically adapting network bandwidth
       - Balances service downtime against service performance degradation
     - Paravirtualized optimizations
       - Minimize service downtime and impact on running services

  20. Future Work
     - Cluster management
       - Make decisions about the placement and movement of virtual machines.
     - Wide Area Network redirection
       - The OS will have to obtain a new IP address, or some kind of indirection layer will be needed.
     - Storage migration
       - Local disks are considerably larger than volatile memory.

  21. Q&A. Thank you!
