Live Migration of Virtual Machines - Overview and Challenges



The paper discusses the concept of live migration of virtual machines, its motivations, benefits, and challenges in implementation. It covers the reasons for choosing OS-level migration over process-level migration, related works in the field, design challenges, and strategies for minimizing service downtime and migration duration. The authors delve into memory migration options and emphasize the importance of maintaining service quality during the migration process.

Uploaded by geel781 on Oct 11, 2024


Presentation Transcript

Live Migration of Virtual Machines
Authors: Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, Andrew Warfield
University of Cambridge Computer Laboratory; University of Copenhagen, Denmark
Presenter: Juncheng Gu
EECS 582 – W16

Outline
• Motivation
• Design
• Implementation
• Evaluation
• Conclusion
• Future Work

Motivation
What's VM live migration?
Move VM instances across distinct physical hosts with little or no downtime for running services.
• Services are unaware of the migration.
• Network connections of the guest OS are maintained.
• The VM is treated as a black box.

Motivation
VM live migration can be an extremely powerful tool for cluster administrators.
• Hardware / software maintenance and upgrades
• Load balancing / resource management
• Distributed power management

Motivation
Why OS-level migration instead of process-level?
• Avoids "residual dependencies"
  - The original host can be powered off or put to sleep once migration has completed.
• Can transfer in-memory state in a consistent and efficient fashion
  - E.g., no reconnection is needed for a media streaming application.
• Allows a separation of concerns between the users and the operator of a cluster
  - Users have full control of the software and services within their VM.
  - Operators don't need to care about what is occurring within the VM.

Motivation
Related Work
Approach            Feature
Collective project  stop-and-copy
Zap                 stop-and-copy
VMotion             similar to live migration
Process migration   residual dependencies

Design-challenges
• Minimize service downtime
• Minimize migration duration
• Avoid disrupting running services
[Figure: source host, destination host, and storage]

Design-memory migration
Options
Phase          Service downtime  Migration duration
push           -                 -
stop-and-copy  longest           shortest
pull (demand)  shortest          longest
• Pre-copy: a bounded iterative push phase + a very short stop-and-copy phase
• Care is needed to avoid service degradation
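The pre-copy idea can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the function name `simulate_pre_copy`, its parameters, and the model in which the guest re-dirties exactly the same set of pages during every push round are all hypothetical and not taken from the slides.

```python
def simulate_pre_copy(num_pages, wws_pages, max_rounds, stop_threshold):
    """Toy model of pre-copy migration.

    Returns (pages pushed in each round, pages left for stop-and-copy).
    Assumption: while a round is being pushed, the still-running guest
    dirties exactly the pages in `wws_pages`.
    """
    to_send = set(range(num_pages))      # the first round pushes every page
    pushed_per_round = []
    for _ in range(max_rounds):          # the push phase is bounded
        if len(to_send) <= stop_threshold:
            break                        # little left: go to stop-and-copy
        pushed_per_round.append(len(to_send))
        # Pages written during this round must be sent again later.
        to_send = set(wws_pages)
    # Whatever remains is sent while the VM is paused (stop-and-copy).
    return pushed_per_round, len(to_send)
```

With a small writable working set the push phase ends after one round and only a handful of pages are copied during downtime; with a larger one the round bound is what ends the push phase.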

Design-local resources
• Open network connections
  - The migrating VM can keep its IP and MAC address.
  - It broadcasts an ARP reply advertising the new routing information.
  - Some routers might ignore this to prevent spoofing.
  - A guest OS that is aware of the migration can avoid this problem.
• Local storage
  - Network-attached storage
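The ARP announcement can be pictured as a single broadcast Ethernet frame. The sketch below only builds the bytes of an unsolicited ARP reply; the function name `build_unsolicited_arp_reply` and the sample addresses are hypothetical, and sending the frame (which needs a raw socket and privileges) is deliberately left out.

```python
import socket
import struct

def build_unsolicited_arp_reply(mac: bytes, ip: str) -> bytes:
    """Build an Ethernet frame carrying an unsolicited ARP reply for (mac, ip)."""
    broadcast = b"\xff" * 6
    ip_bytes = socket.inet_aton(ip)               # 4-byte IPv4 address
    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    ethernet = broadcast + mac + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), lengths 6 and 4, opcode 2 (reply)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    # Sender and target protocol address are both the migrated VM's IP.
    arp += mac + ip_bytes + broadcast + ip_bytes
    return ethernet + arp
```

The frame is 42 bytes long: a 14-byte Ethernet header followed by a 28-byte ARP payload.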

Design-local resources
[Figure: a virtual machine shown at the source and at the destination]

Design-overview

Implementation-writable working sets
• Significant overhead: transferring memory pages that are subsequently modified.
  - Good candidates for the push phase: pages that are seldom or never modified.
  - Writable working set (WWS): pages that are written often and are best transferred via stop-and-copy.
• WWS behavior
  - The WWS varies significantly between the different sub-benchmarks.
  - Migration results depend on the workload and the precise moment when migration begins.

Implementation-managed & self migration
• Managed migration
  - Performed by a migration daemon running in the management VM
• Self migration
  - Implemented within the migratee OS, with a small stub required on the destination host
Difference     Managed                                       Self
Track WWS      shadow page table + bitmap                    bitmap + a spare bit in the PTE
Stop-and-copy  suspend OS to obtain a consistent checkpoint  two-stage stop-and-copy, ignore page updates in last transfer

Implementation-track WWS (managed)
• Using shadow page tables to track dirty pages in each push round
  1. Xen inserts shadow pages under the guest OS, populated using the guest OS's page tables.
  2. The shadow pages are marked read-only.
  3. If the OS tries to write to a page, the resulting page fault is trapped by Xen.
  4. Xen checks the OS's original page table and forwards the appropriate write permission.
  5. At the same time, Xen marks the page as dirty in the bitmap.
• At the beginning of the next push round
  - The last round's bitmap is copied to the control software, and Xen's bitmap is cleared.
  - The shadow page tables are destroyed and recreated, so all write permissions are lost.
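The tracking steps above can be mimicked in a few lines. This is a user-space toy, not Xen code: the class `DirtyTracker`, its methods, and the returned strings are hypothetical, a Python set stands in for the dirty bitmap, and treating a write to a page the guest itself maps read-only as a fault passed back to the guest is an assumption.

```python
class DirtyTracker:
    """Toy model of dirty-page tracking with read-only shadow entries."""

    def __init__(self, guest_writable):
        self.guest_writable = guest_writable  # page -> writable in the guest's own table?
        self.shadow_writable = set()          # shadow entries that currently allow writes
        self.dirty = set()                    # stands in for the dirty bitmap

    def write(self, page):
        if page in self.shadow_writable:
            return "no fault"                 # already writable in this round
        # Shadow entry is read-only, so the write traps to the hypervisor.
        if not self.guest_writable.get(page, False):
            return "fault passed to guest"    # the guest itself forbids the write
        self.shadow_writable.add(page)        # forward the write permission
        self.dirty.add(page)                  # record the page as dirty
        return "marked dirty"

    def start_round(self):
        """Hand over the last round's dirty set and reset all tracking state."""
        snapshot = self.dirty
        self.dirty = set()
        self.shadow_writable = set()          # shadow tables rebuilt read-only
        return snapshot
```

Because every round starts with all shadow entries read-only again, the first write to each page in a round faults exactly once, which is what keeps the per-round dirty set accurate.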

Implementation-dynamic rate limiting
• More network bandwidth: less service downtime
• Less network bandwidth: less impact on the running service
[Figure: performance vs. downtime]
• Dynamically adapt the bandwidth limit during each round
  - Set a minimum and a maximum bandwidth limit; begin with the minimum limit
  - $\text{transfer rate} = \text{dirty rate}_{\text{current}} + \text{constant increment}$
  - $\text{dirty rate}_{\text{current}} = \dfrac{\text{dirty pages}}{\text{duration}}$
• When to terminate the push phase and switch to stop-and-copy?
  - $\text{dirty rate}_{\text{current}} > \text{transfer}_{\max}$
  - $\text{dirty pages} <$ a few KBs
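One way to picture the per-round decision is the sketch below. It assumes that the push phase ends once pages are dirtied faster than the maximum limit allows, or once very little dirty data remains; the function name `plan_next_round`, the use of bytes and bytes per second as units, and every parameter name are hypothetical.

```python
def plan_next_round(dirty_bytes, round_seconds, min_bw, max_bw,
                    increment, small_threshold):
    """Return the bandwidth limit for the next push round, or None
    when migration should switch to stop-and-copy.

    dirty_bytes:     data dirtied during the round that just finished
    round_seconds:   how long that round took
    min_bw, max_bw:  administrator-chosen limits (bytes per second)
    increment:       constant added on top of the observed dirtying rate
    small_threshold: below this much dirty data, pushing again is pointless
    """
    dirty_rate = dirty_bytes / round_seconds
    if dirty_rate > max_bw or dirty_bytes < small_threshold:
        return None                      # stop pushing, do stop-and-copy
    # Stay slightly ahead of the dirtying rate, within the allowed range.
    return min(max(dirty_rate + increment, min_bw), max_bw)
```

Starting at the minimum and only raising the limit when the workload demands it keeps the impact on the running service low for workloads with a small writable working set.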

Implementation-paravirtualized optimizations
• Stunning rogue processes
  - Rogue process: generates dirty pages at a very high rate (e.g., writes one word in every page)
  - Fork a monitor process that tracks the WWS of individual processes
  - If a process exceeds the write-fault limit, move it to a wait queue
• Freeing page cache pages
  - Typically, an OS has a number of free pages
  - Use the ballooning mechanism to return free pages to the VMM
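The stunning rule amounts to a simple threshold check per process. The sketch below is a user-space illustration only: the function name `stun_rogue_processes`, the idea of counting write faults per monitoring interval, and the sample numbers are assumptions.

```python
from collections import deque

def stun_rogue_processes(write_faults, fault_limit):
    """Split processes into (runnable, wait_queue).

    write_faults: mapping of process id -> write faults seen in the
                  last monitoring interval
    fault_limit:  maximum number of write faults a process may cause
    """
    runnable, wait_queue = [], deque()
    for pid, faults in write_faults.items():
        if faults > fault_limit:
            wait_queue.append(pid)   # stunned: parked until migration allows it to run
        else:
            runnable.append(pid)
    return runnable, wait_queue

# Example: only process 102 dirties pages faster than the limit.
runnable, waiting = stun_rogue_processes({101: 3, 102: 900, 103: 40}, fault_limit=40)
```

Parking the worst offenders shrinks the writable working set, which lets the push phase converge at the cost of slowing those processes down.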

Evaluation-simple web server
• A highly loaded server with a relatively small WWS
• Controlled impact on live services
• Short downtime
[Figure annotation: Migration starts]

Evaluation-rapid page dirtying
• In the third round, the transfer rate is scaled up to 500 Mbit/s (the maximum)
• Migration then switches to stop-and-copy, resulting in 3.5 s of downtime
• Diabolical workloads may suffer considerable service downtime
[Figure annotation: Stop-and-copy]

Conclusion
• OS-level live migration
• Pre-copy: iterative push and a short stop-and-copy
• Dynamically adapting network bandwidth
  - Balances service downtime against service performance degradation
• Paravirtualized optimizations
  - Minimize service downtime and the impact on running services

Future Work
• Cluster management
  - Make decisions about the placement and movement of virtual machines
• Wide-area network redirection
  - The OS will have to obtain a new IP address, or some kind of indirection layer is needed
• Storage migration
  - Local disks are considerably larger than volatile memory

Q&A
Thank You!
