HSPT: Practical Implementation and Efficient Management of Embedded Shadow Page Tables for Cross-ISA System Virtual Machines

Zhe Wang1, Jianjun Li1, Chenggang Wu1, Dongyan Yang2, Zhenjiang Wang1, Wei-Chung Hsu3, Bin Li4, Yong Guan5

1 State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
2 China Systems and Technology Lab, IBM
3 Dept. of Computer Science & Information Engineering, National Taiwan University
4 Netease
5 College of Information Engineering, Capital Normal University
 
 Outline
 
Background
Motivation
Contributions
The Framework of HSPT
Evaluation
Conclusion
 
 
 
 
 
 Outline
 
Background
Motivation
Contributions
The Framework of HSPT
Evaluation
Conclusion
 
 
 
 
 
System Virtualization
 
System virtualization has regained its popularity in recent
years and has been widely used for cloud computing.
It allows applications running on such systems to be agnostic about
the underlying operating systems and hardware platforms.
It enables developing and testing an OS and applications on different
platforms, e.g. the Android Emulator on a PC (ARM on x86-64).
 
 
 
 
 
 
 
The Category of System Virtualization
 
System virtualization can be divided into same-ISA and cross-ISA
categories, depending on whether the guest and the host use the same
instruction-set architecture.
 
Same-ISA system virtualization is commonly used for
server consolidation.
Example: VMware Workstation, VirtualBox.
 
Cross-ISA system virtualization is also important and commonplace.
The Android Emulator, which emulates the Android/ARM environment on
x86-64 platforms, is one example; it offers great convenience in
development and debugging to Android application developers.
We focus only on cross-ISA system virtualization.
The Overhead of Cross-ISA System Virt.
 
Virtualization brings an additional layer of abstraction and
causes some unavoidable overhead.
 
Memory virtualization is one major source of overhead.
Memory subsystem emulation in QEMU system mode takes about 23%-43%
of the execution time [ESPT, VEE'14].
All hardware functions are emulated in software.
 
 
So optimizations that minimize such memory virtualization overhead
are the key to enhancing the performance of a cross-ISA system-level
emulator.
Traditional Memory Virtualization in Cross-ISA
 
[Figure: traditional three-step address translation. (1) The guest page table translates a guest virtual address (GVA) to a guest physical address (GPA); (2) the emulator's mapping table translates the GPA to a host virtual address (HVA) inside the Simulated Guest Physical Space (SGPS); (3) the host page table translates the HVA to a host physical address (HPA).]
Disadvantage
Each guest memory access instruction will go through these three address translations.
Despite the software TLB, the overhead is still significant.
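To make the cost concrete, the following is a minimal sketch of what every emulated guest load or store pays in this traditional scheme. It is not QEMU's actual code; the types and helper names (guest_page_table_walk, sgps_base) are illustrative assumptions.

```c
/* Hedged sketch of the traditional three-step translation; helper names
 * and types are illustrative, not QEMU's real implementation.          */
#include <stdint.h>

typedef uint32_t gva_t;   /* guest virtual address (32-bit ARM guest) */
typedef uint32_t gpa_t;   /* guest physical address                   */

extern gpa_t guest_page_table_walk(gva_t va);  /* step 1: GVA -> GPA (software) */
extern uint8_t *sgps_base;                     /* SGPS start in host virtual space */

static inline void *translate_guest_address(gva_t va)
{
    gpa_t    gpa = guest_page_table_walk(va);  /* 1. walk the guest page table  */
    uint8_t *hva = sgps_base + gpa;            /* 2. GPA -> HVA inside the SGPS */
    return hva;                                /* 3. HVA -> HPA by the host MMU */
}

/* A translated guest load such as "ldr r0, [r1]" then becomes roughly
 *     r0 = *(uint32_t *)translate_guest_address(r1);
 * i.e. steps 1 and 2 are paid in software on every guest memory access
 * (a software TLB can cache the result, but the overhead remains large). */
```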
Embedded Shadow Page Table
 
Embedded Shadow Page Tables (ESPT) have been proposed to reduce
address translations and improve performance [ESPT, VEE'14].

ESPT utilizes the larger address space of modern 64-bit processors
and uses a loadable kernel module (LKM) to embed shadow page entries
into the host page table.

These shadow page table (SPT) entries store the mapping between
guest virtual addresses and host physical addresses.
This table can be walked directly by the hardware page walker.
Embedded Shadow Page Table (cont.)
 
Hardware can walk the shadow page table directly to accelerate guest memory access.
ESPT uses an LKM to create the shadow page mappings (e.g., mapping G1 to P4).
It also proposes a signal notification mechanism to reduce the overhead of creating shadow page mappings.
It intercepts the guest "TLB flush" instruction to reduce synchronization overhead.
 
[Figure: ESPT address translation. A Guest Dedicated Virtual Address Space (GDVAS) occupies the lower 4G of host virtual space; an embedded shadow page entry maps the guest virtual page G1 directly to the host physical page P4, bypassing the GPA and HVA steps. SGPS: Simulated Guest Physical Space; GDVAS: Guest Dedicated Virtual Address Space.]
Embedded Shadow Page Table (cont.)
Support for Multi-Process
ESPT maintains a shadow page table for each guest process. When the
guest process switches, ESPT uses the LKM to point the host
page-directory entries for the lower 4G space at the targeted shadow
page table.
[Figure: guest process switching. The embedded shadow page entries in the host page table are redirected from Shadow Page Table A to Shadow Page Table B, ..., N as the guest switches processes.]
 
 Outline
 
Background
Motivation
Contributions
The Framework of HSPT
Evaluation
Conclusion
 
 
 
 
Motivation
 
ESPT can significantly reduce the address translation overhead.
However, ESPT has a few drawbacks due to its use of an LKM.

Using LKMs is less desirable for system virtual machines due to
portability, security, and maintainability concerns.
Most LKMs use internal kernel interfaces, and different kernel
versions may have different interfaces.
To enforce security, modern OSes only allow users with root
privilege to load LKMs.
With LKMs, the kernel would also be less secure.
For example, many exploits have been reported for the Linux kernel,
and these exploits often attack LKMs instead of the core kernel [APSys '11].

So we propose a different implementation that manages ESPTs without
using LKMs.
 
 Outline
 
Background
Motivation
Contributions
The Framework of HSPT
Evaluation
Conclusion
 
 
 
 
 
Contributions
 
Proposed a practical implementation of ESPT without
using loadable kernel modules.
 
Proposed an efficient synchronization mechanism based
on shared memory mapping methods.
 
Proposed and evaluated three SPT organizations.
 
Our approach achieves up to 92% speedup on the CINT2006 benchmarks
and a 44% improvement for Android system boot and application
start-up on the Android emulator.
 
 
 Outline
 
Background
Motivation
Contributions
The Framework of HSPT
Evaluation
Conclusion
 
 
 
 
Challenges of ESPT without using LKMs
 
ESPT uses an LKM only to complete two operations:
Creating shadow page mappings. Because the shadow page table must be
walked directly by the hardware MMU, it has to reside in host kernel
space at kernel privilege.
Handling guest multi-process. ESPT switches shadow page tables by
modifying the host page table.

Our challenge is how to complete these two operations without using LKMs.
 
 
To distinguish our new implementation from the original ESPT, we call
our approach 
“Hosted Shadow Page Table” (HSPT).
Challenge 1: Creating Shadow Page Mapping

Traditional address translation: P1 -> P2 -> P3 -> P4. HSPT: P1 -> G1 (G1 backed directly by P4).
What we need to do is make G1 share the physical page P4 that P3 is mapped to.
We use the shared memory mechanism, which maps two or more virtual
pages to the same physical page, to accomplish this sharing (the
"mmap" system call with MAP_SHARED).
[Figure: the simulated guest RAM is backed by a file F1; the same file offset is mapped MAP_SHARED into both the SGPS (at P3) and the GDVAS (at G1), so both virtual pages share the host physical page P4. SGPS: Simulated Guest Physical Space; GDVAS: Guest Dedicated Virtual Address Space.]
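As a concrete illustration, here is a minimal sketch of how such a shared mapping could be established with mmap. The file name, sizes, and helper names are illustrative assumptions rather than the paper's actual code.

```c
/* Hedged sketch, not the paper's actual code: back the simulated guest RAM
 * with a file and map the same file into both the SGPS and the GDVAS, so a
 * GDVAS page and its SGPS counterpart share one host physical page.
 * File name, sizes and helper names are illustrative assumptions.        */
#include <stdint.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

#define GUEST_RAM_SIZE (1UL << 30)   /* 1G of simulated guest RAM   */
#define GDVAS_SIZE     (1UL << 32)   /* one 4G guest virtual space  */
#define PAGE_SIZE      4096UL

static int      ram_fd;              /* backing file for guest RAM  */
static uint8_t *sgps_base;           /* SGPS: GPA 0 is mapped here  */
static uint8_t *gdvas_base;          /* GDVAS: GVA 0 is mapped here */

void hspt_init(void)
{
    ram_fd = open("/tmp/guest_ram", O_RDWR | O_CREAT | O_TRUNC, 0600);
    ftruncate(ram_fd, GUEST_RAM_SIZE);

    /* SGPS: the emulator's ordinary view of guest physical memory. */
    sgps_base = mmap(NULL, GUEST_RAM_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, ram_fd, 0);

    /* Reserve an empty 4G GDVAS; its entries are filled in lazily. */
    gdvas_base = mmap(NULL, GDVAS_SIZE, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
}

/* Install one hosted shadow page entry: let guest virtual page 'gva' share
 * the physical page that backs guest physical page 'gpa' by mapping the
 * same file offset MAP_SHARED into the GDVAS.                            */
void hspt_map_page(uint32_t gva, uint32_t gpa, int prot)
{
    mmap(gdvas_base + (gva & ~(PAGE_SIZE - 1)), PAGE_SIZE, prot,
         MAP_SHARED | MAP_FIXED, ram_fd, (off_t)(gpa & ~(PAGE_SIZE - 1)));
}
```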
Creating Shadow Page Mapping
 
When shadow page mappings are created (similar to ESPT):
We do not create the mappings for all SPT entries up front. Instead,
we initially set the protection of all entries to inaccessible. When
a shadow page entry is accessed for the first time, a SIGSEGV is
raised and gives us the chance to create the mapping.

We also intercept the guest "TLB flush" instruction to monitor
modifications of the guest page table and update the SPT (similar to ESPT).
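A minimal sketch of this lazy-fill path, reusing the hypothetical helpers from the previous sketch; the guest page-table walker and fault-delivery routine are assumptions, not the paper's actual code.

```c
/* Hedged sketch: lazily fill hosted shadow page entries from a SIGSEGV
 * handler.  guest_page_table_walk() and hspt_map_page() are the
 * illustrative helpers assumed in the earlier sketches.               */
#include <signal.h>
#include <stdint.h>

extern uint8_t *gdvas_base;
extern int  guest_page_table_walk(uint32_t gva, uint32_t *gpa, int *prot);
extern void hspt_map_page(uint32_t gva, uint32_t gpa, int prot);
extern void deliver_guest_page_fault(uint32_t gva);

static void hspt_sigsegv(int sig, siginfo_t *info, void *ucontext)
{
    uintptr_t fault = (uintptr_t)info->si_addr;
    uint32_t  gva   = (uint32_t)(fault - (uintptr_t)gdvas_base);
    uint32_t  gpa;
    int       prot;

    if (guest_page_table_walk(gva, &gpa, &prot) == 0)
        hspt_map_page(gva, gpa, prot);   /* first touch: install the mapping */
    else
        deliver_guest_page_fault(gva);   /* genuine guest page fault         */
}

void hspt_install_handler(void)
{
    struct sigaction sa = { 0 };
    sa.sa_sigaction = hspt_sigsegv;
    sa.sa_flags     = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);
}
```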
Challenge 2: Handling Guest Multi-process

With its LKM, ESPT can easily switch to the corresponding shadow page
table (SPT) when the guest switches processes, simply by modifying the
host page table; different SPTs can therefore reuse the same host
virtual addresses.
Without an LKM, different shadow page tables must occupy different
host virtual space (each SPT is bound to a separate GDVAS).

So we investigate three variations of SPT organization:
Shared SPT: all guest processes share the same shadow page table.
Private SPT: each guest process has its own shadow page table.
Group Shared SPT: a combination of Shared SPT and Private SPT.
Shared SPT
All guest processes share the same shadow page table.
[Figure: execution timeline with one Shared SPT (host user space / host kernel space). 1. Process A first accesses page P1; 2. a SIGSEGV is received and the G1 mapping is created. Switch A to B: the shadow page table is emptied by re-mmapping the GDVAS with no protection. 3. Process B first accesses page P2; 4. SIGSEGV, its mapping is created. Switch B to A: the SPT is emptied again. 5. Process A accesses page P1 again; 6. SIGSEGV, the same G1 mapping has to be recreated.]
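The "empty the shadow page table" step on a process switch can be done with a single mmap over the whole GDVAS; a minimal sketch, using the variable names assumed in the earlier sketches.

```c
/* Hedged sketch: with a single Shared SPT, a guest process switch empties
 * the whole table by replacing the GDVAS with an inaccessible mapping, so
 * every entry is rebuilt lazily via SIGSEGV for the incoming process.    */
#include <stdint.h>
#include <sys/mman.h>

#define GDVAS_SIZE (1UL << 32)
extern uint8_t *gdvas_base;

void hspt_flush_shared_spt(void)
{
    /* MAP_FIXED replaces all existing shared mappings in one call. */
    mmap(gdvas_base, GDVAS_SIZE, PROT_NONE,
         MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED, -1, 0);
}
```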
 
Private SPT
 
In the Shared SPT strategy, when a process is switched back in, the
SPT entries filled during its last timeslot have been lost, and its
SPT has to be warmed up again.

Giving each guest process its own shadow page table is therefore a
good solution.

But the problem is how to monitor the switched-out guest page tables (GPTs).
Write-protecting the switched-out GPTs is a common but expensive method.
 
 
Private SPT (cont.)
 
 
Consider x86 and ARM as examples: they use the PCID (Process Context
Identifier) and the ASID (Address Space Identifier), respectively, to
identify TLB entries of each process.
We call this kind of identifier a "Context Identifier" (CID).
When the guest modifies a switched-out process's page table, the TLB
must be informed with the CID and the address.

Based on this principle, we choose to bind each SPT to a CID rather
than to a process ID.
 
Private SPT (cont.)
 
When the guest OS switches from CID A to CID B (a process switch),
the gs segment register is updated to hold the base of the current GDVAS.
[Figure: Private SPT layout. The guest OS manages the CID table (CIDs 0..N-1, e.g. A and B); each CID is bound to its own GDVAS (GDVAS_A at Base_A, GDVAS_B at Base_B, up to N GDVASes) and thus its own SPT (SPT_A, SPT_B) in the host page table. The gs segment register points to the base of the GDVAS of the currently running CID, so a translated guest access (e.g. ARM "ldr r0, [fp]") becomes a gs-relative x86-64 access.]
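A minimal sketch of how the GDVAS base could be switched on a guest CID switch by repointing the x86-64 gs base (here via the arch_prctl syscall); the table size, helper names, and the exact mechanism for setting gs are illustrative assumptions, not necessarily what the paper implements.

```c
/* Hedged sketch: bind each guest CID to its own GDVAS and, on a guest
 * process (CID) switch, point the x86-64 gs base at that GDVAS so that
 * translated guest memory accesses become gs-relative host accesses.
 * gdvas_of_cid[] is assumed to be filled when the GDVASes are reserved. */
#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/prctl.h>       /* ARCH_SET_GS */

#define MAX_CID 256
static uint8_t *gdvas_of_cid[MAX_CID];   /* one 4G GDVAS per guest CID */

void hspt_switch_cid(unsigned cid)
{
    /* All subsequent translated guest loads/stores, emitted as e.g.
     *     ARM:  ldr r0, [r1]   ->   x86-64:  mov eax, gs:[esi]
     * now address the GDVAS (and thus the SPT) of the new CID.       */
    syscall(SYS_arch_prctl, ARCH_SET_GS, (unsigned long)gdvas_of_cid[cid]);
}
```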
Group Shared SPT
 
Private SPT consumes too much host virtual space.
Taking ARM as the guest, a virtual space of 256 * 4G = 1TB would be
needed (up to 256 different processes allowed).
 
Group Shared SPT (cont.)
 
With a given number of SPTs, if the number of processes is relatively
small, each process can obtain its own SPT and thus enjoy the full
benefit of Private SPT.

When the number of processes grows beyond the number of SPTs, some
processes must share an SPT.

So Group Shared SPT works adaptively to balance between high
performance and limited virtual address space.
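A minimal sketch of such an adaptive assignment, assuming a small pool of M GDVASes managed with an LRU policy; the exact bookkeeping in the paper may differ, and the helper names are assumptions.

```c
/* Hedged sketch: a pool of M GDVASes shared by up to N guest CIDs and
 * assigned with an LRU policy.  When no GDVAS is free, the least recently
 * used one is flushed and re-assigned, so its former owners fall back to
 * Shared-SPT behaviour until they run again.                             */
#include <stdint.h>

#define MAX_CID    256
#define GDVAS_POOL 8                       /* M <= N GDVASes actually mapped */

extern uint8_t *gdvas_base_of(int slot);        /* base address of pool slot */
extern void     hspt_flush_gdvas(int slot);     /* re-mmap the slot PROT_NONE */

static int      cid_to_slot[MAX_CID];           /* -1 = no GDVAS assigned     */
static unsigned last_use[GDVAS_POOL];           /* for LRU victim selection   */
static unsigned now;

void group_shared_init(void)
{
    for (int c = 0; c < MAX_CID; c++)
        cid_to_slot[c] = -1;
}

uint8_t *group_shared_switch(unsigned cid)      /* returns new gs base */
{
    int slot = cid_to_slot[cid];
    if (slot < 0) {
        /* Evict the least recently used GDVAS and give it to this CID. */
        slot = 0;
        for (int i = 1; i < GDVAS_POOL; i++)
            if (last_use[i] < last_use[slot])
                slot = i;
        for (int c = 0; c < MAX_CID; c++)       /* unbind the previous owners */
            if (cid_to_slot[c] == slot)
                cid_to_slot[c] = -1;
        hspt_flush_gdvas(slot);                 /* entries are refilled lazily */
        cid_to_slot[cid] = slot;
    }
    last_use[slot] = ++now;
    return gdvas_base_of(slot);
}
```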
 
 
 Outline
 
Background
Motivation
Contributions
The Framework of HSPT
Evaluation
Conclusion
 
 
 
 
 
Experiment Setting
 
Experiment platform
Host machine: Intel E7-4807, 1064 MHz, 15G RAM
Host OS: Ubuntu 12.04.3 LTS (x86-64)
Guest OS: Android 4.4 (Linux kernel 3.4.0-gd853d22nnk, ARM)

Benchmark
SPEC CINT2006

Comparison
Traditional memory virtualization
Hosted Shadow Page Table
Shared SPT
Private SPT
Group Shared SPT
 
 
Shared SPT/Private SPT
 
Shared SPT has one SPT. Private SPT has 256 SPTs.
 
Group Shared SPT
 
We test the performance when the number of SPTs is 1, 4, 8, 16 and 32.
 
 
 
 
 
 
 
 
 
 
 
 
 
The performance clearly keeps improving as the number of SPTs
increases, and once the number of SPTs exceeds 8, the performance of
Group Shared SPT is very close to that of Private SPT.
 
Discussion

We did not compare the performance of HSPT side-by-side with ESPT.

Our work does not claim that HSPT will yield greater performance
than ESPT.

It is motivated by better platform portability, higher system
security, and improved usability for application developers, since
non-root users can also benefit from HSPT.
 
 
 Outline
 
Background
Motivation
Contributions
The Framework of HSPT
Evaluation
Conclusion
 
 
 
 
 
Conclusion
 
We proposed a practical implementation of SPTs for cross-ISA virtual
machines without using LKMs.

Our approach uses part of the host page table as the SPT and relies
on shared memory mapping to update the SPT, thus avoiding the use of
LKMs.

We proposed and evaluated three SPT organizations to handle
multi-processing in the guest OS.

With sufficient host virtual space, our approach achieves up to 92%
speedup on the CINT2006 benchmarks.
 
 
 
 
 
Thank You
 