Insights into Meltdown and Spectre Design Flaws

On the Meltdown & Spectre Design Flaws
Mark D. Hill
Computer Sciences Dept.
Univ. of Wisconsin-Madison
 
February 2018 
Computer Architect, Not Security Expert
Prepared while on a sabbatical visit to Google with public information only and
representing the author’s views only, not necessarily Google’s.
 
Talk Info (Hidden Slide)
 
Title: On the Meltdown & Spectre Design Flaws

Speaker: Mark D. Hill, Computer Sciences Department, University of Wisconsin-Madison

Abstract: Two major hardware security design flaws--dubbed Meltdown and Spectre--were broadly revealed to the public in early January 2018 in research papers and blog posts that require considerable expertise and effort to understand. To complement these, this talk seeks to give a general computer science audience the gist of these security flaws and their implications. The goal is to enable the audience to either stop there or have a framework to learn more. A non-goal is exploring many details of flaw exploitation and patch status, in part, because the speaker is a computer architect, not a security expert.

In particular, this talk reviews that Computer Architecture 1.0 (the version number is new) specifies the timing-independent functional behavior of a computer, and that micro-architecture is the set of implementation techniques that improve performance by more than 100x. It then asks, What if a computer that is completely correct by Architecture 1.0 can be made to leak protected information via timing, a.k.a., micro-architecture? The answer is that this is exactly what is done by the Meltdown and Spectre design flaws. Meltdown leaks kernel memory, but software & hardware fixes exist. Spectre leaks memory outside of sandboxes and bounds checks, and it is scary. An implication is that the definition of Architecture 1.0--the most important interface between software and hardware--is inadequate to protect information. It is time for experts from multiple viewpoints to come together to create Architecture 2.0.

Bio: Mark D. Hill (http://www.cs.wisc.edu/~markhill) is John P. Morgridge Professor and Gene M. Amdahl Professor of Computer Sciences at the University of Wisconsin-Madison. Hill has a PhD in computer science from the University of California, Berkeley. Hill's research targets computer design and evaluation. He has made contributions to parallel computer system design (e.g., memory consistency models and cache coherence), memory system design (caches and translation buffers), computer simulation (parallel systems and memory systems), software (e.g., page tables and cache-conscious optimizations), deterministic replay and transactional memory. For example, he is the inventor of the widely-used 3C model of cache behavior (compulsory, capacity, and conflict misses) and co-inventor of the cornerstone for the C++ and Java multi-threaded memory specifications (sequential consistency for data-race-free programs). He is a fellow of IEEE and the ACM. He serves as Vice Chair of the Computer Community Consortium (2016-18) and served as Wisconsin Computer Sciences Department Chair 2014-2017.

Executive Summary
Architecture 1.0: the timing-independent functional behavior of a computer
Micro-architecture: the implementation techniques to improve performance
Question: What if a computer that is completely correct by Architecture 1.0 can be made to leak protected information via timing, a.k.a., Micro-Architecture?
Meltdown leaks kernel memory, but software & hardware fixes exist
Spectre leaks memory outside of bounds checks or sandboxes, and is scary
Implication: The definition of Architecture 1.0 is inadequate to protect information

Outline
 
 
Computer Architecture & Micro-Architecture Background
Timing Side-Channel Attack
Meltdown
Spectre
Wrap-Up
Computer Architecture 0.0 -- Pre-1964
Each Computer was New
Implemented machine (has mass) → hardware
Instructions for hardware (no mass) → software
Software Lagged Hardware
Each new machine design was different
Software needed to be rewritten in assembly/machine language
Unimaginable today
Going forward: Need to separate HW interface from implementation

Computer Architecture 1.0 -- Born 1964
IBM System 360 defined an instruction set architecture
Stable interface across a family of implementations
Software did NOT have to be rewritten

Architecture 1.0: the timing-independent functional behavior of a computer
Micro-architecture: implementation techniques that change timing to go fast

branch (R1 >= bound) goto error
load R2 ← memory[train+R1]
and R3 ← R2 & 0xffff
load R4 ← memory[save+SIZE+R3]
Note: The code is not IBM 360 assembly, but is the example used later.
Micro-architecture Harvested Moore’s Law Bounty
For decades, every ~2 years: 2x transistors, 1.4x faster & 1x chip power possible;
2300 transistors for Intel 4004 → millions per core & billions for caches
(Micro-)architects took this ever-doubling budget to make each processor core
execute > 100x faster than it would otherwise.
Key techniques w/ tutorial next:
Instruction Speculation
Hardware Caching
Hidden by Architecture 1.0: timing-independent functional behavior unchanged
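
The scaling rule on this slide can be checked with a little arithmetic. The following is a rough sketch only: it assumes perfectly regular 2-year generations, which real chips never followed exactly, and the function names are made up for illustration.

```python
def transistor_growth(start_count, years, doubling_period=2.0):
    """Idealized transistor count after `years`, doubling every `doubling_period` years."""
    return start_count * 2 ** (years / doubling_period)


def speed_growth(years, per_generation=1.4, generation_years=2.0):
    """Idealized speed gain after `years`, at 1.4x per 2-year generation."""
    return per_generation ** (years / generation_years)


# Twenty years is ten generations under the slide's rule of thumb.
transistors_after_20_years = transistor_growth(2300, 20)   # starting from the Intel 4004's 2300
speedup_after_20_years = speed_growth(20)

print(f"transistors: {transistors_after_20_years:,.0f}")
print(f"raw speed gain: {speedup_after_20_years:.1f}x")
```

Under these assumptions, ten generations give about a thousand times more transistors but only a few tens of times more raw speed, which is why micro-architects spent the extra transistors on speculation and caches.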
Instruction Speculation Tutorial
Many steps (cycles) to execute one instruction; time flows left to right →
[Figure: add and load instructions, each taking many cycles]
Go Faster: Pipelining, branch prediction, & instruction speculation
[Figure: overlapped add, load, branch, and, store. Labels: branch "Predict direction: target or fall thru"; and "Speculate!"; store "Speculate more!"]
Speculation correct: Commit architectural changes of and (register) & store (memory); go fast!
Mis-speculate: Abort architectural changes (registers, memory); go in other branch direction
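
To make "predict direction" concrete, here is a minimal sketch of a textbook 2-bit saturating-counter branch predictor. This particular predictor is an assumption chosen for illustration; the slides do not say which predictor any real core uses, and the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counter = 0  # start in "strongly not taken"

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes):
    """Feed a sequence of actual branch outcomes and count wrong predictions."""
    predictor = TwoBitPredictor()
    wrong = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            wrong += 1  # hardware would abort speculative work here
        predictor.update(taken)
    return wrong


# A loop branch: taken nine times, then falls through once.
loop = [True] * 9 + [False]
print(count_mispredictions(loop))
```

The point for the rest of the talk is that the predictor learns from history, so whoever controls the history can steer what the processor speculates next.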
Hardware Caching Tutorial
Main Memory (DRAM) 1000x too slow
Add Hardware Cache(s): small, transparent hardware memory
Like a software cache: speculate near-term reuse (locality) is common
Like a hash table: an item (block or line) can go in one or few slots
E.g., 4-entry cache w/ slot picked with address (key) modulo 4
 
Access   Result                        Slots 0  1  2  3
(start)                                      -- -- -- --
12?      Miss; insert 12                     12 -- -- --
07?      Miss; insert 07                     12 -- -- 07
12?      HIT! No changes                     12 -- -- 07
16?      Miss; victim 12, insert 16          16 -- -- 07
Note: 12 victimized "early" due to "alias"
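
The 4-entry example above can be replayed with a short simulation. This is a sketch of a direct-mapped cache that tracks only which address sits in each slot; the class name and return convention are assumptions made for illustration.

```python
class DirectMappedCache:
    """Each address can live in exactly one slot: address modulo the number of slots."""

    def __init__(self, num_slots=4):
        self.slots = [None] * num_slots

    def access(self, address):
        """Return (hit, victim), where victim is the address evicted on a miss, if any."""
        slot = address % len(self.slots)
        resident = self.slots[slot]
        if resident == address:
            return True, None
        self.slots[slot] = address  # insert, replacing whatever was there
        return False, resident


cache = DirectMappedCache(4)
trace = [(address, *cache.access(address)) for address in (12, 7, 12, 16)]
for address, hit, victim in trace:
    print(address, "hit" if hit else "miss", "victim:", victim)
```

Addresses 12 and 16 both map to slot 0, so inserting 16 evicts 12 even though three other slots have room. That aliasing is what the side-channel attack later exploits.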
 
Micro-architecture Harvested Moore’s Law Bounty
 
For decades, every ~2 years: 2x transistors, 1.4x faster & 1x chip power possible;
2300 transistors for Intel 4004 → millions per core & billions for caches
(Micro-)architects took this ever-doubling budget to make each processor core
execute > 100x faster than it would otherwise
 
 
 
Hidden by Architecture 1.0: timing-independent functional behavior unchanged
 
branch (R1 >= bound) goto error  ; Speculate branch not taken
load R2 ← memory[train+R1]       ; Speculate load & speculate cache hit
and R3 ← R2 & 0xffff             ; Speculate AND
load R4 ← memory[save+SIZE+R3]   ; Speculate load & speculate cache hit
 
Whither Computer Architecture 1.0?
 
Architecture 1.0: timing-independent functional behavior
Question: What if a computer that is completely correct by Architecture 1.0 can be made to leak protected information via timing, a.k.a., micro-architecture?
Implication: The definition of Architecture 1.0 is inadequate to protect information
This is what Meltdown and Spectre do. Let's see why and explore implications.
Side-Channel Attack: SAVE Secret in Micro-Arch

1. Prime micro-architectural state
   a. Repeatedly access array train[] to train branch predictor to expect access < bound
   b. Access all of array save[] to put it completely in a cache of size SIZE
2. Coerce processor into speculatively executing instructions that will be nullified
   to (a) find a secret & (b) save it in micro-architecture

branch (R1 >= bound) goto error  ; Speculate not taken even if R1 >= bound
load R2 ← memory[train+R1]       ; Speculate to find SECRET outside of train[]
and R3 ← R2 & 0xffff             ; Speculate to convert SECRET bits into index
load R4 ← memory[save+SIZE+R3]   ; Speculate to save SECRET by victimizing memory[save+R3],
                                 ; since it aliases in cache with new access memory[save+SIZE+R3]

3. HW detects mis-speculation
   Undoes architectural changes
   Leaves cache (micro-architecture) changes (correct by Architecture 1.0)
Side-Channel Attack: RECALL Secret from Micro-Arch
4. Probe time to access each element of save[] -- a micro-architectural property;
   if accessing save[foo] is slow due to cache miss, then SECRET is foo. A leak!
5. Repeat many times to obtain secret information at some bandwidth. (More
   shifting/masking needed to get all SECRET bits victimizing 64B cache lines)
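
Steps 1 through 5 can be tied together in a toy simulation. It is a sketch under loud assumptions: the cache is direct-mapped with one byte per slot, hit and miss latencies are made-up constants, the mask is SIZE-1 (one byte) rather than the slides' 0xffff, and the cache effect of the first load is ignored. All names are hypothetical; no real hardware is modeled.

```python
SIZE = 256                      # toy cache slots, also the length of save[]
HIT_TIME, MISS_TIME = 1, 100    # made-up latencies in "cycles"


class ToyCache:
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size

    def access(self, address):
        """Return the modeled access time and leave the address resident."""
        slot = address % self.size
        hit = self.slots[slot] == address
        self.slots[slot] = address
        return HIT_TIME if hit else MISS_TIME


def leak_byte(memory, train_base, bound, save_base, r1):
    cache = ToyCache(SIZE)
    # Step 1b: prime the cache with all of save[].
    for i in range(SIZE):
        cache.access(save_base + i)
    # Step 2: mis-speculated path, taken although r1 >= bound.
    r2 = memory[train_base + r1]          # finds SECRET outside train[]
    r3 = r2 & (SIZE - 1)                  # converts SECRET bits into an index
    cache.access(save_base + SIZE + r3)   # victimizes save[r3]
    # Step 3: hardware discards r2 and r3, but the cache change remains.
    # Step 4: probe; the one slow element of save[] reveals the secret.
    slow = [i for i in range(SIZE) if cache.access(save_base + i) == MISS_TIME]
    return slow[0]


train = b"harmless public data"
secret = b"SECRET"
memory = list(train + secret)             # the secret sits just past train[]
bound = len(train)

# Step 5: repeat once per secret byte.
recovered = bytes(leak_byte(memory, 0, bound, 4096, bound + k) for k in range(len(secret)))
print(recovered)
```

Nothing in the toy ever returns the secret through a register; it is reconstructed purely from which probe was slow, which is why the leak is invisible to Architecture 1.0.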
Well-known in 1983/85 DoD “Orange Book”
Covert timing channels include all vehicles that would allow one process to signal
information to another process by modulating its own use of system resources in
such a way that the change in response time observed by the second process would
provide information. --TRUSTED COMPUTER SYSTEM EVALUATION CRITERIA
With roots back to 1974 TENEX password attack
But seemed fanciful
 Spy vs. Spy, Mad Magazine, 1960
 
Meltdown (https://meltdownattack.com/meltdown.pdf)

Can leak the contents of kernel memory at up to 500KB/s
[Figure annotations: "TRAP!! (not branch)"; "Under mis-speculation"]
Meltdown & Hardware
Demonstrated for many Intel x86-64 cores; NOT demonstrated for AMD
Key: When to suppress load with protection violation (user load to kernel memory)
EARLY: AMD appears to suppress early, e.g., at TLB access
LATE: Intel appears to suppress at end after micro-arch state changes
My SWAG (Scientific Wild A** Guess) Why
Both are correct by Architecture 1.0
Performance shouldn’t matter as this case is supposed to be rare
Do what’s easiest & have luck that is good (AMD) or bad (Intel)
Meltdown & Software
Bad: Meltdown operates with bug-free OS software (by Architecture 1.0)
Good: Major commercial OSs patched for Meltdown ~January 2018
Idea: Don't map (much of) the protected kernel address space in a user process
Offending load now fails address translation & does nothing
Patches quickly derived from KAISER, developed to counter side-channel attacks on
Kernel Address Space Layout Randomization (KASLR)
Performance impact 0-30%, depending on syscall frequency & core model.
Future hardware can fix Meltdown (like AMD) so maybe we dodged a bullet
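
The software idea on this slide can be pictured with a toy page-table model. This is only a sketch: the dictionary "page table", the page numbers, and the outcome strings are assumptions invented for illustration and do not reflect how any real OS or MMU is implemented.

```python
def build_user_page_table(user_pages, kernel_pages, map_kernel):
    """Page table seen while running user code; kernel pages are optional."""
    table = {page: "user" for page in user_pages}
    if map_kernel:
        table.update({page: "kernel" for page in kernel_pages})
    return table


def user_mode_load(page_table, page):
    """Classify what a user-mode load of `page` runs into."""
    owner = page_table.get(page)
    if owner is None:
        return "no-translation"    # nothing to fetch, so nothing to leak
    if owner == "kernel":
        return "protection-fault"  # the case a late-suppressing core can leak from
    return "ok"


user_pages, kernel_pages = [1, 2, 3], [100, 101]
before_patch = build_user_page_table(user_pages, kernel_pages, map_kernel=True)
after_patch = build_user_page_table(user_pages, kernel_pages, map_kernel=False)

print(user_mode_load(before_patch, 100))
print(user_mode_load(after_patch, 100))
```

In this picture the cost is also visible: every system call must switch to a page table that does map the kernel, which is consistent with the slide's note that the slowdown depends on syscall frequency.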
 
Spectre (https://spectreattack.com/spectre.pdf)

Classic side-channel attack w/ deep micro-arch info
1. Attacker primes micro-architecture
   E.g., branch predictor or branch target buffer for saving secret
   E.g., cache for recalling secret
2. Victim loads secret under mis-speculation
   Load should NOT trap (unlike Meltdown)
   Access is still inappropriate in a managed language or sandbox
3. Victim saves secret in micro-arch state, e.g., cache
4. Attacker recalls secret from micro-arch state
5. Repeat
Spectre Applicability (Paper Sections 4, 5, & 6)
4. Exploit branch mis-prediction to let JavaScript steal from Chrome browser
   Demonstrated on Intel Haswell/Skylake, AMD Ryzen, & several ARM cores
   Many other existing designs vulnerable
5. Used indirect branches & return-oriented programming to mis-train the branch
   target buffer to obtain information from a different hyper-thread on the same core
6. Many other known timing channels exist, e.g., register file contention,
   functional unit occupancy, but what about unknown timing channels?
 
Spectre Mitigation (Section 7)
Branch prediction
SW: Suppress branch prediction “when important” with mfence, etc.
Not defined to work but appears to work--at a performance cost
HW could auto-magically suppress branch prediction when appropriate (???)
Branch Target Buffer
SW: Not clear. Disable hyper-threading, etc.?
HW: Make micro-architecture state private to thread (not core or processor)
More generally: Hard to mitigate threats NOT YET DEFINED.
 
Need Computer Architecture 2.0?
 
With Meltdown & Spectre, Architecture 1.0 is inadequate to protect information
Augment Architecture 1.0 with Architecture 2.0 specification of
(Abstraction of) time-visible micro-architecture?
Bandwidth of known (unknown?) timing channels?
Enforced limits on user software behavior? (cf. KAISER)
Change micro-architecture to mitigate timing channel bandwidth
Suppress some speculation
Undo most changes on mis-speculation
Can this be (formally) solved or must it be managed like crime?
Need Computer Architecture 2.0?
More generally, can we reduce our dependence on SPECULATION?
Accelerators!! GPU, DSP, IPU, TPU, ... [Hennessy & Patterson 2018 Taxonomy]
Dedicated Memories
More ALUs
Easy Parallelism
Lower precision data
Domain Specific Language
Speculation NOT a first-order feature!
In 2005, Arvind said Speculation (w/ von Neumann model) killed Dataflow
After 2018, Dataflow-like Renaissance w/ Sea of Accelerators?
 
Executive Summary
 
Architecture 1.0: the timing-independent functional behavior of a computer
Micro-architecture: the implementation techniques to improve performance
Question: What if a computer that is completely correct by Architecture 1.0 can be made to leak protected information via timing, a.k.a., Micro-Architecture?
Implication: The definition of Architecture 1.0 is inadequate to protect information
Meltdown leaks kernel memory, but software & hardware fixes exist
Spectre leaks memory outside of bounds checks or sandboxes, and is scary

Some References
 
New York Times: https://www.nytimes.com/2018/01/03/business/computer-flaws.html
Meltdown paper: https://meltdownattack.com/meltdown.pdf
Spectre paper: https://spectreattack.com/spectre.pdf
A blog separating the two bugs: https://danielmiessler.com/blog/simple-explanation-difference-meltdown-spectre/
Google Blog: https://security.googleblog.com/2018/01/todays-cpu-vulnerability-what-you-need.html and https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
Industry News Sources: https://arstechnica.com/gadgets/2018/01/whats-behind-the-intel-design-flaw-forcing-numerous-patches/ and https://www.theregister.co.uk/2018/01/02/intel_cpu_design_flaw/
 
Backup Slides
 
 
Spectre Code Example
 
Listing 2: Exploiting Speculative Execution via JavaScript

if (index < simpleByteArray.length) {
  index = simpleByteArray[index | 0];
  index = (((index * TABLE1_STRIDE)|0) & (TABLE1_BYTES-1))|0;
  localJunk ^= probeTable[index|0]|0;
}

Listing 3: Disassembly of Speculative Execution in Listing 2 JavaScript

cmpl r15,[rbp-0xe0]          ; Compare index (r15) against simpleByteArray.length
jnc 0x24dd099bb870           ; If index >= length, branch to instruction after move below
REX.W leaq rsi,[r12+rdx*1]   ; Set rsi=r12+rdx=addr of first byte in simpleByteArray
movzxbl rsi,[rsi+r15*1]      ; Read byte from address rsi+r15 (= base address+index)
shll rsi, 12                 ; Multiply rsi by 4096 by shifting left 12 bits
andl rsi,0x1ffffff           ; AND reassures JIT that next operation is in-bounds
movzxbl rsi,[rsi+r8*1]       ; Read from probeTable
xorl rsi,rdi                 ; XOR the read result onto localJunk
REX.W movq rdi,rsi           ; Copy localJunk into rdi
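
The index arithmetic in Listing 2 is what turns a secret byte into a distinguishable cache footprint. The sketch below replays just that arithmetic. The constant values are inferred from the shift by 12 and the 0x1ffffff mask in Listing 3, and the 64-byte cache line is an assumption taken from the earlier slide; none of this is the paper's own code.

```python
TABLE1_STRIDE = 4096            # inferred from `shll rsi, 12` in Listing 3
TABLE1_BYTES = 0x1FFFFFF + 1    # inferred from `andl rsi,0x1ffffff`
CACHE_LINE = 64                 # bytes per cache line, as on the earlier slide


def probe_index(byte_value):
    """Offset into probeTable touched for a given secret byte."""
    return (byte_value * TABLE1_STRIDE) & (TABLE1_BYTES - 1)


def byte_from_index(index):
    """Inverse mapping the attacker applies after timing probeTable."""
    return index // TABLE1_STRIDE


indices = [probe_index(b) for b in range(256)]
cache_lines = {index // CACHE_LINE for index in indices}
print(len(cache_lines))
```

Because every possible byte value lands on its own cache line (here, on its own 4096-byte stride), finding which line became fast or slow identifies the byte without ambiguity.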
 
 
 
 
 
 
Meltdown v. Spectre
 
Miessler Blog (https://danielmiessler.com/blog/simple-explanation-difference-meltdown-spectre/)

