Weird Machines in Transient Execution

 
The ghost 
is
 the machine:
Weird machines in transient execution
 
Ping-Lun Wang
, Fraser Brown, Riad S. Wahby
WOOT, 05/25 2023
What are weird machines?
 
Definition: models a system’s 
unintentional behavior
, usually triggered
from 
adversarial input
Use weird machines as 
computation primitives
Perform 
program obfuscation
 or 
secret computations
o
The 
MOV
 instruction
[1]
o
Page fault handler
 [2]
o
Time
 (TSX WM)
[3]
2
[1] S. Dolan, “mov is Turing-complete” (2013)
[2] J. Bangert et al., “The page-fault weird machine: Lessons in instruction-less computation” (WOOT 2013)
[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)
3
[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)
TSX weird machines that compute with time
[3]
TSX Weird Gate
Input 
I
1
in cache?
Input 
I
2
in cache?
Fetch output
Out
 into cache?
 
I
1
 
I
2
 
= 1
 
= 0
4
I
1
 
 
I
2
Out
 
I
1
 
I
2
 
Out
 
I
1
 
 
I
2
 
Out
 
I
1
 
I
2
 
Out
[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)
TSX weird machines that compute with time
[3]
TSX Weird Gate
Input 
I
1
in cache?
Input 
I
2
in cache?
Fetch output
Out
 into cache?
TSX weird machines that compute with time
[3]
5
 
Compute on 
cache states (
μ
Arch)
Not visible 
in architectural states!
Gates: 
Assign, AND, OR, XOR
[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)
TSX Weird Gate
Input 
I
1
in cache?
Input 
I
2
in cache?
Fetch output
Out
 into cache?
6
TSX weird machines that compute with time
[3]
 
Limitations
TSX instructions are easy to detect
o
Vulnerable to static analysis!
Limited to some Intel CPUs
o
AMD and ARM CPUs are not supported!
[3] D. Evtyushkin et al., “Computing with time: Microarchitectural weird machines” (ASPLOS 2021)
TSX Weird Gate
Input 
I
1
in cache?
Input 
I
2
in cache?
Fetch output
Out
 into cache?
Transient weird machine (WM)
7
Our work: generalize TSX WM to 
Transient WM
 
TSX → 
any transient execution
primitive!
o
Prevent static analysis
o
Support AMD, ARM, and Intel
CPUs
o
95~99% accuracy on all CPUs
New 
NOT
 gate
 design
o
Idea
: 
cache state
 controls the
length of transient execution
o
Leads to a ≈7X faster 
XOR
 gate
Transient execution
primitives
TSX
Side channel
Cache
Weird gates
Assign
AND
OR
XOR
 
NOT
 
 
 
 
What is transient execution?
8
clflush(X);
TSX {
    AX /= 
0
;
    X[
0
] = 
0
;
}
 
PC →
 
(Waiting…)
 
(Speculative)
 
X
 
DIV by 0!!
▲ Transient execution (TSX)
 
(Modified!!)
 
Transient execution begins!
 
Transient
execution ends!
 
Generalize TSX WM to other primitives
 
9
clflush(X);
TSX {
    AX /= 
0
;
    X[
0
] = 
0
;
}
 
▲ Transient execution (TSX)
INIT
:
    SIGFPE {
        
goto
 EXIT;
    }
clflush(X);
AX /= 
0
;
X[
0
] = 
0
;
EXIT
:
 
▲ Transient execution (Meltdown)
INIT
:
    I[
0
] = 
0
;
    SIGFPE {
        
goto
 EXIT;
    }
clflush(Y);
AX /= 
0
;
Out[I[
0
]] = 
0
;
EXIT
:
    ...
Assign
: input latency controls output variable
10
 
I
 is in cache
 
I
 is NOT in cache
 
I
 
Out
 
I
 
Out
 
DIV by 0!!
Assign
 gate
 
Transient
exec.
 
Assign
 
DIV by 0!!
 
Transient
exec.
 
Assign
 
✔️
 
= 
0
 
= 
0
From 
Assign
 to 
OR
 & 
AND
 gates
11
Out[I
1
[
0
]] = 
0
;
Out[I
2
[
0
]] = 
0
;
Out[I
2
[I
1
[
0
]]] = 
0
;
 
OR
 gate
 
AND
 gate
INIT
:
    I[
0
] = 
0
;
    SIGFPE {
        
goto
 EXIT;
    }
clflush(Y);
AX /= 
0
;
Out[I[
0
]] = 
0
;
EXIT
:
    ...
Assign
 gate
Ho
w to construct a 
NOT
 gate
?
12
DIV by 0!!
Transient exec.
I
 in cache → fast
Assign
✔️
Assign
I
 
not
 in cache → slow
AX /= 
0
;
Out[I[
0
]] = 
0
;
Assign
 gate
Time
How to fetch output
when input is slow?
 
→ Fetch output when input is
 
not
 in cache
 
Too slow to fetch output…
I
 
not
 in cache → slow
Ho
w to construct a 
NOT
 gate
?
13
DIV by 0!!
NOT
Fetch output
I
 
in cache → fast
DIV by 0!!
✔️
Transient exec.
Transient exec.
AX /= I[
0
];
Out[tmp[
0
]] = 
0
;
NOT
 gate
Time
Insight
: transient execution time is not constant!
Slow input →
Slow transient exec.
NOT
: input latency controls 
transient execution time
14
INIT
:
    I[
0
] = O[
0
] = 
0
;
    SIGFPE {
        
goto
 EXIT;
    }
clflush(O);
clflush(tmp);
AX /= I[
0
];
Out[tmp[
0
]] = 
0
;
EXIT
:
    ...
 
I
 is in cache
 
I
 is NOT in cache
NOT
 gate
 
I
 
tmp
 
Out
 
I
 
tmp
 
Out
 
Transient
exec.
 
NOT
 
Fetch
 
tmp[
0
]
 
Fetch
 
Out[
0
]
 
DIV by 0!!
 
Division
 
Division
 
DIV by 0!!
 
✔️
 
Measuring accuracy and runtime
 
Test on 3 CPUs
o
AMD EPYC 7R13
o
ARM Graviton 1
o
Intel Xeon E7-8800
Implementation
o
C/C++ and asm
o
1,379 LoC
o
Available on 
GitHub
*
15
 
*https://github.com/joeywang4/Transient-Weird-Machine
Accuracy and runtime
Accuracy
16
Runtime (s/Mops)
Application: program obfuscation
 
Obfuscate 
computations
 and 
control flows
Future work: 
compiler
 to construct circuits of weird gates
17
if
 (input 
^
 password) {
    
fail
();
} 
else
 {
    
success
();
}
XOR
input
password
success
fail
Function pointer
Takeaways
 
TSX WMs + Meltdown & Spectre → 
Transient WMs
Novel 
NOT gate
 design
Obfuscate malware and attack exploits
18
 
Thank you for listening!
 
Github: 
https://github.com/joeywang4/Transient-Weird-Machine
Authors: Ping-Lun Wang, Fraser Brown, Riad S. Wahby
Contact: pinglunw@andrew.cmu.edu
What is transient execution?
19
INIT
:
    SIGFPE {
        
goto
 EXIT;
    }
clflush(X);
AX /= 
0
;
X[
0
] = 
0
;
EXIT
:
 
PC →
 
(Waiting…)
 
(Speculative)
 
X
 
DIV by 0!!
▲ Transient execution (Meltdown)
 
(Modified!!)
 
Transient execution begins!
 
Transient
execution ends!
 
Can Meltdown & Spectre perform computations?
 
20
raise_exception();
// the line below is never reached
access(probe_array[data * 
4096
]);
if (x < array1_size)
    
y = array2[array1[x] * 
4096
];
 
Spectre
 
Meltdown
 
Can we hide computations here?
 
Not visible in architectural states!
Encryption or decryption
Malicious code
 
Microarchitectural
 weird machines
 
21
Computing with time: Microarchitectural
weird machines (ASPLOS’21)
 
✔️
 
Can compute AND, OR, XOR and other
 
operations
✔️ 
Compute with microarchitectural states
 TSX instructions are easy to detect
 Only support some Intel CPUs
 
Q: Can we achieve the same computations
 
without TSX?
A: Yes! Any transient execution primitives are
 
possible!
Q: Can we add support to other CPUs?
A: Yes! AMD, ARM, and Intel CPUs are all
 
supported!
INIT
:
    X[
0
] = 
0
;
clflush(Y);
TSX {
    tmp /= 
0
;
    
Y[X[
0
]] = 
0
;
}
 
DIV by 0!
 
TSX gates: the 
Assign
 gate
 
22
 
PC
 
Exit TSX
 
Pipeline
 
X is 
NOT in cache
 
X is 
in cache
Y[X[
0
]] = 
0
;
Y[
0
] = 
0
;
 
X[
0
] = 
0
Y[X[
0
]] = 
0
;
 
X[
0
] = 
???
 
Fetch 
Y[
0
]
 in cache!
 
Y[
0
]
 is not fetched…
 
Assign
 gate
Slide Note
Embed
Share

Weird machines refer to models exhibiting unintentional behaviors triggered by adversarial inputs. They serve as computation primitives, enabling tasks like program obfuscation and secret computations. TSX weird machines, computing with time, manipulate cache states through gates like Assign, AND, OR, XOR, providing unique capabilities and limitations on various CPU architectures. Further research aims to generalize these concepts to support multiple CPU types and enhance security against static analysis.

  • Weird Machines
  • Transient Execution
  • Computation Primitives
  • CPU Architectures
  • Cache States

Uploaded on Sep 14, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. The ghost is the machine: Weird machines in transient execution Ping-Lun Wang, Fraser Brown, Riad S. Wahby WOOT, 05/25 2023

  2. What are weird machines? Definition: models a system s unintentional behavior, usually triggered from adversarial input Use weird machines as computation primitives Perform program obfuscation or secret computations o The MOV instruction[1] o Page fault handler[2] o Time (TSX WM)[3] [1] S. Dolan, mov is Turing-complete (2013) [2] J. Bangert et al., The page-fault weird machine: Lessons in instruction-less computation (WOOT 2013) [3] D. Evtyushkin et al., Computing with time: Microarchitectural weird machines (ASPLOS 2021) 2

  3. TSX weird machines that compute with time[3] Input I2 in cache? Input I1 in cache? = 0 I2 short latency 1 0 Cache hit Cache miss long latency = 1 I1 TSX Weird Gate Cache Fetch output Out into cache? 3 [3] D. Evtyushkin et al., Computing with time: Microarchitectural weird machines (ASPLOS 2021)

  4. TSX weird machines that compute with time[3] Input I2 in cache? I2 Input I1 in cache? I1 I2 Out AND I1 Cache Out TSX Weird Gate I2 I1 I2 Out OR I1 Cache Fetch output Out into cache? Out 4 [3] D. Evtyushkin et al., Computing with time: Microarchitectural weird machines (ASPLOS 2021)

  5. TSX weird machines that compute with time[3] Input I2 in cache? Input I1 in cache? Compute on cache states ( Arch) Not visible in architectural states! TSX Weird Gate Gates: Assign, AND, OR, XOR Fetch output Out into cache? 5 [3] D. Evtyushkin et al., Computing with time: Microarchitectural weird machines (ASPLOS 2021)

  6. TSX weird machines that compute with time[3] Input I2 in cache? Input I1 in cache? Limitations TSX instructions are easy to detect TSX Weird Gate o Vulnerable to static analysis! Limited to some Intel CPUs o AMD and ARM CPUs are not supported! Fetch output Out into cache? 6 [3] D. Evtyushkin et al., Computing with time: Microarchitectural weird machines (ASPLOS 2021)

  7. Our work: generalize TSX WM to Transient WM TSX any transient execution primitive! Cache TSX Side channel o Prevent static analysis o Support AMD, ARM, and Intel CPUs o 95~99% accuracy on all CPUs New NOT gate design Meltdown Assign AND Spectre OR XOR NOT Transient execution primitives o Idea: cache state controls the length of transient execution o Leads to a 7X faster XOR gate Weird gates Transient weird machine (WM) 7

  8. What is transient execution? PC clflush(X); TSX { AX /= 0; X[0] = 0; } X DIV by 0!! (Waiting ) (Speculative) Transient execution ends! Cache (Modified!!) Transient execution begins! Transient execution (TSX) 8

  9. Generalize TSX WM to other primitives INIT: SIGFPE { goto EXIT; clflush(X); TSX { AX /= 0; X[0] = 0; } } clflush(X); AX /= 0; X[0] = 0; EXIT: Transient execution (TSX) Transient execution (Meltdown) 9

  10. Assign: input latency controls output variable Out INIT: I is in cache Cache I I[0] = 0; SIGFPE { goto EXIT; } Fetch I[0] Fetch Out[0] = 0 Assign DIV by 0!! Transient exec. I Out clflush(Y); AX /= 0; Out[I[0]] = 0; EXIT: ... I is NOT in cache Cache Fetch Out[0] Fetch I[0] = 0 Assign Assign gate DIV by 0!! Transient exec. 10

  11. From Assign to OR & AND gates INIT: I[0] = 0; SIGFPE { goto EXIT; } Out[I1[0]] = 0; Out[I2[0]] = 0; OR gate clflush(Y); AX /= 0; Out[I[0]] = 0; EXIT: ... Out[I2[I1[0]]] = 0; AND gate Assign gate 11

  12. How to construct a NOT gate? Fetch output when input is not in cache I in cache fast Assign How to fetch output when input is slow? AX /= 0; Out[I[0]] = 0; I not in cache slow Assign Assign gate Too slow to fetch output Transient exec. DIV by 0!! Time 12

  13. How to construct a NOT gate? Input Fetch output AX /= I[0]; Out[tmp[0]] = 0; NOT Time NOT gate I in cache fast Transient exec. DIV by 0!! I not in cache slow Transient exec. DIV by 0!! Slow input Slow transient exec. Insight: transient execution time is not constant! 13

  14. NOT: input latency controls transient execution time tmp Out INIT: I is in cache Cache I I[0] = O[0] = 0; SIGFPE { goto EXIT; } Fetch tmp[0] Fetch Out[0] NOT Transient exec. DIV by 0!! Fetch I[0] Division clflush(O); clflush(tmp); AX /= I[0]; Out[tmp[0]] = 0; EXIT: ... I tmp Out I is NOT in cache Cache Fetch tmp[0] Fetch Out[0] NOT DIV by 0!! Transient exec. Fetch I[0] Division NOT gate 14

  15. Measuring accuracy and runtime Test on 3 CPUs o AMD EPYC 7R13 o ARM Graviton 1 o Intel Xeon E7-8800 Implementation 1M operations Repeat 1,000 times Weird Gates o C/C++ and asm o 1,379 LoC o Available on GitHub* Median accuracy & runtime 15 *https://github.com/joeywang4/Transient-Weird-Machine

  16. Accuracy and runtime Gate AMD Intel ARM Gate AMD Intel ARM 99.96% 99.99% AND 1.801 2.628 32.521 AND 98.89% OR 100.00% 99.99% 99.46% OR 1.756 2.551 32.506 Assign 99.99% 99.99% 99.42% Assign 1.751 2.637 32.378 NOT 99.51% 99.99% 99.29% NOT 1.779 2.741 32.573 XOR 94.36% 95.58% 97.97% XOR 7.403 15.648 130.898 MUX 98.17% 99.93% 97.64% MUX 5.870 11.975 131.150 Accuracy Runtime (s/Mops) 16

  17. Application: program obfuscation Obfuscate computations and control flows Future work: compiler to construct circuits of weird gates input password if (input ^ password) { fail(); } else { success(); } success fail Obfuscate with Transient WM XOR MUX Function pointer 17

  18. Takeaways TSX WMs + Meltdown & Spectre Transient WMs Novel NOT gate design Obfuscate malware and attack exploits Github: https://github.com/joeywang4/Transient-Weird-Machine Authors: Ping-Lun Wang, Fraser Brown, Riad S. Wahby Contact: pinglunw@andrew.cmu.edu Thank you for listening! 18

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#