Understanding Eidetic Systems and Motivation in Technology

 
Eidetic Systems
 
David Devecsery
, Michael Chow, Xianzheng Dou,
Jason Flinn, Peter Chen
University of Michigan
 
W
h
a
t
 
i
s
 
a
n
 
E
i
d
e
t
i
c
 
S
y
s
t
e
m
?
 
Eidetic – 
Having “Perfect memory” or “Total Recall”
 
Eidetic System
 – A system which can recall and trace
           through the lineage of any past computation
 
2
 
David Devecsery
 
M
o
t
i
v
a
t
i
o
n
 
-
 
H
e
a
r
t
b
l
e
e
d
 
3
 
Was Heartbleed exploited?
What data was leaked?
 
David Devecsery
 
M
o
t
i
v
a
t
i
o
n
 
-
 
H
e
a
r
t
b
l
e
e
d
 
4
 
Heartbleed
Message
 
Leaked
Data
 
Was Heartbleed exploited? - Yes
What data was leaked?
 
David Devecsery
 
 
M
o
t
i
v
a
t
i
o
n
 
-
 
H
e
a
r
t
b
l
e
e
d
 
5
 
Was Heartbleed exploited? - Yes
What data was leaked?
 
Leaked Database
   Rows
 
David Devecsery
 
Heartbleed
Message
 
Leaked
Data
 
M
o
t
i
v
a
t
i
o
n
 
 
W
r
o
n
g
 
R
e
f
e
r
e
n
c
e
 
6
 
Bad Citation
 
How did I get the wrong citation?
 
David Devecsery
 
M
o
t
i
v
a
t
i
o
n
 
 
W
r
o
n
g
 
R
e
f
e
r
e
n
c
e
 
7
 
How did I get the wrong citation?
 
David Devecsery
 
M
o
t
i
v
a
t
i
o
n
 
 
W
r
o
n
g
 
R
e
f
e
r
e
n
c
e
 
8
 
How did I get the wrong citation?
 
David Devecsery
 
M
o
t
i
v
a
t
i
o
n
 
 
W
r
o
n
g
 
R
e
f
e
r
e
n
c
e
 
9
 
How did I get the wrong citation?
What else did this affect?
 
David Devecsery
 
M
o
t
i
v
a
t
i
o
n
 
10
 
How did I get the wrong citation?
What else did this affect?
 
 
David Devecsery
 
A
r
n
o
l
d
 
First practical eidetic computer system
Efficiently records & recalls all user-space computation
Process register/memory state
Inter-process communication
Handles lineage queries
What data was affected?
What states and outputs were affected?
Targeted towards desktop/workstation use
Reasonable overheads
Record 4 years of data on $150 commodity HD
Under 8% performance overhead on most benchmarks
 
11
 
David Devecsery
 
O
v
e
r
v
i
e
w
 
Introduction
Motivation
How Arnold remembers all state
How Arnold supports lineage queries
Conclusion
 
12
 
David Devecsery
 
R
e
m
e
m
b
e
r
i
n
g
 
S
t
a
t
e
 
Requirements:
Store years of state on a single disk
Memory/register space within a process
Inter process communication
File state
Recall any state in reasonable time
Solution:
Deterministic record & replay
“Process group” based replay
“Process graph” to track inter-process lineage
Log compression
 
13
 
David Devecsery
 
R
e
c
o
r
d
i
n
g
 
G
r
a
n
u
l
a
r
i
t
y
 
What granularity is best to record our system?
 
14
 
External
Inputs
 
David Devecsery
 
R
e
c
o
r
d
i
n
g
 
G
r
a
n
u
l
a
r
i
t
y
 
Whole system recording
Low space overhead
×
Costly to replay
 
15
 
External
Inputs
 
David Devecsery
 
R
e
c
o
r
d
i
n
g
 
G
r
a
n
u
l
a
r
i
t
y
 
Process level recording
Efficient to replay
×
Uses extra disk space
×
No Inter-process tracking
 
16
 
External
Inputs
 
David Devecsery
 
R
e
c
o
r
d
i
n
g
 
G
r
a
n
u
l
a
r
i
t
y
 
Process group recording
Efficient to replay
Reasonable disk space
×
No Inter-process tracking
 
17
 
External
Inputs
 
David Devecsery
 
I
m
p
l
e
m
e
n
t
a
t
i
o
n
 
 
P
r
o
c
e
s
s
 
G
r
a
p
h
 
18
 
Record Log
 
1
 
2
 
IPC
 
Read
 
David Devecsery
 
I
m
p
l
e
m
e
n
t
a
t
i
o
n
 
 
P
r
o
c
e
s
s
 
G
r
a
p
h
 
19
 
Record Log
 
1
 
2
 
IPC
 
Read
 
David Devecsery
 
R
e
c
o
r
d
i
n
g
 
Process group recording
            + process graph
Efficient to replay
Reasonable disk space
Inter-process tracking
 
20
 
External
Inputs
 
David Devecsery
 
S
p
a
c
e
 
O
p
t
i
m
i
z
a
t
i
o
n
s
 
21
 
David Devecsery
 
S
p
a
c
e
 
O
p
t
i
m
i
z
a
t
i
o
n
s
 
22
 
David Devecsery
 
S
p
a
c
e
 
O
p
t
i
m
i
z
a
t
i
o
n
s
 
23
 
David Devecsery
 
S
p
a
c
e
 
O
p
t
i
m
i
z
a
t
i
o
n
s
 
24
 
David Devecsery
 
4 years of data on a $150 4TB commodity HD
 
M
o
d
e
l
-
B
a
s
e
d
 
C
o
m
p
r
e
s
s
i
o
n
 
Formulate a model of a typical execution
Only record deviations from that model
   
ret_val   
=    sys_read (fd, buffer, 
count
);
 
Idea
:
 
Partial determinism
Encourage the program to conform to the model
 
 
 
25
 
David Devecsery
 
S
e
m
i
-
D
e
t
e
r
m
i
n
i
s
t
i
c
 
T
i
m
e
 
Frequent time queries are non-deterministic
Use partially deterministic clock
Real time clock
 
&
 
deterministic clock
Bound deviation
 
26
if (deterministic_clock – real_time_clock < threshold) {
 
adjust deterministic_clock
 
record deviation
}
return deterministic_clock
 
David Devecsery
 
P
e
r
f
o
r
m
a
n
c
e
 
E
v
a
l
u
a
t
i
o
n
 
27
 
David Devecsery
 
O
v
e
r
v
i
e
w
 
Introduction
Motivation
How Arnold remembers all state
How Arnold supports lineage queries
Conclusion
 
28
 
David Devecsery
 
Q
u
e
r
y
i
n
g
 
L
i
n
e
a
g
e
 
Two types of queries:
Reverse: Where did this data come from?
Forward: What did this data affect?
How does Arnold support these queries?
User specifies initial state
Trace the lineage of the computation
Intra-process tracking
Inter-process tracking
 
29
 
David Devecsery
 
I
n
t
r
a
-
P
r
o
c
e
s
s
 
L
i
n
e
a
g
e
 
Use taint tracking for intra-process causality
Run retroactively, on recorded execution
Parallelizable
Arnold supports several notions of causality:
 
30
 
Copy Only
 
Data Flow
 
Data+Index Flow
 
May miss relations
 
Misses few relations
 
Recall
 
Strong input/output
         relation
 
Weak input/output
     Relation
 
Precision
 
Control Flow
 
David Devecsery
 
I
n
t
r
a
-
P
r
o
c
e
s
s
 
L
i
n
e
a
g
e
 
31
 
Which linkage tool should Arnold use?
 
Inputs
 
Program
 
David Devecsery
 
I
n
t
r
a
-
P
r
o
c
e
s
s
 
L
i
n
e
a
g
e
 
32
 
Data Flow
 
Data+Index
Flow
 
May miss relations
 
Misses few relations
 
Recall
 
Strong input/output
         relation
 
Weak input/output
     Relation
 
Precision
 
Copy
 
David Devecsery
 
I
n
t
r
a
-
P
r
o
c
e
s
s
 
L
i
n
e
a
g
e
 
33
 
Data Flow
 
Data+Index
Flow
 
May miss relations
 
Misses few relations
 
Recall
 
Strong input/output
         relation
 
Weak input/output
     Relation
 
Precision
 
Copy
 
David Devecsery
 
I
n
t
r
a
-
P
r
o
c
e
s
s
 
L
i
n
e
a
g
e
 
34
 
Data Flow
 
Data+Index
Flow
 
May miss relations
 
Misses few relations
 
Recall
 
Strong input/output
         relation
 
Weak input/output
     Relation
 
Precision
 
Copy
 
David Devecsery
 
I
n
t
r
a
-
P
r
o
c
e
s
s
 
L
i
n
e
a
g
e
 
35
 
Data Flow
 
Data+Index
Flow
 
May miss relations
 
Misses few relations
 
Recall
 
Strong input/output
         relation
 
Weak input/output
     Relation
 
Precision
 
Copy
 
David Devecsery
 
I
n
t
r
a
-
P
r
o
c
e
s
s
 
L
i
n
e
a
g
e
 
36
 
Data Flow
 
Data+Index
Flow
 
May miss relations
 
Misses few relations
 
Recall
 
Strong input/output
         relation
 
Weak input/output
     Relation
 
Precision
 
Copy
 
David Devecsery
 
I
n
t
r
a
-
P
r
o
c
e
s
s
 
L
i
n
e
a
g
e
 
37
 
Data Flow
 
May miss relations
 
Misses few relations
 
Recall
 
Strong input/output
         relation
 
Weak input/output
     Relation
 
Precision
 
Arnold selects the
most precise tool with
at least one result
 
David Devecsery
 
I
n
t
e
r
-
P
r
o
c
e
s
s
 
L
i
n
e
a
g
e
 
Two notions of inter-process linkage
Process graph
Tracks lineage through inter-process communication
Precise
Captures group to group communication
Human linkage
Handles relations between user inputs and outputs
Infers linkages based on data content and time
Imprecise – may have false negatives and false positives
Can capture linkages the process graph can miss
 
38
 
David Devecsery
 
E
v
a
l
u
a
t
i
o
n
 
 
W
r
o
n
g
 
R
e
f
e
r
e
n
c
e
 
39
 
Data + Index
 
Data
 
Copy
 
Copy
 
Data
 
Few false positives (font files, latex sty files, libc.so, libXt.so)
No false negatives
 
Human
Linkage
 
David Devecsery
 
E
v
a
l
u
a
t
i
o
n
 
 
H
e
a
r
t
b
l
e
e
d
 
40
 
No false positives or negatives
 
Data + Index
 
Data + Index
 
Data + Index
 
 
David Devecsery
 
C
o
n
c
l
u
s
i
o
n
 
Eidetic Systems are powerful tools
Complete vision into past computation
Answer powerful queries about state’s lineage
Arnold – First practical Eidetic System
Low runtime overhead
4 years of computation on a commodity HD
Supports powerful lineage queries
Code is released
 
https://github.com/endplay/omniplay
 
41
 
David Devecsery
Slide Note
Embed
Share

Exploring Eidetic Systems, a technology with perfect memory capabilities, developed by the team at University of Michigan. The discussion also delves into the motivation behind technology advancements, particularly focusing on incidents like Heartbleed and citation errors.


Uploaded on Mar 26, 2024 | 3 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Eidetic Systems David Devecsery, Michael Chow, Xianzheng Dou, Jason Flinn, Peter Chen University of Michigan

  2. What is an Eidetic System? What is an Eidetic System? Eidetic Having Perfect memory or Total Recall Eidetic System A system which can recall and trace through the lineage of any past computation David Devecsery 2

  3. Motivation Motivation - - Heartbleed Heartbleed Was Heartbleed exploited? What data was leaked? David Devecsery 3

  4. Motivation Motivation - - Heartbleed Heartbleed Heartbleed Message Leaked Data Was Heartbleed exploited? - Yes What data was leaked? David Devecsery 4

  5. Motivation Motivation - - Heartbleed Heartbleed Leaked Database Rows Heartbleed Message Leaked Data Was Heartbleed exploited? - Yes What data was leaked? David Devecsery 5

  6. Motivation Motivation Wrong Reference Wrong Reference Bad Citation How did I get the wrong citation? David Devecsery 6

  7. Motivation Motivation Wrong Reference Wrong Reference How did I get the wrong citation? David Devecsery 7

  8. Motivation Motivation Wrong Reference Wrong Reference How did I get the wrong citation? David Devecsery 8

  9. Motivation Motivation Wrong Reference Wrong Reference How did I get the wrong citation? What else did this affect? David Devecsery 9

  10. Motivation Motivation How did I get the wrong citation? What else did this affect? David Devecsery 10

  11. Arnold Arnold First practical eidetic computer system Efficiently records & recalls all user-space computation Process register/memory state Inter-process communication Handles lineage queries What data was affected? What states and outputs were affected? Targeted towards desktop/workstation use Reasonable overheads Record 4 years of data on $150 commodity HD Under 8% performance overhead on most benchmarks David Devecsery 11

  12. Overview Overview Introduction Motivation How Arnold remembers all state How Arnold supports lineage queries Conclusion David Devecsery 12

  13. Remembering State Remembering State Requirements: Store years of state on a single disk Memory/register space within a process Inter process communication File state Recall any state in reasonable time Solution: Deterministic record & replay Process group based replay Process graph to track inter-process lineage Log compression David Devecsery 13

  14. Recording Granularity Recording Granularity External Inputs What granularity is best to record our system? David Devecsery 14

  15. Recording Granularity Recording Granularity External Inputs Whole system recording Low space overhead Costly to replay David Devecsery 15

  16. Recording Granularity Recording Granularity External Inputs Process level recording Efficient to replay Uses extra disk space No Inter-process tracking David Devecsery 16

  17. Recording Granularity Recording Granularity External Inputs Process group recording Efficient to replay Reasonable disk space No Inter-process tracking David Devecsery 17

  18. Implementation Implementation Process Graph Process Graph Record Log Read IPC 1 2 David Devecsery 18

  19. Implementation Implementation Process Graph Process Graph Record Log Read IPC 1 2 David Devecsery 19

  20. Recording Recording External Inputs Process group recording + process graph Efficient to replay Reasonable disk space Inter-process tracking David Devecsery 20

  21. Space Optimizations Space Optimizations 1.2 Log Compression vs Baseline 1 0.8 0.6 0.4 0.2 0 David Devecsery 21

  22. Space Optimizations Space Optimizations 1.2 Log Compression vs Baseline 1 0.8 0.6 0.4 411:1 0.2 Ratio 0 David Devecsery 22

  23. Space Optimizations Space Optimizations 1.2 Log Compression vs Baseline 1 0.8 0.6 6:1 0.4 Ratio 411:1 0.2 Ratio 0 David Devecsery 23

  24. Space Optimizations Space Optimizations 1.2 4 years of data on a $150 4TB commodity HD Log Compression vs Baseline 1 0.8 0.6 6:1 0.4 Ratio 411:1 0.2 Ratio 0 David Devecsery 24

  25. Model Model- -Based Compression Based Compression Formulate a model of a typical execution Only record deviations from that model ret_val = sys_read (fd, buffer, count); usually equal Idea: Partial determinism Encourage the program to conform to the model David Devecsery 25

  26. Semi Semi- -Deterministic Time Deterministic Time Frequent time queries are non-deterministic Use partially deterministic clock Real time clock & deterministic clock Bound deviation if (deterministic_clock real_time_clock < threshold) { adjust deterministic_clock record deviation } return deterministic_clock David Devecsery 26

  27. Performance Evaluation Performance Evaluation 1.4 Baseline Arnold 1.2 Normalized Runtime 1 0.8 0.6 0.4 0.2 0 David Devecsery 27

  28. Overview Overview Introduction Motivation How Arnold remembers all state How Arnold supports lineage queries Conclusion David Devecsery 28

  29. Querying Lineage Querying Lineage Two types of queries: Reverse: Where did this data come from? Forward: What did this data affect? How does Arnold support these queries? User specifies initial state Trace the lineage of the computation Intra-process tracking Inter-process tracking David Devecsery 29

  30. Intra Intra- -Process Lineage Process Lineage Use taint tracking for intra-process causality Run retroactively, on recorded execution Parallelizable Arnold supports several notions of causality: Copy Only Data Flow Data+Index Flow Control Flow Precision Strong input/output relation Weak input/output Relation Recall May miss relations Misses few relations David Devecsery 30

  31. Intra Intra- -Process Lineage Process Lineage Inputs Program Which linkage tool should Arnold use? David Devecsery 31

  32. Intra Intra- -Process Lineage Process Lineage Precision Strong input/output relation Weak input/output Relation Recall May miss relations Misses few relations Data+Index Flow Copy Data Flow David Devecsery 32

  33. Intra Intra- -Process Lineage Process Lineage Precision Strong input/output relation Weak input/output Relation Recall May miss relations Misses few relations Data+Index Flow Copy Data Flow David Devecsery 33

  34. Intra Intra- -Process Lineage Process Lineage Precision Strong input/output relation Weak input/output Relation Recall May miss relations Misses few relations Data+Index Flow Copy Data Flow David Devecsery 34

  35. Intra Intra- -Process Lineage Process Lineage Precision Strong input/output relation Weak input/output Relation Recall May miss relations Misses few relations Data+Index Flow Copy Data Flow David Devecsery 35

  36. Intra Intra- -Process Lineage Process Lineage Precision Strong input/output relation Weak input/output Relation Recall May miss relations Misses few relations Data+Index Flow Copy Data Flow David Devecsery 36

  37. Intra Intra- -Process Lineage Process Lineage Precision Strong input/output relation Weak input/output Relation Recall May miss relations Misses few relations Data Flow Arnold selects the most precise tool with at least one result David Devecsery 37

  38. Inter Inter- -Process Lineage Process Lineage Two notions of inter-process linkage Process graph Tracks lineage through inter-process communication Precise Captures group to group communication Human linkage Handles relations between user inputs and outputs Infers linkages based on data content and time Imprecise may have false negatives and false positives Can capture linkages the process graph can miss David Devecsery 38

  39. Evaluation Evaluation Wrong Reference Wrong Reference Data Data + Index Copy Copy Data Human Linkage Few false positives (font files, latex sty files, libc.so, libXt.so) No false negatives Record Time Replay Time Replay + Pin Time Query Time 96.1s 2.2s 70.0s 209.5s David Devecsery 39

  40. Evaluation Evaluation Heartbleed Heartbleed Data + Index Data + Index Data + Index No false positives or negatives Record Time Replay Time Replay + Pin Time Query Time 230.3s 0.4s 139.5s 235.1s David Devecsery 40

  41. Conclusion Conclusion Eidetic Systems are powerful tools Complete vision into past computation Answer powerful queries about state s lineage Arnold First practical Eidetic System Low runtime overhead 4 years of computation on a commodity HD Supports powerful lineage queries Code is released https://github.com/endplay/omniplay David Devecsery 41

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#