An Improved Algorithm for Slicing Machine Code

An Improved Algorithm for
Slicing Machine Code
Venkatesh Srinivasan
Thomas Reps
{venk, reps}@cs.wisc.edu
1
Slicing
 
 
 
Backward slicing
 
 
Forward slicing
2
Computes the set of program points that might affect/be affected 
by 
slicing criterion
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
v
o
i
d
 
s
u
m
_
p
r
o
d
(
 
 
 
 
 
 
i
n
t
 
n
 
 
 
 
)
 
{
 
 
 
 
 
 
i
n
t
 
s
u
m
 
=
 
0
;
 
 
 
 
 
 
i
n
t
 
p
r
o
d
 
=
 
1
;
 
 
 
 
 
 
w
h
i
l
e
 
(
n
-
-
)
 
{
 
 
 
 
 
 
 
 
p
r
o
d
 
*
=
 
n
;
 
 
 
 
 
 
 
 
s
u
m
 
+
=
 
n
;
 
 
 
 
 
 
}
 
 
 
 
 
 
p
r
i
n
t
f
(
%
d
\
n
,
 
s
u
m
)
;
 
 
 
 
 
 
p
r
i
n
t
f
(
%
d
\
n
,
 
p
r
o
d
)
;
 
 
 
 
}
Slicing
criterion
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
v
o
i
d
 
s
u
m
_
p
r
o
d
(
 
 
 
 
 
 
i
n
t
 
n
 
 
 
 
)
 
{
 
 
 
 
 
 
i
n
t
 
s
u
m
 
=
 
0
;
 
 
 
 
 
 
i
n
t
 
p
r
o
d
 
=
 
1
;
 
 
 
 
 
 
w
h
i
l
e
 
(
n
-
-
)
 
{
 
 
 
 
 
 
 
 
p
r
o
d
 
*
=
 
n
;
 
 
 
 
 
 
 
 
s
u
m
 
+
=
 
n
;
 
 
 
 
 
 
}
 
 
 
 
 
 
p
r
i
n
t
f
(
%
d
\
n
,
 
s
u
m
)
;
 
 
 
 
 
 
p
r
i
n
t
f
(
%
d
\
n
,
 
p
r
o
d
)
;
 
 
 
 
}
Slicing
criterion
[Background]
Slicing – Applications
 
3
Component
Extraction
Debugging
Program
Integration
Testing
Program
Specialization
Program
Understanding
[Background]
McSlice in a nutshell
 
McSlice: tool to slice machine code more accurately
State-of-the-art tools: Granularity issue
Instruction level is too coarse grained
Imprecise slices
McSlice slices at microcode level
Program reconstitution: machine-code synthesis
Tested on binaries of 8 FreeBSD utilities
Reduction in slice sizes
33% for backward slices
70% for forward slices
4
Software is everywhere
Everyday systems: phones, laptops, watches, cars, etc.
Critical systems: aircrafts, space shuttles, medical devices, etc.
Software is Pervasive
5
Binaries are
Binaries are
No source
code
No
documentation
[Motivation]
Machine-Code Slicer
6
[Motivation]
State-of-the-art
Machine-Code Slicers
7
[Motivation]
We need a machine-code
slicer that exhibits a high
level of precision
Outline
 
Motivation
Background
Problem with state-of-the-art tools: Granularity issue
McSlice
Experiments
Conclusion
8
Program-Dependence Graph
(PDG)
 
 
 
 
 
 
 
 
System-dependence graph (SDG)
    for multi-procedure programs
[Background]
    void sum_prod(int n) {
      int sum = 0;
      int prod = 1;
      while (n--) {
        prod *= n;
        sum += n;
      }
      printf(“%d\n”, sum);
      printf(“%d\n”, prod);
    }
sum_prod
n_formal_in
sum = 0
prod = 1
while(n--)
sum += n
prod *= n
printf(“%d\n”, sum)
printf(“%d\n”, prod)
 
Control dependence
 
Data dependence
9
Slicing
 
Intraprocedural slicing 
    reachability in PDG
Context-sensitive
interprocedural slicing 
    context-free reachability in SDG
10
[More background]
sum_prod
n_formal_in
sum = 0
prod = 1
while(n--)
sum += n
prod *= n
printf(“%d\n”, sum)
printf(“%d\n”, prod)
Control dependence
Data dependence
IA-32 Primer
 
11
  Registers
1000
 
ESP
 
Stack
20
 
EAX
20
20
 
EBX
40
996
1000
[Even more background]
QFBV Formulas
Explicit representation of
semantics of an instruction
(microcode)
12
[I swear, this is the final 
slide on background … ]
Semantics of
instructions in ISA
Machine-Code Synthesis
 
Synthesize machine-code
instructions from a semantic
specification
QFBV formula
13
[I lied 
]
 
* V. Srinivasan and T. Reps. Synthesis of machine code from semantics. In PLDI, 2015
Outline
 
Motivation
Background
Problem with state-of-the-art tools: Granularity issue
McSlice
Experiments
Conclusion
14
Machine-Code Slicing
State-of-the-art Tools
 
15
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
m
a
i
n
:
p
u
s
h
 
e
b
p
m
o
v
 
e
b
p
,
e
s
p
s
u
b
 
e
s
p
,
0
x
2
0
m
o
v
 
[
e
b
p
 
 
0
x
1
4
]
,
0
x
A
m
o
v
 
[
e
b
p
 
 
0
x
1
0
]
,
0
x
1
4
p
u
s
h
 
[
e
b
p
 
 
0
x
1
0
]
p
u
s
h
 
[
e
b
p
 
 
0
x
1
4
]
c
a
l
l
 
a
d
d
a
d
d
 
e
s
p
,
8
m
o
v
 
[
e
b
p
 
 
0
x
C
]
,
e
a
x
p
u
s
h
 
[
e
b
p
 
 
0
x
C
]
c
a
l
l
 
s
q
u
a
r
e
a
d
d
 
e
s
p
,
4
m
o
v
 
[
e
b
p
 
 
0
x
8
]
,
e
a
x
m
o
v
 
e
a
x
,
 
[
e
b
p
 
 
0
x
1
0
]
m
o
v
 
e
d
x
,
 
[
e
b
p
 
 
0
x
1
4
]
s
u
b
 
e
a
x
,
e
d
x
l
e
a
v
e
r
e
t
 
int add(int a, int b) {
    int sum = a + b;
    return sum;
}
 
int square(int a) {
    int product = a * a;
    return product;
}
 
int main() {
    int a = 10;
    int b = 20;
    int c = add(a, b);
    int d = square(c);
    return a - b;
}
 
add
:
push ebp
mov ebp, esp
sub esp, 10
mov eax, [ebp + 0xC]
mov edx, [ebp + 0x8]
lea eax, [eax + edx]
mov [ebp – 0x4], eax
mov eax, [ebp – 0x4]
leave
ret
 
square
:
push ebp
mov ebp, esp
sub esp, 10
mov eax, [ebp + 0x8]
imul eax, [ebp + 0x8]
mov [ebp – 0x4], eax
mov eax, [ebp – 0x4]
leave
ret
Note that body
of ‘add’ is not in
the slice
Why is the
entire body
of ‘add’ in
the slice?
[Granularity issue]
Granularity Issue
 
16
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
m
a
i
n
:
p
u
s
h
 
e
b
p
m
o
v
 
e
b
p
,
e
s
p
s
u
b
 
e
s
p
,
0
x
2
0
m
o
v
 
[
e
b
p
 
 
0
x
1
4
]
,
0
x
A
m
o
v
 
[
e
b
p
 
 
0
x
1
0
]
,
0
x
1
4
p
u
s
h
 
[
e
b
p
 
 
0
x
1
0
]
p
u
s
h
 
[
e
b
p
 
 
0
x
1
4
]
c
a
l
l
 
a
d
d
a
d
d
 
e
s
p
,
8
m
o
v
 
[
e
b
p
 
 
0
x
C
]
,
e
a
x
p
u
s
h
 
[
e
b
p
 
 
0
x
C
]
c
a
l
l
 
s
q
u
a
r
e
a
d
d
 
e
s
p
,
4
m
o
v
 
[
e
b
p
 
 
0
x
8
]
,
e
a
x
m
o
v
 
e
a
x
,
 
[
e
b
p
 
 
0
x
1
0
]
m
o
v
 
e
d
x
,
 
[
e
b
p
 
 
0
x
1
4
]
s
u
b
 
e
a
x
,
e
d
x
l
e
a
v
e
r
e
t
 
add
:
push ebp
mov ebp, esp
sub esp, 10
mov eax, [ebp + 0xC]
mov edx, [ebp + 0x8]
lea eax, [eax + edx]
mov [ebp – 0x4], eax
mov eax, [ebp – 0x4]
leave
ret
 
square
:
push ebp
mov ebp, esp
sub esp, 10
mov eax, [ebp + 0x8]
imul eax, [ebp + 0x8]
mov [ebp – 0x4], eax
mov eax, [ebp – 0x4]
leave
ret
Granularity Issue
ESP’ 
= ESP – 4 
Mem’ = Mem[ESP – 4 ↦ Mem(EBP – 12)]
[Granularity issue]
Granularity Issue
 
17
Machine code
Instruction-level is too
coarse-grained for
machine-code slicing!
[Granularity issue]
Outline
 
Motivation
Background
Problem with state-of-the-art tools: Granularity issue
McSlice
Experiments
Conclusion
18
Improved Slicing with McSlice
 
19
Machine code
Microcode
[Algorithm]
µ-PDG
20
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
m
a
i
n
:
1
9
:
 
p
u
s
h
 
e
b
p
2
0
:
 
m
o
v
 
e
b
p
,
e
s
p
2
1
:
 
s
u
b
 
e
s
p
,
0
x
2
0
2
2
:
 
m
o
v
 
[
e
b
p
 
 
0
x
1
4
]
,
0
x
A
2
3
:
 
m
o
v
 
[
e
b
p
 
 
0
x
1
0
]
,
0
x
1
4
2
4
:
 
p
u
s
h
 
[
e
b
p
 
 
0
x
1
0
]
2
5
:
 
p
u
s
h
 
[
e
b
p
 
 
0
x
1
4
]
2
6
:
 
c
a
l
l
 
a
d
d
2
7
:
 
a
d
d
 
e
s
p
,
8
2
8
:
 
m
o
v
 
[
e
b
p
 
 
0
x
C
]
,
e
a
x
2
9
:
 
p
u
s
h
 
[
e
b
p
 
 
0
x
C
]
3
0
:
 
c
a
l
l
 
s
q
u
a
r
e
3
1
:
 
a
d
d
 
e
s
p
,
4
3
2
:
 
m
o
v
 
[
e
b
p
 
 
0
x
8
]
,
e
a
x
3
3
:
 
m
o
v
 
e
a
x
,
 
[
e
b
p
 
 
0
x
1
0
]
3
4
:
 
m
o
v
 
e
d
x
,
 
[
e
b
p
 
 
0
x
1
4
]
3
5
:
 
s
u
b
 
e
a
x
,
e
d
x
3
6
:
 
l
e
a
v
e
3
7
:
 
r
e
t
 
add
:
1: push ebp
2: mov ebp, esp
3: sub esp, 4
4: mov eax, [ebp + 0xC]
5: add eax, [ebp + 0x8]
6: mov [ebp – 0x4], eax
7: mov eax, [ebp – 0x4]
8: leave
9: ret
 
square
:
10: push ebp
11: mov ebp, esp
12: sub esp, 4
13: mov eax, [ebp + 0x8]
14: imul eax, [ebp + 0x8]
15: mov [ebp – 0x4], eax
16: mov eax, [ebp – 0x4]
17: leave
18: ret
[Algorithm]
CodeSurfer/x86 Vs. McSlice
 
21
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
m
a
i
n
p
u
s
h
 
e
b
p
m
o
v
 
e
b
p
,
e
s
p
s
u
b
 
e
s
p
,
0
x
2
0
m
o
v
 
[
e
b
p
 
 
0
x
1
4
]
,
0
x
A
m
o
v
 
[
e
b
p
 
 
0
x
1
0
]
,
0
x
1
4
p
u
s
h
 
[
e
b
p
 
 
0
x
1
0
]
p
u
s
h
 
[
e
b
p
 
 
0
x
1
4
]
c
a
l
l
 
a
d
d
a
d
d
 
e
s
p
,
8
m
o
v
 
[
e
b
p
 
 
0
x
C
]
,
e
a
x
p
u
s
h
 
[
e
b
p
 
 
0
x
C
]
c
a
l
l
 
s
q
u
a
r
e
a
d
d
 
e
s
p
,
4
m
o
v
 
[
e
b
p
 
 
0
x
8
]
,
e
a
x
m
o
v
 
e
a
x
,
 
[
e
b
p
 
 
0
x
1
0
]
m
o
v
 
e
d
x
,
 
[
e
b
p
 
 
0
x
1
4
]
s
u
b
 
e
a
x
,
e
d
x
l
e
a
v
e
r
e
t
 
add
:
push ebp
mov ebp, esp
sub esp, 4
mov eax, [ebp + 0xC]
add eax, [ebp + 0x8]
mov [ebp – 0x4], eax
mov eax, [ebp – 0x4]
leave
ret
 
square
:
push ebp
mov ebp, esp
sub esp, 4
mov eax, [ebp + 0x8]
imul eax, [ebp + 0x8]
mov [ebp – 0x4], eax
mov eax, [ebp – 0x4]
leave
ret
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
m
a
i
n
p
u
s
h
 
e
b
p
*
m
o
v
 
e
b
p
,
e
s
p
s
u
b
 
e
s
p
,
0
x
2
0
*
m
o
v
 
[
e
b
p
 
 
0
x
1
4
]
,
0
x
A
m
o
v
 
[
e
b
p
 
 
0
x
1
0
]
,
0
x
1
4
p
u
s
h
 
[
e
b
p
 
 
0
x
1
0
]
*
p
u
s
h
 
[
e
b
p
 
 
0
x
1
4
]
*
c
a
l
l
 
a
d
d
a
d
d
 
e
s
p
,
8
m
o
v
 
[
e
b
p
 
 
0
x
C
]
,
e
a
x
p
u
s
h
 
[
e
b
p
 
 
0
x
C
]
*
c
a
l
l
 
s
q
u
a
r
e
a
d
d
 
e
s
p
,
4
m
o
v
 
[
e
b
p
 
 
0
x
8
]
,
e
a
x
m
o
v
 
e
a
x
,
 
[
e
b
p
 
 
0
x
1
0
]
m
o
v
 
e
d
x
,
 
[
e
b
p
 
 
0
x
1
4
]
s
u
b
 
e
a
x
,
e
d
x
*
l
e
a
v
e
r
e
t
 
add
:
push ebp
mov ebp, esp
sub esp, 4
mov eax, [ebp + 0xC]
add eax, [ebp + 0x8]
mov [ebp – 0x4], eax
mov eax, [ebp – 0x4]
leave
ret
 
square
:
push ebp
mov ebp, esp
sub esp, 4
mov eax, [ebp + 0x8]
imul eax, [ebp + 0x8]
mov [ebp – 0x4], eax
mov eax, [ebp – 0x4]
leave*
ret
[Algorithm]
Program Reconstitution
 
22
Machine code
Microcode
Fix parameter
mismatches at
call-sites
Include microcode
for ARs
[Algorithm]
Outline
 
Motivation
Background
Problem with state-of-the-art tools: Granularity issue
McSlice
Experiments
Conclusion
23
Test Suite
 
Binaries of 8 FreeBSD utilities
295 – 2207 LOC
Backward-slicing criterion – actuals to print procedures
Forward-slicing criterion – variables initialized at program start
CodeSurfer/x86 Vs. McSlice
CodeSurfer/x86: No. of instructions in slice
McSlice: No. of instructions whose microcode is included in slice
24
[Experiments]
Slice Sizes
25
[Experiments]
Partial Instructions
26
[Experiments]
Component Extraction
 
Backward microcode slice
Include extra microcode for executable program
McSynth: microcode to machine code
Executable machine-code program
27
[Experiments]
Conclusion
 
Algorithm to slice machine code more accurately
McSlice – slicer for IA-32
Granularity issue: McSlice slices at microcode level
Program-reconstitution issue: machine-code synthesis
McSlice reduces slice sizes by 33% for backward slices, and
70% for forward slices
Extract executable components from binaries
28
Microcode
Questions?
 
29
Machine code
Microcode
Slide Note
Embed
Share

Algorithm presented for slicing machine code with applications in debugging, program integration, and more. McSlice tool enhances accuracy by slicing at microcode level, reducing slice sizes significantly. Discussion on the pervasiveness of software and the importance of understanding binaries.

  • Algorithm
  • Machine Code
  • Slicing
  • Debugging
  • Software

Uploaded on Feb 15, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. An Improved Algorithm for Slicing Machine Code Venkatesh Srinivasan Thomas Reps {venk, reps}@cs.wisc.edu 1

  2. Slicing [Background] Computes the set of program points that might affect/be affected by slicing criterion Backward slicing Forward slicing void sum_prod( int n ) { int sum = 0; int prod = 1; while (n--) { prod *= n; sum += n; } printf( %d\n , sum); printf( %d\n , prod); } void sum_prod( int n ) { int sum = 0; int prod = 1; while (n--) { prod *= n; sum += n; } printf( %d\n , sum); printf( %d\n , prod); } Slicing criterion Slicing criterion 2

  3. Slicing Applications [Background] Debugging Program Integration Component Extraction Slicing Program Understanding Testing Program Specialization 3

  4. McSlice in a nutshell McSlice: tool to slice machine code more accurately State-of-the-art tools: Granularity issue Instruction level is too coarse grained Imprecise slices McSlice slices at microcode level Program reconstitution: machine-code synthesis Tested on binaries of 8 FreeBSD utilities Reduction in slice sizes 33% for backward slices 70% for forward slices 4

  5. Software is Pervasive Binaries are [Motivation] Software is everywhere Everyday systems: phones, laptops, watches, cars, etc. Critical systems: aircrafts, space shuttles, medical devices, etc. Binaries are 01000101011 11110101010 10010001010 11111101010 Binary 00101010101 11010101010 10010001010 11111101010 No source No source code code No No documentation documentation What does this binary do? Can this binary crash? Does this binary satisfy a given property? Does this binary have a security vulnerability? 5

  6. Machine-Code Slicer [Motivation] 0100010 Binary 1011101 Partial Evaluation (binding-time analysis BTA) 01000101011 11110101010 10010001010 11111101010 Binary 00101010101 11010101010 10010001010 11111101010 Taint tracking (e.g., prune away parts of binary irrelevant to taint sources) 1101010 Binary 0010111 Machine-code slicer Component Extraction (e.g., line count from wc wc binary) 0111101 Binary 0010111 6

  7. State-of-the-art Machine-Code Slicers [Motivation] There are state-of-the-art tools that can slice machine code (e.g., CodeSurfer/x86) We need a machine We need a machine- -code slicer that exhibits a high slicer that exhibits a high level of precision level of precision code But, slices are extremely imprecise Slices often include entire binary 7

  8. Outline Motivation Background Problem with state-of-the-art tools: Granularity issue McSlice Experiments Conclusion 8

  9. Program-Dependence Graph (PDG) [Background] Control dependence Data dependence void sum_prod(int n) { int sum = 0; int prod = 1; while (n--) { prod *= n; sum += n; } printf( %d\n , sum); printf( %d\n , prod); } sum_prod n_formal_in prod = 1 sum = 0 while(n--) prod *= n sum += n System-dependence graph (SDG) for multi-procedure programs printf( %d\n , sum) printf( %d\n , prod) 9

  10. Slicing [More background] Intraprocedural slicing reachability in PDG Context-sensitive interprocedural slicing context-free reachability in SDG Control dependence Data dependence sum_prod n_formal_in prod = 1 sum = 0 while(n--) prod *= n sum += n printf( %d\n , sum) printf( %d\n , prod) 10

  11. IA-32 Primer [Even more background] Registers 20 EBX 1000 ESP 20 EAX 996 1000 20 40 Stack push eax mov ebx, eax mov [esp], 40 lea esp, [esp+4] 11

  12. [I swear, this is the final slide on background ] QFBV Formulas Explicit representation of semantics of an instruction (microcode) ESP = ESP 4 Mem = Mem[ESP 4 EAX] push eax ESP = ESP 4 Mem = Mem[ESP 4 EAX] EBP = EBP EAX = EAX EBX = EBX ... CF = CF OF = OF instructions in ISA Semantics of Symbolic Execution QFBV Formula Instruction 12

  13. Machine-Code Synthesis [I lied ] Synthesize machine-code instructions from a semantic specification QFBV formula ESP = ESP 4 Mem = Mem[ESP 4 EAX] McSynth* lea esp, [esp 4] push eax mov [esp], eax ESP = ESP 4 Mem = Mem[ESP 4 EAX] push eax lea esp, [esp 4];mov [esp], eax ESP = ESP 4 Mem = Mem[ESP 4 EAX] * V. Srinivasan and T. Reps. Synthesis of machine code from semantics. In PLDI, 2015 13

  14. Outline Motivation Background Problem with state-of-the-art tools: Granularity issue McSlice Experiments Conclusion 14

  15. Machine-Code Slicing State-of-the-art Tools [Granularity issue] int add(int a, int b) { int sum = a + b; return sum; } Note that body of add is not in the slice add: push ebp mov ebp, esp sub esp, 10 mov eax, [ebp + 0xC] mov edx, [ebp + 0x8] lea eax, [eax + edx] mov [ebp 0x4], eax mov eax, [ebp 0x4] leave ret main: push ebp mov ebp,esp sub esp,0x20 mov [ebp 0x14],0xA mov [ebp 0x10],0x14 push [ebp 0x10] push [ebp 0x14] call add add esp,8 mov [ebp 0xC],eax push [ebp 0xC] call square add esp,4 mov [ebp 0x8],eax mov eax, [ebp 0x10] mov edx, [ebp 0x14] sub eax,edx leave ret Why is the entire body of add in the slice? int square(int a) { int product = a * a; return product; } int main() { int a = 10; int b = 20; int c = add(a, b); int d = square(c); return a - b; } square: push ebp mov ebp, esp sub esp, 10 mov eax, [ebp + 0x8] imul eax, [ebp + 0x8] mov [ebp 0x4], eax mov eax, [ebp 0x4] leave ret 15

  16. Granularity Issue [Granularity issue] add: push ebp mov ebp, esp sub esp, 10 mov eax, [ebp + 0xC] mov edx, [ebp + 0x8] lea eax, [eax + edx] mov [ebp 0x4], eax mov eax, [ebp 0x4] leave ret main: push ebp mov ebp,esp sub esp,0x20 mov [ebp 0x14],0xA mov [ebp 0x10],0x14 push [ebp 0x10] push [ebp 0x14] call add add esp,8 mov [ebp 0xC],eax push [ebp 0xC] call square add esp,4 mov [ebp 0x8],eax mov eax, [ebp 0x10] mov edx, [ebp 0x14] sub eax,edx leave ret Granularity Issue Granularity Issue ESP = ESP 4 Mem = Mem[ESP 4 Mem(EBP 12)] square: push ebp mov ebp, esp sub esp, 10 mov eax, [ebp + 0x8] imul eax, [ebp + 0x8] mov [ebp 0x4], eax mov eax, [ebp 0x4] leave ret 16

  17. Granularity Issue [Granularity issue] 11110101010 10010001010 11111101010 Binary 00101010101 11010101010 10010001010 Imprecise slice Instruction- level PDG Slicing Algorithm CodeSurfer/x86 Machine code Slicing criterion Instruction Instruction- -level is too coarse coarse- -grained for grained for machine machine- -code slicing! code slicing! level is too 17

  18. Outline Motivation Background Problem with state-of-the-art tools: Granularity issue McSlice Experiments Conclusion 18

  19. Improved Slicing with McSlice [Algorithm] 11110101010 10010001010 11111101010 Binary 00101010101 11010101010 10010001010 Instruction- level PDG CodeSurfer/x86 Machine code Microcode Precise microcode slice Instruction- level PDG to -PDG Slicing Algorithm -PDG Slicing criterion 19

  20. -PDG [Algorithm] add: 1: push ebp 2: mov ebp, esp 3: sub esp, 4 4: mov eax, [ebp + 0xC] 5: add eax, [ebp + 0x8] 6: mov [ebp 0x4], eax 7: mov eax, [ebp 0x4] 8: leave 9: ret main: 19: push ebp 20: mov ebp,esp 21: sub esp,0x20 22: mov [ebp 0x14],0xA 23: mov [ebp 0x10],0x14 24: push [ebp 0x10] 25: push [ebp 0x14] 26: call add 27: add esp,8 28: mov [ebp 0xC],eax 29: push [ebp 0xC] 30: call square 31: add esp,4 32: mov [ebp 0x8],eax 33: mov eax, [ebp 0x10] 34: mov edx, [ebp 0x14] 35: sub eax,edx 36: leave 37: ret square: 10: push ebp 11: mov ebp, esp 12: sub esp, 4 13: mov eax, [ebp + 0x8] 14: imul eax, [ebp + 0x8] 15: mov [ebp 0x4], eax 16: mov eax, [ebp 0x4] 17: leave 18: ret 20

  21. CodeSurfer/x86 Vs. McSlice [Algorithm] add: push ebp mov ebp, esp sub esp, 4 mov eax, [ebp + 0xC] add eax, [ebp + 0x8] mov [ebp 0x4], eax mov eax, [ebp 0x4] leave ret main push ebp mov ebp,esp sub esp,0x20 mov [ebp 0x14],0xA mov [ebp 0x10],0x14 push [ebp 0x10] push [ebp 0x14] call add add esp,8 mov [ebp 0xC],eax push [ebp 0xC] call square add esp,4 mov [ebp 0x8],eax mov eax, [ebp 0x10] mov edx, [ebp 0x14] sub eax,edx leave ret add: push ebp mov ebp, esp sub esp, 4 mov eax, [ebp + 0xC] add eax, [ebp + 0x8] mov [ebp 0x4], eax mov eax, [ebp 0x4] leave ret main push ebp* mov ebp,esp sub esp,0x20* mov [ebp 0x14],0xA mov [ebp 0x10],0x14 push [ebp 0x10]* push [ebp 0x14]* call add add esp,8 mov [ebp 0xC],eax push [ebp 0xC]* call square add esp,4 mov [ebp 0x8],eax mov eax, [ebp 0x10] mov edx, [ebp 0x14] sub eax,edx* leave ret square: push ebp mov ebp, esp sub esp, 4 mov eax, [ebp + 0x8] imul eax, [ebp + 0x8] mov [ebp 0x4], eax mov eax, [ebp 0x4] leave ret square: push ebp mov ebp, esp sub esp, 4 mov eax, [ebp + 0x8] imul eax, [ebp + 0x8] mov [ebp 0x4], eax mov eax, [ebp 0x4] leave* ret 21

  22. Program Reconstitution [Algorithm] 11110101010 10010001010 11111101010 Binary 00101010101 11010101010 10010001010 Precise Machine- code program Fix parameter mismatches at call-sites Include microcode for ARs Instruction- level PDG CodeSurfer/x86 Machine code Microcode Precise microcode slice Instruction- level PDG to -PDG Slicing Algorithm McSynth -PDG Slicing criterion 22

  23. Outline Motivation Background Problem with state-of-the-art tools: Granularity issue McSlice Experiments Conclusion 23

  24. Test Suite [Experiments] Binaries of 8 FreeBSD utilities 295 2207 LOC Backward-slicing criterion actuals to print procedures Forward-slicing criterion variables initialized at program start CodeSurfer/x86 Vs. McSlice CodeSurfer/x86: No. of instructions in slice McSlice: No. of instructions whose microcode is included in slice 24

  25. Slice Sizes [Experiments] Backward Slice: 33% Forward slice: 70% Slice-size comparison No. of instructions in slice 1000 100 10 CS/x86 McSlice 1 Forward Backward <Application, slicing direction> 25

  26. Partial Instructions [Experiments] Backward Slice: 40% Forward slice: 16% Split of partial and entire instructions Opcode variant Count - bwd Opcode variant Count - fwd 100% ADD 564 PUSH 45 80% PUSH 410 TEST 3 60% SUB 445 AND 2 40% CMP 280 20% TEST 265 0% AND 39 SHL 32 SAR 18 IMUL 14 Backward Forward No. of entire instructions in slice No. of partial instructions in slice IDIV 13 26

  27. Component Extraction [Experiments] Backward microcode slice Include extra microcode for executable program McSynth: microcode to machine code Executable machine-code program Application Extracted component No. of instructions in binary No of instructions in extracted component wc-lite Line count 295 64 wc Word count 790 242 cksum Length of input 815 338 27

  28. Conclusion Algorithm to slice machine code more accurately McSlice slicer for IA-32 Granularity issue: McSlice slices at microcode level Program-reconstitution issue: machine-code synthesis McSlice reduces slice sizes by 33% for backward slices, and 70% for forward slices Extract executable components from binaries 28

  29. Questions? 11110101010 10010001010 11111101010 Binary 00101010101 11010101010 10010001010 Precise Machine- code program Instruction- level PDG CodeSurfer/x86 Machine code Microcode Precise microcode slice Instruction- level PDG to -PDG Slicing Algorithm McSynth -PDG Slicing criterion 29

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#