Transparent and Efficient CFI Enforcement with Intel Processor Trace



This research discusses Control Flow Integrity (CFI) enforcement to combat control flow hijacking attacks. It explores methods for runtime CFI enforcement, including instrumented checking and transparent monitoring. The study delves into trace mechanisms, buffer management strategies, and when to trigger trace events in order to enhance program security.

Uploaded on Sep 28, 2024


Presentation Transcript

Control Flow Integrity
Transparent and Efficient CFI Enforcement with Intel Processor Trace (IPT)
Yutao Liu, Peitao Shi, Xinran Wang, Haibo Chen, Binyu Zang, Haibing Guan
Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University
http://ipads.se.sjtu.edu.cn

Control Flow Hijacking Attacks
• Attackers exploit memory corruption bugs to overwrite victim memory
• Result ("BAD THING"): shellcode execution or code reuse

Attacks & Defenses
• Attacks: control flow hijacking attacks
• Defenses: randomization, enforcement

Program's Control Flow
• A program consists of basic blocks (BBs)
• Control flows from one BB to another BB
• Each BB has a limited set of valid targets
• The CFG can be pre-generated
• Attack: issues an invalid control flow transfer
• Control Flow Integrity (CFI): enforce that control flow follows the CFG at runtime
[Figure: Control Flow Graph (CFG) over basic blocks BB-1 to BB-11]
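Note (illustrative sketch, not from the slides): the idea of enforcing a pre-generated CFG can be pictured as a set-membership test on each observed transfer. The graph below is a made-up four-block example, and the names `CFG`, `is_valid_transfer`, and `first_violation` are hypothetical.

```python
# Hypothetical CFG: each basic block maps to the set of blocks it may transfer to.
CFG = {
    "BB-1": {"BB-2", "BB-3"},
    "BB-2": {"BB-4"},
    "BB-3": {"BB-4"},
    "BB-4": set(),
}


def is_valid_transfer(cfg, src, dst):
    """Return True if the CFG contains an edge src -> dst."""
    return dst in cfg.get(src, set())


def first_violation(cfg, path):
    """Return the first (src, dst) pair in an executed path that the CFG forbids, or None."""
    for src, dst in zip(path, path[1:]):
        if not is_valid_transfer(cfg, src, dst):
            return (src, dst)
    return None
```

A path such as BB-1, BB-2, BB-4 passes, while a jump straight from BB-1 to BB-4 is reported as the first violating transfer.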

Runtime CFI Enforcement
• Method #1: instrumented checking
  - Check all branches
  - Original code:      jmp *%ecx ... call *%ebx ... ret
  - Instrumented code:  check(%ecx); jmp *%ecx ... check(%ebx); call *%ebx ... check(stack); ret
  - Approaches: compiler-based, binary rewriting
  - Drawbacks: breaks code integrity, COTS unfriendly, shared-library unfriendly
• Method #2: transparent monitoring
  - Runtime behavior is compared against the result of binary analysis
  - Transparent to applications

Transparent Monitoring
• How to TRACE?
• When to TRIGGER?
• What to CHECK?

How to Trace
• Trace by hardware (performance counter)
  - Two choices before: BTS & LBR
• Branch trace store (BTS)
  - Traces every branch in memory: source, destination, type
  - Sufficient information, but extraordinarily slow
• Last branch record (LBR)
  - Only traces the most recent (16 or 32) branches in registers
  - Very fast, but with insufficient information
  - History flush attacks
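Note (illustrative sketch, not from the slides): the history flush problem follows from LBR keeping only a fixed number of recent branches. The toy model below uses a bounded deque as a stand-in for the LBR registers; the class name, branch labels, and depth of 16 are assumptions for illustration only.

```python
from collections import deque


class LastBranchRecord:
    """Toy model of LBR: a ring buffer holding only the most recent branches."""

    def __init__(self, depth=16):
        self.records = deque(maxlen=depth)

    def record(self, src, dst):
        # Appending to a full deque silently drops the oldest record.
        self.records.append((src, dst))


lbr = LastBranchRecord(depth=16)
lbr.record("gadget", "hijacked_target")      # the malicious transfer
for i in range(16):                          # attacker runs 16 benign branches
    lbr.record(f"benign_{i}", f"benign_{i + 1}")

# By the time a check runs, the malicious record has been evicted.
hijack_still_visible = ("gadget", "hijacked_target") in lbr.records
```

In this model, any check that inspects the buffer after the benign branches sees no trace of the hijack.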

When to Trigger
• When the trace buffer is full
  - Buffer size matters
  - Trade-off between performance and security
• When specific events happen
  - Whenever an attack may happen
  - Cross-boundary points
  - Security-sensitive system calls

What to Check
• Heuristic checking
  - Ensure that control flow obeys some simple rules:
    call to a function entry; return to the instruction right after a call; etc.
• Strict CFG enforcement
  - Pre-generate the CFG and enforce it at runtime
  - Fine- or coarse-grained CFG
  - Shadow stack?
• Choice: fine-grained CFG enforcement
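Note (illustrative sketch, not from the slides): the two heuristic rules named above can be written as simple address-set lookups. All addresses, instruction lengths, and names here are hypothetical.

```python
# Hypothetical addresses gathered by static analysis of a binary.
FUNCTION_ENTRIES = {0x1000, 0x2000}
CALL_SITES = {0x1010: 5, 0x2020: 2}  # call instruction address -> its length in bytes
RETURN_SITES = {addr + length for addr, length in CALL_SITES.items()}


def heuristic_ok(kind, target):
    """Apply the simple rules: calls land on function entries, returns land right after a call."""
    if kind == "call":
        return target in FUNCTION_ENTRIES
    if kind == "ret":
        return target in RETURN_SITES
    return True  # other branch kinds are not covered by these rules
```

Under these rules a return to any call-preceded address passes, regardless of which call was actually made, which is one reason such checks are weaker than enforcing a CFG.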

In Summary
• Efficient trace with sufficient runtime information
  - BTS and LBR cannot survive
• Appropriate triggering point
  - Prevent attacks without sacrificing too much performance
• Fine-grained CFG enforcement
  - Heuristic checking is not enough

IPT to the Rescue
• Intel Processor Trace (IPT)
  - Introduced in Intel Broadwell
  - Fast tracing
  - Can trace sufficient information in memory
• But why?

Background: Demystify IPT Fast Trace
• IPT uses aggressive compression
  - Unconditional direct branches are not logged at all
  - Conditional branches are compressed to a single bit
  - Each indirect branch is traced as one target address
  - Results in less than 1 bit per retired instruction on average

Background: IPT Trace Example
• TIP packet: target address of an indirect branch
• TNT packet: indication of a taken or not-taken conditional branch
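Note (illustrative sketch, not from the slides): the compression rules can be modeled as a function from executed branches to packets. This is a simplified model, not the real IPT packet format; in particular, flushing pending taken/not-taken bits just before each TIP packet is an assumption made here to keep packets in order, and the function name and tuple layout are hypothetical.

```python
def encode_trace(branches):
    """Turn (kind, info) branch events into a simplified packet list.

    kind is 'direct' (info ignored), 'cond' (info is True if taken),
    or 'indirect' (info is the target address).
    """
    packets = []
    tnt_bits = []
    for kind, info in branches:
        if kind == "direct":
            continue  # unconditional direct branches produce no output
        if kind == "cond":
            tnt_bits.append("T" if info else "N")  # one bit per conditional branch
        elif kind == "indirect":
            if tnt_bits:  # flush pending bits so packet order matches execution order
                packets.append(("TNT", "".join(tnt_bits)))
                tnt_bits = []
            packets.append(("TIP", info))  # one target address per indirect branch
    if tnt_bits:
        packets.append(("TNT", "".join(tnt_bits)))
    return packets
```

Recovering the full path from such a trace requires walking the binary alongside the packets, since direct branches and branch sources are not recorded.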

Challenges: Fast Trace vs. Slow Decode
• The performance overhead is shifted from tracing to decoding
  - Decoding is several orders of magnitude slower than tracing

Mechanism  Precision  Tracing            Decoding     Filtering
BTS        Full       Slow (50X)         Fast         None
LBR        Low        Very fast (< 1%)   Fast         CPL, CoFI
IPT        Full       Fast (3%)          Slow (200X)  CPL, CR3, IP

Contribution
• FlowGuard: practical CFI with IPT
  - Transparent monitoring without instrumentation
  - Efficient trace and check by separating fast and slow paths
  - Precise CFI enforcement with a fine-grained CFG and runtime information
• Evaluation results
  - Apply FlowGuard to server applications
  - Prevents a variety of code reuse attacks
  - Less than 4% performance overhead for normal use cases

Outline
• Efficient trace and check
• Precise CFI enforcement
• Implementation and evaluation

Why Is Slow Decode Required?
• Main problem: inconsistency between the statically generated CFG and the IPT traced data
[Figure: traditional statically generated CFG (direct edges, indirect edges, conditional branch information T/N) next to the IPT traced data for the same basic blocks]

Solution: IPT-Compatible CFG Construction
• Indirect-targets-connected CFG (ITC-CFG)
  - Nodes kept: BBs with incoming indirect edges (IT-BBs)
  - Edge reconnection: two IT-BBs, BB-x and BB-y, are connected if and only if:
    there is only one indirect edge on the path from BB-x to BB-y, and
    this indirect edge targets BB-y
[Figure: static analysis reduces the original CFG to the ITC-CFG]
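Note (illustrative sketch, not from the slides): one way to read the construction rule is "from each IT-BB, follow direct edges only, then take one indirect edge". The edge list, the `build_itc_cfg` name, and the `(src, dst, kind)` tuple format below are hypothetical and do not reproduce the slide's figure.

```python
def build_itc_cfg(edges):
    """Build an ITC-CFG from (src, dst, kind) edges, kind in {'direct', 'indirect'}."""
    direct, indirect = {}, {}
    for src, dst, kind in edges:
        table = direct if kind == "direct" else indirect
        table.setdefault(src, set()).add(dst)

    # Nodes kept: basic blocks that are the target of at least one indirect edge.
    it_bbs = {dst for targets in indirect.values() for dst in targets}

    itc = {}
    for start in it_bbs:
        # Blocks reachable from `start` through direct edges only (including itself).
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in direct.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        # One final indirect edge out of any such block ends at the connected IT-BB.
        itc[start] = {dst for node in seen for dst in indirect.get(node, ())}
    return itc


EDGES = [
    ("BB-1", "BB-2", "indirect"),
    ("BB-2", "BB-4", "direct"),
    ("BB-4", "BB-7", "indirect"),
    ("BB-2", "BB-5", "indirect"),
    ("BB-5", "BB-6", "direct"),
    ("BB-6", "BB-2", "indirect"),
]
ITC = build_itc_cfg(EDGES)
```

In this example BB-2 is connected to BB-7 even though the original path passes through BB-4, because BB-4 is reached by a direct edge and so would leave no target-address packet in the trace.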

Fast Path Check at Runtime
• IPT traced data can be directly matched against the ITC-CFG
[Figure: TIP packets for BB-3, BB-2, BB-7, and BB-9 in the IPT traced data matched against ITC-CFG edges; one edge is marked "?"]

Outline
• Efficient trace and check
• Precise CFI enforcement
• Implementation and evaluation

Fast Path (ITC-CFG) Problem
• Coarse-grained CFI
  - Over-approximated CFG generation
  - Results in many false negatives
  - Does not benefit from the full dynamic information
• Precision loss
  - Measured by average indirect targets allowed (AIA)
• Main reason: lack of TNT information
• This can be solved by slow decode!
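Note (illustrative sketch, not from the slides): AIA is commonly computed as the mean size of the allowed-target set over all indirect branches; the slides do not spell out the formula, so treating it this way is an assumption. The branch labels and the function name below are hypothetical.

```python
def average_indirect_targets_allowed(allowed_targets):
    """Mean number of permitted targets per indirect branch.

    allowed_targets maps each indirect branch to the set of targets the policy accepts.
    """
    sizes = [len(targets) for targets in allowed_targets.values()]
    return sum(sizes) / len(sizes)


coarse = {"jmp@0x10": {"A", "B", "C"}, "call@0x20": {"A", "B", "C"}}
fine = {"jmp@0x10": {"A", "B", "C"}, "call@0x20": {"A"}}
```

A lower value means each indirect branch is allowed fewer targets, so the policy is more precise.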

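Solution: Separate Fast and Slow Path
• Fast path: match the IPT traced data against the credit-labeled ITC-CFG
  - Edges not matched: attack detected
  - Edges matched and credible: no attack
  - Edges matched but not credible: fall back to the slow path
• Slow path
  - Edges matched: no attack
  - Edges not matched: attack detected
[Figure: flowchart of the fast and slow paths; the slow path box has inputs labeled "Pre-generated" and "Binary"]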
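Note (illustrative sketch, not from the slides): the fast-path decision can be pictured as a three-way outcome per window of traced indirect targets. Representing credits as a dictionary keyed by edge and using a numeric `threshold` to decide what counts as credible are assumptions made for this sketch.

```python
def fast_path(credits, tip_targets, threshold=1):
    """Classify a sequence of traced indirect targets.

    credits maps each ITC-CFG edge (src, dst) to its credit from training.
    Returns 'attack', 'no_attack', or 'slow_path'.
    """
    needs_slow_path = False
    for edge in zip(tip_targets, tip_targets[1:]):
        if edge not in credits:
            return "attack"            # edge is absent from the ITC-CFG
        if credits[edge] < threshold:
            needs_slow_path = True     # edge exists but was not seen often enough
    return "slow_path" if needs_slow_path else "no_attack"


CREDITS = {("BB-3", "BB-7"): 12, ("BB-7", "BB-9"): 0}
```

Only windows that contain a low-credit edge pay the decoding cost, which is how the expensive check stays off the common case.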

Dynamic Fuzzing Training
• Dynamic training to label ITC-CFG edges with credits
  - The credit of each edge depends on its occurrence during the training phase
  - Each edge is also associated with the TNT information
[Figure: ITC-CFG before training and credit-labeled ITC-CFG after training]

Dynamic Fuzzing Training (cont.)
• Dynamic training to label ITC-CFG edges with credits
  - The credit of each edge depends on its occurrence during the training phase
  - Each edge is also associated with the TNT information
• We use a fuzzing-based approach
  - AFL: a coverage-oriented fuzzer
• Note: the security of FlowGuard does not rely on the coverage
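Note (illustrative sketch, not from the slides): one simple reading of "credit depends on occurrence" is to count how often each ITC-CFG edge shows up in the training traces. Using the raw count as the credit is an assumption, as are the function name and the data layout; the TNT information attached to each edge is left out.

```python
def label_credits(itc_cfg, training_traces):
    """Count how many times each ITC-CFG edge occurs in the training traces.

    itc_cfg maps an IT-BB to the set of IT-BBs it may reach.
    training_traces is a list of traced indirect-target sequences.
    """
    credits = {(src, dst): 0 for src, dsts in itc_cfg.items() for dst in dsts}
    for trace in training_traces:
        for edge in zip(trace, trace[1:]):
            if edge in credits:
                credits[edge] += 1
    return credits


ITC_CFG = {"BB-3": {"BB-7", "BB-9"}, "BB-7": {"BB-9"}, "BB-9": set()}
TRACES = [["BB-3", "BB-7", "BB-9"], ["BB-3", "BB-7"]]
CREDITS = label_credits(ITC_CFG, TRACES)
```

Edges never exercised during training keep a credit of zero: they stay in the graph but are never treated as credible, which is consistent with security not depending on fuzzing coverage.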

System Call Interception
• The default setting covers 7 security-sensitive system calls
  - read, write, execve, mmap, mprotect, sigaction, sigreturn
• Users are given an interface to specify their own endpoints
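Note (illustrative sketch, not from the slides): the endpoint policy amounts to a default set plus user-supplied additions. The function name and the `extra_endpoints` parameter are hypothetical; the slides do not describe the actual interface.

```python
# The seven default endpoints listed on the slide.
DEFAULT_ENDPOINTS = frozenset(
    {"read", "write", "execve", "mmap", "mprotect", "sigaction", "sigreturn"}
)


def should_trigger_check(syscall_name, extra_endpoints=()):
    """Return True if a flow check should run when this system call is intercepted."""
    return syscall_name in DEFAULT_ENDPOINTS or syscall_name in set(extra_endpoints)
```

For example, `should_trigger_check("openat", extra_endpoints=["openat"])` would add a user-chosen endpoint on top of the defaults.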

Outline
• Efficient trace and check
• Precise CFI enforcement
• Implementation and evaluation

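FlowGuard Architecture
[Figure: FlowGuard architecture with numbered steps 1 to 5. Components shown: static binary analysis (1) and dynamic fuzzing training (2), operating on the executable and libraries and producing the credit-labeled ITC-CFG; the user-level process; the kernel module containing the syscall interceptor and the flow checker (fast path and slow path); cores; memory]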

Experimental Setup
• Intel Skylake machine with IPT support
  - 8 cores & 16GB RAM
  - Debian 8.0, Linux kernel 4.3.0
• Dyninst plugin for static binary analysis
• AFL for fuzzing the software and collecting training inputs
  - desock to channel socket communication to the console

Security Analysis
• Attacks detected
  - ROP: during the write() syscall
  - SROP: during the sigreturn() syscall
• Average Indirect-targets Allowed (AIA) summary

Performance Evaluation
• Macro benchmarks

Summary
• FlowGuard: leverages IPT for practical CFI
  - Transparent monitoring without instrumentation
  - Efficient trace and check by separating fast and slow paths
  - Precise CFI enforcement with a fine-grained CFG and runtime information
• A working prototype on Intel Skylake with promising results
  - Successfully detects ROP-like attacks and optimizes AIA
  - Small performance impact

Thanks
Questions?
Institute of Parallel and Distributed Systems (IPADS)
http://ipads.se.sjtu.edu.cn
