Framework for Developing Verified Assemblers for ELF Format

 
Towards a Framework for Developing
Verified Assemblers for the ELF Format
 
Jinhua Wu
1
, Yuting Wang
*1
, Meng Sun
1
, Xiangzhe Xu
2
, and Yichen Song
1
1  
Shanghai Jiao Tong University, Shanghai, China
2  
Purdue University, West Lafayette, USA
 
APLAS 2023
2023.11.28
 
1
Motivation
 
Verified assemblers: the last mile in the verified compilation
 
 
 
 
Significance of the complete verified compilation:
End-to-end behavior preservation
 
 
Important in building verified systems software stacks
Verified Software Toolchain
Certified Abstraction Layer in CertiKOS
...
 
CompCert
 
 
2
Motivation
 
Verified assembler
s for
Multiple architectures: X86, RISC-V, ARM, ...
Standard
 
object
 
file
 
formats: ELF, PE, ...
 
 
 
 
 
Verified assembler framework should be:
Configurable
:
 support different architectures
Extensible:
 easy to extend to a new architecture
General:
 reusable when supporting more features
 
3
Related Work: Verified Assemblers
 
CompCertELF: A prototype of verified assemblers for ELF
 
 
 
 
CakeML Backend: Labelled assembly language
 
 
 
🙁 LabLang:
Not 
a standard object
file format
No information for
symbols relocation
 
🙁 CompCertELF
assembler:
Only supports 
a small
subset of X86-32
Requires large amount of
work to implement
4
Problems and Challenges:
 
Problem 1: 
large amount of work
 
to support multiple architectures
Avoid redundant implementation and verification for each architecture
 
 
 
 
 
 
Challenge 1: how to design the verified assembler framework ?
5
Problems and Challenges:
 
Problem 2: high complexity in instruction encoding and verification
 
 
 
 
 
Impractical to manually implement and verify instruction encoding
 
Challenge 2: How to simplify the instruction encoding?
 
6
Our Contributions
 
A framework for developing verified assemblers for:
Multiple architectures: X86-32/64 and RISC-V 32/64
Standard object file format: ELF format
 
 
 
 
Key features:
1.
Abstraction
 over architecture-dependent assembly
 
 
2.
Integration of 
automated and verified
 instruction encoding
Verified Assembler
Framework
7
 
The Rest of This Talk
 
Overview of the framework
 
Key features
 
Evaluation
 
Limitations
 
8
Overview of the Framework
9
A Running Example
 
/* main.c */
int counter = 0;
void incr(){
  counter++;
}
int main(){
  
incr();
}
 
Assembly program
 
CompCert and
Target Printer
10
Pass
 
1:
 
Generation of Relocatable Programs
 
symbol table
 
section table
11
 
Transform assembly programs to relocatable programs
Pass
 
2:
 
Generation of Relocation Tables
 
Relocation tables: record the
symbol references
 
Step 1: generate relocation entry
Step 2: update the instruction
 
In the linking process:
Linker follows the relocation tables
to replace the zero with the
concrete address
 
counter
 
counter
 
incr
12
Pass 3: Instruction and Data Encoding
 
Encode the instructions and data into bytes
13
Pass 4: Generation of ELF
14
Pass 4: Generation of ELF
 
Section headers
ELF header
 
ELF object file
15
Verification
16
Key Features
 
 
 
 
 
 
 
 
 
 
A
b
s
t
r
a
c
t
i
o
n
 
o
v
e
r
 
a
r
c
h
i
t
e
c
t
u
r
e
-
d
e
p
e
n
d
e
n
t
 
a
s
s
e
m
b
l
y
:
A
b
s
t
r
a
c
t
i
o
n
 
f
o
r
 
i
m
p
l
e
m
e
n
t
a
t
i
o
n
A
b
s
t
r
a
c
t
i
o
n
 
f
o
r
 
v
e
r
i
f
i
c
a
t
i
o
n
A
u
t
o
m
a
t
i
o
n
 
o
f
 
t
h
e
 
i
n
s
t
r
u
c
t
i
o
n
 
e
n
c
o
d
i
n
g
:
I
n
t
e
g
r
a
t
i
o
n
 
o
f
 
t
h
e
 
C
S
L
E
D
 
f
r
a
m
e
w
o
r
k
17
 
Framework + Abstraction interfaces
Abstraction for Implementation: Pass 1
18
symbol table
section table
Abstraction for Implementation: Pass 2
gen_reloc
19
 
counter
 
counter
 
incr
Key Features
 
 
 
 
 
 
 
 
 
 
 
A
b
s
t
r
a
c
t
i
o
n
 
o
v
e
r
 
a
r
c
h
i
t
e
c
t
u
r
e
-
d
e
p
e
n
d
e
n
t
 
a
s
s
e
m
b
l
y
:
A
b
s
t
r
a
c
t
i
o
n
 
f
o
r
 
i
m
p
l
e
m
e
n
t
a
t
i
o
n
A
b
s
t
r
a
c
t
i
o
n
 
f
o
r
 
v
e
r
i
f
i
c
a
t
i
o
n
A
u
t
o
m
a
t
i
o
n
 
o
f
 
t
h
e
 
i
n
s
t
r
u
c
t
i
o
n
 
e
n
c
o
d
i
n
g
:
I
n
t
e
g
r
a
t
i
o
n
 
o
f
 
t
h
e
 
C
S
L
E
D
 
f
r
a
m
e
w
o
r
k
20
Framework + Abstraction interfaces
Abstraction for Verification: Pass 1
21
 
Prove
 
the injection holds incrementally
following the allocation order 
 
Our approach: directly construct the final
injection and prove it holds after the
memory initialization 
 
source
memory
 
target
memory
Abstraction for Verification: Pass 1
22
Abstraction for Verification: Pass 2
23
 
counter
 
counter
 
incr
Key Features
 
 
 
 
 
 
 
 
 
 
 
A
b
s
t
r
a
c
t
i
o
n
 
o
v
e
r
 
a
r
c
h
i
t
e
c
t
u
r
e
-
d
e
p
e
n
d
e
n
t
 
a
s
s
e
m
b
l
y
:
A
b
s
t
r
a
c
t
i
o
n
 
f
o
r
 
i
m
p
l
e
m
e
n
t
a
t
i
o
n
A
b
s
t
r
a
c
t
i
o
n
 
f
o
r
 
v
e
r
i
f
i
c
a
t
i
o
n
A
u
t
o
m
a
t
i
o
n
 
o
f
 
t
h
e
 
i
n
s
t
r
u
c
t
i
o
n
 
e
n
c
o
d
i
n
g
:
I
n
t
e
g
r
a
t
i
o
n
 
o
f
 
t
h
e
 
C
S
L
E
D
 
f
r
a
m
e
w
o
r
k
24
Framework + Abstraction interfaces
Integration of the CSLED Framework
 
encoder
CSLED
framework
 
Interfaces of this pass
CSLED
framework
 
 
I
n
s
t
r
u
c
t
i
o
n
s
p
e
c
i
f
i
c
a
t
i
o
n
s
 
encoder
 
translator
25
Instruction Specifications – X86
26
 
token Prefix = (8 bits);
token Opcode = (8 bits);
token ModRM = (8 bits);
token SIB = (8 bits);
token Disp = (32 bits);
...
 
field mod = ModRM(7:6);
field reg_op = ModRM(5:3);
field rm = ModRM(2:0);
...
 
class Instruction =
|mov_rr [arg1, arg2] (
Opcode
 = 0x8b ; 
mod
 = 0x3 & 
reg_op 
= arg1 & 
rm
 = arg2)
 
Token
 
Field
 
|add_EvGv [reg_op, AddrE] (
Opcode
 = 0x1 ; 
fld
 %1 & 
cls
 %2)
|addl_mi [AddrE, imm32] (
Opcode
 = 0x80; 
reg_op 
= 0x0 & 
cls
 %1; 
fld
 %2)
|imull_ri [reg_op, rm, imm32] (
Opcode
 = 0x69 ; 
mod
 = 0x3 & 
fld
 %1 & 
fld
 %2; 
fld
 %3)
...
Evaluation of the Framework
 
Applications:
Support X86 and RISC-V 32/64
 
Connect the verified assemblers with
CompCert’s backend
 
Performance of generated code:
1.1
%
 slower on average than the code
generated by the GNU assembler
Other
implementation
and proofs
27
28
Evaluation of the Framework
 
Statistics of development
Framework: 
11k
 LOC
 
X86: 
6.4k
 LOC
 
     RISC-V: 
5.4k
 LOC
44% of the architecture-dependent code is the translators
Possible solution: leverage automation
 
Comparison with CompCertELF
We support X86-32/64 and RISC-V 32/64 in CompCert
In instruction encoding, CompCertELF takes 
3x
 verification effort than ours
 
Generality of the Framework
 
Abstraction is general
 
 
Semantics definition is general
 
 
 
 
 
Framework does not change in 
more realistic ISA semantics (e.g., Sail)
Some disassembly functions are necessary (e.g., decoder and loader)
Verification is generalized to step simulation proofs
29
Limitations
 
The target printer is unverfied
 
 
 
Difficulty: the mismatch between the CompCert and real memory models
 
Does not support separate compilation: the linker is unverified
 
30
Conclusion and Future Work
 
This work: verified assembler framework for multiple architectures
The whole framework and applications are formalized in Coq
This work is open source:
https://zenodo.org/records/8363543
 
Future work:
Verify the target printer: some concrete memory model
Support verified compositional compilation and verified linker
 
 
31
 
Thank you for your listening!
 
32
Slide Note
Embed
Share

This research paper discusses the importance of verified assemblers in the context of verified compilation, focusing on the development of verified assemblers for the ELF format for multiple architectures like X86, RISC-V, and ARM. The framework aims to be configurable, extensible, and general to support different architectures and features.

  • Verified Assemblers
  • ELF Format
  • Compilation
  • Architecture Support
  • Configurability

Uploaded on Sep 14, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Towards a Framework for Developing Verified Assemblers for the ELF Format Jinhua Wu1, Yuting Wang*1, Meng Sun1, Xiangzhe Xu2, and Yichen Song1 1 Shanghai Jiao Tong University, Shanghai, China 2 Purdue University, West Lafayette, USA APLAS 2023 2023.11.28 1

  2. Motivation Verified assemblers: the last mile in the verified compilation Verified compiler Verified assembler Assembly C programs Object files CompCert Significance of the complete verified compilation: End-to-end behavior preservation ?? (??) ?? (????) ?? (??) Important in building verified systems software stacks Verified Software Toolchain Certified Abstraction Layer in CertiKOS ... 2

  3. Motivation Verified assemblers for Multiple architectures: X86, RISC-V, ARM, ... Standard object file formats: ELF, PE, ... GNU assembler X86 X86 Assembly ELF Object files CompCert GNU assembler RISC-V Assembly RISC-V Assembler framework: implementation verification C programs ELF Object files GNU assembler ARM Assembly ARM ELF Object files Verified assembler framework should be: Configurable: support different architectures Extensible: easy to extend to a new architecture General: reusable when supporting more features 3

  4. Related Work: Verified Assemblers CompCertELF: A prototype of verified assemblers for ELF CompCertELF assembler: Only supports a small subset of X86-32 Requires large amount of work to implement X86-32 ELF object C Assembly Stack-Aware CompCert X86-32 Assembler CakeML Backend: Labelled assembly language LabLang: Not a standard object file format No information for symbols relocation ARM LabLang X86 CakeML LabLang LabLang CakeML Compiler Encode ... 4

  5. Problems and Challenges: Problem 1: large amount of work to support multiple architectures Avoid redundant implementation and verification for each architecture assembler 1 X86 X86 Object files Assembly verified assembler framework assembler 2 RISC-V Object files RISC-V Assembly assembler 3 ARM Assembly ARM Object files Challenge 1: how to design the verified assembler framework ? 5

  6. Problems and Challenges: Problem 2: high complexity in instruction encoding and verification movl 0, %eax movl %eax, %eax movl (%eax), %eax movl %eax, (%eax) movl $0, %eax ... 8b 05 00 00 00 00 89 c0 67 8b 00 67 89 00 b8 00 00 00 00 ... encode decode Impractical to manually implement and verify instruction encoding Challenge 2: How to simplify the instruction encoding? 6

  7. Our Contributions A framework for developing verified assemblers for: Multiple architectures: X86-32/64 and RISC-V 32/64 Standard object file format: ELF format X86 ELF Object files X86 Assembly Verified Assembler Framework RISC-V ELF Object files RISC-V Assembly ... ... Key features: 1. Abstraction over architecture-dependent assembly X86 Verified RISC-V Verified Verified Assembler Assembler Assembler Architecture-dependent Assembly Verified Assembler Framework = = = X86 implementation RISC-V implementation 2. Integration of automated and verified instruction encoding 7

  8. The Rest of This Talk Overview of the framework Key features Evaluation Limitations 8

  9. Overview of the Framework Goal: Transforms assembly programs to ELF object files ?0: Assembly programs 1: Generation of Relocatable Programs Ingredients: IR: relocatable programs (?1 to ?3) ?1 2: Generation of Relocation Tables Four passes: 1 to 4 ?2 3: Instruction and Data Encoding Correctness theorem: 4 3 2 1?0 = ?4 ?0 ? ?4 ?3 4: Generation of Relocatable ELF ?4: ELF object 9

  10. A Running Example ?0: Assembly programs Assembly program 1: Generation of Relocatable Programs variable counter: data = 0 /* main.c */ int counter = 0; ?1 function incr: ... movl counter, %eax leal 1(%eax), %eax movl %eax, counter ... CompCert and Target Printer 2: Generation of Relocation Tables void incr(){ counter++; } ?2 3: Instruction and Data Encoding int main(){ incr(); } function main: ... call incr ... ?3 4: Generation of Relocatable ELF ?4: ELF object 10

  11. Pass 1: Generation of Relocatable Programs Transform assembly programs to relocatable programs ?0: Assembly programs ?1: Relocatable Programs section table symbol table variable counter: data = 0 data section counter: data = 0 type size section function incr: ... movl counter, %eax leal 1(%eax), %eax movl %eax, counter ... code secion incr: ... movl counter, %eax leal 1(%eax), %eax movl %eax, counter ... counter data 4 counter 1 incr function 52 incr main function 40 main code secion main: ... call incr ... code secion main: ... call incr ... 11

  12. Pass 2: Generation of Relocation Tables Relocation tables: record the symbol references ?1: Relocatable Programs ?2: Relocatable Programs section table relocation tables data section counter: data = 0 Step 1: generate relocation entry Step 2: update the instruction code secion incr: ... movl , %eax leal 1(%eax), %eax movl %eax, ... incr counter 0 FFFF0734 In the linking process: Linker follows the relocation tables to replace the zero with the concrete address entry1: entry2: counter 0 FFFF0734 code section main: ... call ... main incr 0 FFFF11CB entry1: 12

  13. Pass 3: Instruction and Data Encoding Encode the instructions and data into bytes section table relocation tables data section counter: 00 data = 0 code secion incr: ... 8b 05 00 00 00 00 67 8d 40 01 89 05 00 00 00 00 movl %eax, 0 incr movl 0, %eax leal 1(%eax), %eax entry1:counter entry2:counter ... code secion main: ... e8 00 00 00 00 call 0 main entry1:incr ... 13

  14. Pass 4: Generation of ELF relocation tables symbol table section table data section counter: 00 code secion incr: ... 8b 05 00 00 00 00 67 8d 40 01 89 05 00 00 00 00 00 00 00 01 01 0F 00 00 incr type 00 size 01 section 00 02 entry1:counter entry2:counter 01 1F B3 00 counter 00 data 00 4 04 counter 00 00 incr 01 function 01 52 44 incr 00 01 ... main 02 function 01 40 34 main 00 02 code secion main: ... e8 00 00 00 00 00 00 00 02 main entry1:incr 01 FF 3C 01 ... 14

  15. Pass 4: Generation of ELF ELF object file ELF header data section counter code section incr code section main symbol table relocation table incr relocation table main Section headers 15

  16. Verification Semantics model: labelled transition system Semantics definitions: _?, _ and _ step step step final init ... ??0,?0 ??1,?1 ??2,?2 ???,?? initial state final state single step Correctness theorem: lock-step forward simulation ~: the relation ~ satisfies the following diagram step step step final init ... ??0,?0 ??1,?1 ??2,?2 ???,?? ~ ~ ~ ~ step step step final init ... ,?0 ,?1 ,?2 ,?? ??0 ??1 ??2 ??? initial simulation step simulation final simulation 16

  17. Key Features Abstraction over architecture-dependent assembly: ?0: Assembly programs ?0: Assembly programs 1: Generation of Relocatable Programs 1: Generation of Relocatable Programs ?1 Verified Assembler Architecture-dependent Verified Assembler Framework Framework + Abstraction interfaces Assembly = ?1 2: Generation of Relocation Tables Abstraction for implementation Abstraction for verification 2: Generation of Relocation Tables gen_reloc ?2 ?2 3: Instruction and Data Encoding 3: Instruction and Data Encoding Automation of the instruction encoding: Integration of the CSLED framework ?3 CSLED encoder 4: Generation of ELF object files ?3 ?4: ELF object files 4: Generation of ELF object file ?4: ELF object file 17

  18. Abstraction for Implementation: Pass 1 1: Instr Data, (asm_prog Instr Data) (reloc_prog Instr Data) ?0: Assembly programs ?1: Relocatable Programs section table symbol table variable counter: data = 0 data section counter: data = 0 type size section function incr: ... movl counter, %eax leal 1(%eax), %eax movl %eax, counter ... code secion incr: ... movl counter, %eax leal 1(%eax), %eax movl %eax, counter ... 0 counter data 4 counter incr function 52 incr main function 40 main code secion main: ... call incr ... code secion main: ... call incr ... 18

  19. Abstraction for Implementation: Pass 2 Abstraction interfaces: Interface of 2 is gen_reloc: Z Instr gen_reloc reloc_entry Instr ?1: Relocatable Programs ?2: Relocatable Programs section table relocation tables data section counter: data = 0 ?1: Relocatable programs code secion incr: ... movl , %eax leal 1(%eax), %eax movl %eax, ... 2: Generation of Relocation Tables incr counter 0 entry1: entry2: gen_reloc counter 0 ?2: Relocatable programs code secion main: ... call ... main incr 0 entry1: 19

  20. Key Features Abstraction over architecture-dependent assembly: ?0: Assembly programs 1: Generation of Relocatable Programs Framework + Abstraction interfaces ?1 2: Generation of Relocation Tables Abstraction for implementation gen_reloc Abstraction for verification ?2 3: Instruction and Data Encoding Automation of the instruction encoding: Integration of the CSLED framework CSLED encoder ?3 4: Generation of ELF object files ?4:ELF object files 20

  21. Abstraction for Verification: Pass 1 Correctness theorem: 1?0 = ?1 ?0 ? ~ ?1 step step step final init ... ??0,?0 ??1,?1 ??2,?2 ???,?? ~ ~ ~ ~ step step step final init ... ,?0 ,?1 ,?2 ,?? ??0 ??1 ??2 ??? step simulation initial simulation Difficulty of initial simulation: reorder of memory allocation in initial states b1 b2 b3 Prove the injection holds incrementally following the allocation order source memory incr counter main injection Our approach: directly construct the final injection and prove it holds after the memory initialization target memory incr main counter b1 b2 b3 allocation order 21

  22. Abstraction for Verification: Pass 1 Correctness theorem: 1?0 = ?1 ?0 ? ~ ?1 step step step final init ... ??0,?0 ??1,?1 ??2,?2 ???,?? ~ ~ ~ ~ step step step final init ... ,?0 ,?1 ,?2 ,?? ??0 ??1 ??2 ??? step simulation Step simulation is the abstraction interface in the verification It is the general interface for the verification Specific interfaces are used in the other passes 22

  23. Abstraction for Verification: Pass 2 Consider the verification of 2 ?2: Relocatable Programs section table relocation tables Semantics of ?2 Disassembly: restore_symb data section counter: data = 0 code secion incr: ... movl , %eax leal 1(%eax), %eax movl %eax, ... incr counter counter Step simulation reduces to: Lemma: i i o e, gen_reloc(o,i)=(e,i ) restore_symb(i ,e) = i entry1: entry2: 0 0 main code secion main: ... call ... incr entry1: Interface for the verification of 2 0 23

  24. Key Features Abstraction over architecture-dependent assembly: ?0: Assembly programs 1: Generation of Relocatable Programs Framework + Abstraction interfaces ?1 2: Generation of Relocation Tables Abstraction for implementation gen_reloc Abstraction for verification ?2 3: Instruction and Data Encoding Automation of the instruction encoding: Integration of the CSLED framework CSLED encoder ?3 4: Generation of ELF object files ?4:ELF object files 24

  25. Integration of the CSLED Framework Interfaces of this pass translator: instr csled_instr encoder: csled_instr list byte encoder section table translator csled_instr instr encoder: instr list byte data section counter: data = 0 00 ?2: Relocatable programs Instruction specifications code secion incr: ... movl 0, %eax leal 1(%eax), %eax movl %eax, 0 89 05 00 00 00 00 3: Instruction and Data Encoding 8b 05 00 00 00 00 67 8d 40 01 translator CSLED framework framework CSLED CSLED encoder ... code secion main: ... call 0 e8 00 00 00 00 ?3: Relocatable programs decoder proofs encoder ... 25

  26. Instruction Specifications X86 token Prefix = (8 bits); token Opcode = (8 bits); token ModRM = (8 bits); token SIB = (8 bits); token Disp = (32 bits); ... Token Prefix Opcode ModRM SIB Disp field mod = ModRM(7:6); field reg_op = ModRM(5:3); field rm = ModRM(2:0); ... Field mod reg_op rm Disp Prefix Opcode SIB class Instruction = |mov_rr [arg1, arg2] (Opcode = 0x8b ; mod = 0x3 & reg_op = arg1 & rm = arg2) |add_EvGv [reg_op, AddrE] (Opcode = 0x1 ; fld %1 & cls %2) |addl_mi [AddrE, imm32] (Opcode = 0x80; reg_op = 0x0 & cls %1; fld %2) |imull_ri [reg_op, rm, imm32] (Opcode = 0x69 ; mod = 0x3 & fld %1 & fld %2; fld %3) ... 26

  27. Evaluation of the Framework ?0: Assembly programs 1: Generation of Relocatable Programs Applications: Support X86 and RISC-V 32/64 ?1 Architecture-dependent Components 2: Generation of Relocation Tables Connect the verified assemblers with CompCert s backend Generation of relocation entry gen_reloc ?2 Performance of generated code: 1.1% slower on average than the code generated by the GNU assembler 3: Instruction and Data Encoding Instruction translator translator Instruction specification CSLED encoder ?3 Other 4: Generation of Relocatable ELF implementation and proofs ?4: Relocatable ELF 27

  28. Evaluation of the Framework Statistics of development Framework: 11k LOC 44% of the architecture-dependent code is the translators Possible solution: leverage automation X86: 6.4k LOC RISC-V: 5.4k LOC Comparison with CompCertELF We support X86-32/64 and RISC-V 32/64 in CompCert In instruction encoding, CompCertELF takes 3x verification effort than ours CompCertELF Our framework LOC for each instruction 72 19 28

  29. Generality of the Framework Abstraction is general Architecture 1 Assembler 1 Framework Architecture 2 ... Assembler 2 ... Semantics definition is general 1 2 3 4 ?3: Relocatable programs ?4: ELF object files ?0: Assembly programs ?1: Relocatable programs ?2: Relocatable programs loader loader restore_symb ~ decoder ?3 ~ ~ ?1 ?0 ? ?2 Framework does not change in more realistic ISA semantics (e.g., Sail) Some disassembly functions are necessary (e.g., decoder and loader) Verification is generalized to step simulation proofs 29

  30. Limitations The target printer is unverfied Object files (X86 and RISC-V) Asm text C Asm Stack-Aware CompCert Assembler Framework Target Printer Difficulty: the mismatch between the CompCert and real memory models Does not support separate compilation: the linker is unverified = a.s b.s ab.s = a.o b.o ab.o 30

  31. Conclusion and Future Work This work: verified assembler framework for multiple architectures The whole framework and applications are formalized in Coq This work is open source: https://zenodo.org/records/8363543 Future work: Verify the target printer: some concrete memory model Support verified compositional compilation and verified linker 31

  32. Thank you for your listening! 32

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#