Framework for Developing Verified Assemblers for ELF Format

Slide Note
Embed
Share

This research paper discusses the importance of verified assemblers in the context of verified compilation, focusing on the development of verified assemblers for the ELF format for multiple architectures like X86, RISC-V, and ARM. The framework aims to be configurable, extensible, and general to support different architectures and features.


Uploaded on Sep 14, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Towards a Framework for Developing Verified Assemblers for the ELF Format Jinhua Wu1, Yuting Wang*1, Meng Sun1, Xiangzhe Xu2, and Yichen Song1 1 Shanghai Jiao Tong University, Shanghai, China 2 Purdue University, West Lafayette, USA APLAS 2023 2023.11.28 1

  2. Motivation Verified assemblers: the last mile in the verified compilation Verified compiler Verified assembler Assembly C programs Object files CompCert Significance of the complete verified compilation: End-to-end behavior preservation ?? (??) ?? (????) ?? (??) Important in building verified systems software stacks Verified Software Toolchain Certified Abstraction Layer in CertiKOS ... 2

  3. Motivation Verified assemblers for Multiple architectures: X86, RISC-V, ARM, ... Standard object file formats: ELF, PE, ... GNU assembler X86 X86 Assembly ELF Object files CompCert GNU assembler RISC-V Assembly RISC-V Assembler framework: implementation verification C programs ELF Object files GNU assembler ARM Assembly ARM ELF Object files Verified assembler framework should be: Configurable: support different architectures Extensible: easy to extend to a new architecture General: reusable when supporting more features 3

  4. Related Work: Verified Assemblers CompCertELF: A prototype of verified assemblers for ELF CompCertELF assembler: Only supports a small subset of X86-32 Requires large amount of work to implement X86-32 ELF object C Assembly Stack-Aware CompCert X86-32 Assembler CakeML Backend: Labelled assembly language LabLang: Not a standard object file format No information for symbols relocation ARM LabLang X86 CakeML LabLang LabLang CakeML Compiler Encode ... 4

  5. Problems and Challenges: Problem 1: large amount of work to support multiple architectures Avoid redundant implementation and verification for each architecture assembler 1 X86 X86 Object files Assembly verified assembler framework assembler 2 RISC-V Object files RISC-V Assembly assembler 3 ARM Assembly ARM Object files Challenge 1: how to design the verified assembler framework ? 5

  6. Problems and Challenges: Problem 2: high complexity in instruction encoding and verification movl 0, %eax movl %eax, %eax movl (%eax), %eax movl %eax, (%eax) movl $0, %eax ... 8b 05 00 00 00 00 89 c0 67 8b 00 67 89 00 b8 00 00 00 00 ... encode decode Impractical to manually implement and verify instruction encoding Challenge 2: How to simplify the instruction encoding? 6

  7. Our Contributions A framework for developing verified assemblers for: Multiple architectures: X86-32/64 and RISC-V 32/64 Standard object file format: ELF format X86 ELF Object files X86 Assembly Verified Assembler Framework RISC-V ELF Object files RISC-V Assembly ... ... Key features: 1. Abstraction over architecture-dependent assembly X86 Verified RISC-V Verified Verified Assembler Assembler Assembler Architecture-dependent Assembly Verified Assembler Framework = = = X86 implementation RISC-V implementation 2. Integration of automated and verified instruction encoding 7

  8. The Rest of This Talk Overview of the framework Key features Evaluation Limitations 8

  9. Overview of the Framework Goal: Transforms assembly programs to ELF object files ?0: Assembly programs 1: Generation of Relocatable Programs Ingredients: IR: relocatable programs (?1 to ?3) ?1 2: Generation of Relocation Tables Four passes: 1 to 4 ?2 3: Instruction and Data Encoding Correctness theorem: 4 3 2 1?0 = ?4 ?0 ? ?4 ?3 4: Generation of Relocatable ELF ?4: ELF object 9

  10. A Running Example ?0: Assembly programs Assembly program 1: Generation of Relocatable Programs variable counter: data = 0 /* main.c */ int counter = 0; ?1 function incr: ... movl counter, %eax leal 1(%eax), %eax movl %eax, counter ... CompCert and Target Printer 2: Generation of Relocation Tables void incr(){ counter++; } ?2 3: Instruction and Data Encoding int main(){ incr(); } function main: ... call incr ... ?3 4: Generation of Relocatable ELF ?4: ELF object 10

  11. Pass 1: Generation of Relocatable Programs Transform assembly programs to relocatable programs ?0: Assembly programs ?1: Relocatable Programs section table symbol table variable counter: data = 0 data section counter: data = 0 type size section function incr: ... movl counter, %eax leal 1(%eax), %eax movl %eax, counter ... code secion incr: ... movl counter, %eax leal 1(%eax), %eax movl %eax, counter ... counter data 4 counter 1 incr function 52 incr main function 40 main code secion main: ... call incr ... code secion main: ... call incr ... 11

  12. Pass 2: Generation of Relocation Tables Relocation tables: record the symbol references ?1: Relocatable Programs ?2: Relocatable Programs section table relocation tables data section counter: data = 0 Step 1: generate relocation entry Step 2: update the instruction code secion incr: ... movl , %eax leal 1(%eax), %eax movl %eax, ... incr counter 0 FFFF0734 In the linking process: Linker follows the relocation tables to replace the zero with the concrete address entry1: entry2: counter 0 FFFF0734 code section main: ... call ... main incr 0 FFFF11CB entry1: 12

  13. Pass 3: Instruction and Data Encoding Encode the instructions and data into bytes section table relocation tables data section counter: 00 data = 0 code secion incr: ... 8b 05 00 00 00 00 67 8d 40 01 89 05 00 00 00 00 movl %eax, 0 incr movl 0, %eax leal 1(%eax), %eax entry1:counter entry2:counter ... code secion main: ... e8 00 00 00 00 call 0 main entry1:incr ... 13

  14. Pass 4: Generation of ELF relocation tables symbol table section table data section counter: 00 code secion incr: ... 8b 05 00 00 00 00 67 8d 40 01 89 05 00 00 00 00 00 00 00 01 01 0F 00 00 incr type 00 size 01 section 00 02 entry1:counter entry2:counter 01 1F B3 00 counter 00 data 00 4 04 counter 00 00 incr 01 function 01 52 44 incr 00 01 ... main 02 function 01 40 34 main 00 02 code secion main: ... e8 00 00 00 00 00 00 00 02 main entry1:incr 01 FF 3C 01 ... 14

  15. Pass 4: Generation of ELF ELF object file ELF header data section counter code section incr code section main symbol table relocation table incr relocation table main Section headers 15

  16. Verification Semantics model: labelled transition system Semantics definitions: _?, _ and _ step step step final init ... ??0,?0 ??1,?1 ??2,?2 ???,?? initial state final state single step Correctness theorem: lock-step forward simulation ~: the relation ~ satisfies the following diagram step step step final init ... ??0,?0 ??1,?1 ??2,?2 ???,?? ~ ~ ~ ~ step step step final init ... ,?0 ,?1 ,?2 ,?? ??0 ??1 ??2 ??? initial simulation step simulation final simulation 16

  17. Key Features Abstraction over architecture-dependent assembly: ?0: Assembly programs ?0: Assembly programs 1: Generation of Relocatable Programs 1: Generation of Relocatable Programs ?1 Verified Assembler Architecture-dependent Verified Assembler Framework Framework + Abstraction interfaces Assembly = ?1 2: Generation of Relocation Tables Abstraction for implementation Abstraction for verification 2: Generation of Relocation Tables gen_reloc ?2 ?2 3: Instruction and Data Encoding 3: Instruction and Data Encoding Automation of the instruction encoding: Integration of the CSLED framework ?3 CSLED encoder 4: Generation of ELF object files ?3 ?4: ELF object files 4: Generation of ELF object file ?4: ELF object file 17

  18. Abstraction for Implementation: Pass 1 1: Instr Data, (asm_prog Instr Data) (reloc_prog Instr Data) ?0: Assembly programs ?1: Relocatable Programs section table symbol table variable counter: data = 0 data section counter: data = 0 type size section function incr: ... movl counter, %eax leal 1(%eax), %eax movl %eax, counter ... code secion incr: ... movl counter, %eax leal 1(%eax), %eax movl %eax, counter ... 0 counter data 4 counter incr function 52 incr main function 40 main code secion main: ... call incr ... code secion main: ... call incr ... 18

  19. Abstraction for Implementation: Pass 2 Abstraction interfaces: Interface of 2 is gen_reloc: Z Instr gen_reloc reloc_entry Instr ?1: Relocatable Programs ?2: Relocatable Programs section table relocation tables data section counter: data = 0 ?1: Relocatable programs code secion incr: ... movl , %eax leal 1(%eax), %eax movl %eax, ... 2: Generation of Relocation Tables incr counter 0 entry1: entry2: gen_reloc counter 0 ?2: Relocatable programs code secion main: ... call ... main incr 0 entry1: 19

  20. Key Features Abstraction over architecture-dependent assembly: ?0: Assembly programs 1: Generation of Relocatable Programs Framework + Abstraction interfaces ?1 2: Generation of Relocation Tables Abstraction for implementation gen_reloc Abstraction for verification ?2 3: Instruction and Data Encoding Automation of the instruction encoding: Integration of the CSLED framework CSLED encoder ?3 4: Generation of ELF object files ?4:ELF object files 20

  21. Abstraction for Verification: Pass 1 Correctness theorem: 1?0 = ?1 ?0 ? ~ ?1 step step step final init ... ??0,?0 ??1,?1 ??2,?2 ???,?? ~ ~ ~ ~ step step step final init ... ,?0 ,?1 ,?2 ,?? ??0 ??1 ??2 ??? step simulation initial simulation Difficulty of initial simulation: reorder of memory allocation in initial states b1 b2 b3 Prove the injection holds incrementally following the allocation order source memory incr counter main injection Our approach: directly construct the final injection and prove it holds after the memory initialization target memory incr main counter b1 b2 b3 allocation order 21

  22. Abstraction for Verification: Pass 1 Correctness theorem: 1?0 = ?1 ?0 ? ~ ?1 step step step final init ... ??0,?0 ??1,?1 ??2,?2 ???,?? ~ ~ ~ ~ step step step final init ... ,?0 ,?1 ,?2 ,?? ??0 ??1 ??2 ??? step simulation Step simulation is the abstraction interface in the verification It is the general interface for the verification Specific interfaces are used in the other passes 22

  23. Abstraction for Verification: Pass 2 Consider the verification of 2 ?2: Relocatable Programs section table relocation tables Semantics of ?2 Disassembly: restore_symb data section counter: data = 0 code secion incr: ... movl , %eax leal 1(%eax), %eax movl %eax, ... incr counter counter Step simulation reduces to: Lemma: i i o e, gen_reloc(o,i)=(e,i ) restore_symb(i ,e) = i entry1: entry2: 0 0 main code secion main: ... call ... incr entry1: Interface for the verification of 2 0 23

  24. Key Features Abstraction over architecture-dependent assembly: ?0: Assembly programs 1: Generation of Relocatable Programs Framework + Abstraction interfaces ?1 2: Generation of Relocation Tables Abstraction for implementation gen_reloc Abstraction for verification ?2 3: Instruction and Data Encoding Automation of the instruction encoding: Integration of the CSLED framework CSLED encoder ?3 4: Generation of ELF object files ?4:ELF object files 24

  25. Integration of the CSLED Framework Interfaces of this pass translator: instr csled_instr encoder: csled_instr list byte encoder section table translator csled_instr instr encoder: instr list byte data section counter: data = 0 00 ?2: Relocatable programs Instruction specifications code secion incr: ... movl 0, %eax leal 1(%eax), %eax movl %eax, 0 89 05 00 00 00 00 3: Instruction and Data Encoding 8b 05 00 00 00 00 67 8d 40 01 translator CSLED framework framework CSLED CSLED encoder ... code secion main: ... call 0 e8 00 00 00 00 ?3: Relocatable programs decoder proofs encoder ... 25

  26. Instruction Specifications X86 token Prefix = (8 bits); token Opcode = (8 bits); token ModRM = (8 bits); token SIB = (8 bits); token Disp = (32 bits); ... Token Prefix Opcode ModRM SIB Disp field mod = ModRM(7:6); field reg_op = ModRM(5:3); field rm = ModRM(2:0); ... Field mod reg_op rm Disp Prefix Opcode SIB class Instruction = |mov_rr [arg1, arg2] (Opcode = 0x8b ; mod = 0x3 & reg_op = arg1 & rm = arg2) |add_EvGv [reg_op, AddrE] (Opcode = 0x1 ; fld %1 & cls %2) |addl_mi [AddrE, imm32] (Opcode = 0x80; reg_op = 0x0 & cls %1; fld %2) |imull_ri [reg_op, rm, imm32] (Opcode = 0x69 ; mod = 0x3 & fld %1 & fld %2; fld %3) ... 26

  27. Evaluation of the Framework ?0: Assembly programs 1: Generation of Relocatable Programs Applications: Support X86 and RISC-V 32/64 ?1 Architecture-dependent Components 2: Generation of Relocation Tables Connect the verified assemblers with CompCert s backend Generation of relocation entry gen_reloc ?2 Performance of generated code: 1.1% slower on average than the code generated by the GNU assembler 3: Instruction and Data Encoding Instruction translator translator Instruction specification CSLED encoder ?3 Other 4: Generation of Relocatable ELF implementation and proofs ?4: Relocatable ELF 27

  28. Evaluation of the Framework Statistics of development Framework: 11k LOC 44% of the architecture-dependent code is the translators Possible solution: leverage automation X86: 6.4k LOC RISC-V: 5.4k LOC Comparison with CompCertELF We support X86-32/64 and RISC-V 32/64 in CompCert In instruction encoding, CompCertELF takes 3x verification effort than ours CompCertELF Our framework LOC for each instruction 72 19 28

  29. Generality of the Framework Abstraction is general Architecture 1 Assembler 1 Framework Architecture 2 ... Assembler 2 ... Semantics definition is general 1 2 3 4 ?3: Relocatable programs ?4: ELF object files ?0: Assembly programs ?1: Relocatable programs ?2: Relocatable programs loader loader restore_symb ~ decoder ?3 ~ ~ ?1 ?0 ? ?2 Framework does not change in more realistic ISA semantics (e.g., Sail) Some disassembly functions are necessary (e.g., decoder and loader) Verification is generalized to step simulation proofs 29

  30. Limitations The target printer is unverfied Object files (X86 and RISC-V) Asm text C Asm Stack-Aware CompCert Assembler Framework Target Printer Difficulty: the mismatch between the CompCert and real memory models Does not support separate compilation: the linker is unverified = a.s b.s ab.s = a.o b.o ab.o 30

  31. Conclusion and Future Work This work: verified assembler framework for multiple architectures The whole framework and applications are formalized in Coq This work is open source: https://zenodo.org/records/8363543 Future work: Verify the target printer: some concrete memory model Support verified compositional compilation and verified linker 31

  32. Thank you for your listening! 32

More Related Content