Superoptimization: Accelerating Code Performance through Conditional Correctness

Slide Note

Explore the concept of superoptimization, a technique to generate optimal code implementations for performance-critical systems. The process involves enumerating all possible programs, transforming them with loops, and proving equivalence with the original code. While optimizations are formally verified, challenges exist in discarding useful optimizations and handling corner cases. Conditional equivalence is highlighted as key to achieving the fastest code for optimal system performance.

kref_ani Follow

Uploaded on Oct 10, 2024 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript

Conditionally Correct Superoptimization Rahul Sharma, Eric Schkufza, Berkeley Churchill, Alex Aiken (Stanford University)

Performance Many systems where code performance matters Compute-bound Repeatedly executed Scientific computing Graphics Low-latency server code Encryption/decryption 2

x86 optimizations are hard x86 is a CISC instruction set Many complicated instructions E.g., dpp, popcnt, crc, ~2000 instruction variants 3439 page manual x86 optimizations require expert knowledge Experts take hours, days, or even weeks For each new processor 3

Superoptimization Generate the optimum implementation Massalin [ASPLOS 87], Bansal and Aiken [ASPLOS 06] Enumerate all possible straight-line programs STOKE, Schkufza, Sharma, Aiken [ASPLOS 13] Random enumeration instead of exhaustive Transforms programs with loops Prove optimized program is equivalent to the original 4

Architecture Arbitrary Programs Fast Correct Program Correct Programs Enumerator Perf. Tests CHECKER Massalin 87 checks correctness using tests Bansal and Aiken 06 use a SAT solver STOKE proves equivalence using DDEC [OOPSLA 13] 5

Verification Pros: Optimizations are formally verified Cons: Useful optimizations can be discarded Equivalence checker rejects many good programs Work perfectly with the application Fail on some weird corner case that cannot arise 6

Example Target Expert Rewrite foo(int32_t* a, int32_t* b) *a++=*b++; *a++=*b++; *a++=*b++; *a++=*b++; return; MOVAPD xmm0, [b] MOVAPD [a], xmm0 RET Incorrect, if a and b overlap Segfault if a and b are not 16 byte aligned 7

Conditions Cannot use the fast rewrite in an arbitrary context Developers know conditions on the context Used in adding unchecked annotations, writing assembly Unconditionally correct optimizations are effective But fall short of the fastest code desired For performance, we need conditional equivalence 8

Conditional correctness Target Rewrite foo(int32_t* a, int32_t* b) *a++=*b++; *a++=*b++; *a++=*b++; *a++=*b++; return; MOVAPD xmm0, [b] MOVAPD [a], xmm0 RET Conditionally correct restrict(a), restrict(b) a, b are 16 byte aligned 9

Conditionally correct optimizer Conditionally Correct Programs Random Programs Fast Conditionally Correct Program STOKE CHECKER Tests Condition ? 10

Conditionally correct optimizer Conditionally Correct Programs Random Programs Fast Conditionally Correct Program STOKE CHECKER Tests Data! Condition ? 11

Condition inference Tests encode the conditions on contexts implicitly Learn facts about tests (Daikon style) Keep adding facts until the proof succeeds Aliasing: two memory dereferences do not alias Alignment: an address is x-byte aligned Equalities, inequalities, floating-point specific 12

Equivalence checking (loop-free) Memory Registers ?: ?? ?:?? Memory Registers Condition C Combined program assume initial states are equal assume C ??;??; assert live outs are equal Verify assertions Equivalence 13

Equivalence checking (loops) ?:? ??? ?? ?? ?? ?:? ??? ?? ?? ?? Memory Registers Memory Registers Combined program assume initial states are equal assume condition C assert ??= ?? while ?? do ??;??; assert ??= ?? assert live outs are equal Verify assertions Equivalence 14

Abstract interpretation Abstract domains: equalities, intervals, alignment Too many instructions! Use RSY 04 to avoid writing abstract transformers Only need to specify the concrete semantics Too many fixpoint iterations! Gather states reaching the loops in a set ? Start fixpoint iterations from ?(?) instead of [YBS 06] 15

Conditions 1. Analyze A to verify C Application (A) Condition (C) 2. Manually verify C Target (T) Rewrite (R) 3. Runtime check if (C) Yes No R T 16

Conditional GCC Relative Performance 5 annot. gcc -O3 gcc -O3 4 3 2 1 0 raysphere smul test10 test13 test17 diag4 diag6 diag8 ex3a ex3b test1 test6 kmeans delta add dot moss ex3c Annotations can help a compiler significantly 17

Conditional STOKE Relative Performance 5 annot. gcc -O3 gcc -O3 cSTOKE 4 3 2 1 0 raysphere smul test10 test13 test17 diag4 diag6 diag8 ex3a ex3b test1 test6 kmeans delta add dot moss ex3c STOKE with condition inference is comparable or outperforms gcc with or without annotations 18

Runtime checks Relative Performance 5 annot. gcc -O3 CHECK cSTOKE gcc -O3 4 3 2 1 0 raysphere smul test10 test13 test17 diag4 diag6 diag8 ex3a ex3b test1 test6 kmeans delta add dot moss ex3c Overhead of condition checking can be bearable 19

Aliasing Restricting aliasing results in simpler queries For equivalence, ? = ? ? ? Byte addressable memory, vector instructions Up to 1030 fewer verification cases! All verifications in under a second 20

Conclusion Optimizers are conservative about context Automatically infer preconditions from tests Formally verify conditional equivalence Significant performance improvements 21

Related work ParaScope [PPoPP 93], SUIF Explorer [PPoPP 99], IOPT [OOPSLA 09], M. Kawaguchi, S. K. Lahiri, and H. Reblo: Conditional equivalence. TR, MSR, 2010. M. D. Ernst et al. The Daikon system for dynamic detection of likely invariants. Sci. Comput. Program., 2007 G. Yorsh, T. Ball, and M. Sagiv: Testing, abstraction, theorem proving: better together! ISSTA 2006 George C. Necula: Translation validation for an optimizing compiler. PLDI 2000 Henry Massalin: Superoptimizer - A Look at the Smallest Program. ASPLOS 1987 22

Superoptimization: Accelerating Code Performance through Conditional Correctness

Download Presentation

Presentation Transcript

Related

More Related Content