Falcon: An Optimizing Java JIT Compiler Overview

 
Philip Reames
Azul Systems
 
FALCON: AN OPTIMIZING JAVA JIT
 
AGENDA
 
Intro to Falcon
Why you should use LLVM to build a JIT
Common Objections (and why they’re mostly wrong)
 
 
WHAT IS FALCON?
 
 
Falcon is an LLVM-based just-in-time compiler for Java bytecode.
 
Shipping on-by-default in the Azul Zing JVM.
 
Available for trial download at: www.azul.com/zingtrial/
 
 
THIS TALK IS ABOUT LESSONS LEARNED, BOTH TECHNICAL AND PROCESS
 
 
ZING VM BACKGROUND
 
 
A HotSpot-derived JVM with an awesome GC (off topic)
 
Multiple tiers of execution
 
Interpreter: run-once bytecode and rare events
Tier1 Compiler: rapidly generated code to collect profiling quickly
Tier2 Compiler: compile hot methods for peak performance
 
 
BUSINESS NEED
 
 
The existing C2 compiler is aging poorly:
vectorization (a key feature of modern x86_64) is an afterthought
very complicated codebase; "unpleasant" bug tails are the norm
difficult to test in isolation

Looking to establish a competitive advantage:
Long-term goal is to outperform the competition
Velocity of performance improvement is key
 
 
DEVELOPMENT TIMELINE
 
 
April 2014: Proof of concept completed (in six months)
Feb 2015: Mostly functionally complete
April 2016: Alpha builds shared with selected customers
Dec 2016: Product GA (off by default)
April 2017: On by default
 
Team size: 4-6 developers, ~20 person years invested
 
 
TEAM EFFORT
 
Falcon Development
 
Bean Anderson, Philip Reames, Chen Li, Sanjoy Das, Igor Laevsky, Artur Pilipenko, Daniel Neilson, Anna Thomas, Serguei Katkov, Maxim Kazantsev, Daniil Suchkov, Michael Wolf, Leela Venati, Kris Mok, Nina Rinskaya

+ VM development team
+ Zing QA
+ All Azul (E-Staff, Sales, Support, etc.)
 
 
[Benchmark chart: Zing 17.08 vs Oracle 8u121. Various application benchmarks + SPECjvm + DaCapo, collected on a mix of Haswell and Skylake machines.]
 
WHY YOU SHOULD USE LLVM TO BUILD A JIT
 
 
Proven stability, widespread deployments
 
Active developer community, support for new micro-architectures
 
Proven performance (for C/C++)
 
Welcoming to commercial projects
 
 
COMMON OBJECTIONS
 
"LLVM doesn't support X“
“LLVM is a huge dependency"
"We added an LLVM backend; it produced poor code"
"LLVM generates too much code"
"LLVM is a slow JIT“
“My language has feature X (which requires a custom compiler)”
 
 
LLVM DOESN'T SUPPORT X
 
Objection 1 of 6
 
 
FIRST, A BIT OF SKEPTICISM...
 
 
 
Is this something you can express in C?  If so, LLVM supports it.
e.g. deoptimization via spill to captured on-stack buffer
 
buf = alloca(…);
buf[0] = local_0;
a->foo(buf, actual_args…)
 
The real question is: how well is it supported?
 
FUNCTIONAL CORNER-CASES
 
 
If you can modify your ABI (calling conventions, patching sequences, etc.), your life will be much easier.

You will find a couple of important hard cases.  Ours were:
"anchoring" for mixed stack walks
red-zone arguments to assembler routines
support for deoptimization, both checked and async
GC interop
 
 
BE WARY OF OVER-DESIGN
 
 
It is common knowledge that the quality of safepoint lowering matters.

The gc.statepoint design:
Major goal: allow in-register updates
Topic of a 2014 LLVM Dev Meeting talk
2+ person-years of effort

It turns out that hot safepoints are an inliner bug.
 
GET TO FUNCTIONAL CORRECTNESS FIRST
 
 
Priority 1: Implement all interesting cases, get real code running
Priority 2: Add tests to show it continues working
 
Design against an adversarial optimizer.
 
Be wary of over-design, while maintaining code quality standards.
 
 
See backup slides for more specifics on this topic.
 
LLVM IS A HUGE DEPENDENCY
 
Objection 2 of 6
 
 
 
Potentially a real issue, but it depends on your definition of large.

From our shipping product:
libJVM = ~200 MB
libLLVM = ~40 MB

So, a 20% code size increase.
 
WE ADDED AN LLVM BACKEND; IT PRODUCED POOR CODE
 
Objection 3 of 6
 
19
 
IMPORTANCE OF PROFILING
 
 
Tier 1 collects detailed profiles; Tier 2 exploits them
 
Profiles are worth 25% or more of peak application performance.
 
 
PRUNE UNTAKEN PATHS
 
 
Handle rare events by returning to a lower tier, reprofiling, and then recompiling.
 
[Diagram: rare events in Tier1 and Tier2 code transfer control back to the interpreter]
 
%ret = call i32 @llvm.experimental.deoptimize() ["deopt"(..)]
ret i32 %ret
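
Below is a minimal IR sketch of the pattern (names and state are illustrative, not our production IR): a path that profiling says is never taken becomes a call to the deoptimize intrinsic, whose "deopt" bundle carries the abstract VM state needed to resume in a lower tier.

define i32 @example(i32 %x, i32 %limit) {
entry:
  %rare = icmp sgt i32 %x, %limit     ; profiling says: never taken
  br i1 %rare, label %deopt, label %hot

deopt:
  ; no real code is generated for this path, just a transfer back to
  ; the runtime carrying the state needed to resume execution
  %ret = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"(i32 %x) ]
  ret i32 %ret

hot:
  %r = add i32 %x, 1
  ret i32 %r
}

declare i32 @llvm.experimental.deoptimize.i32(...)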
 
 
PREDICATED DEVIRTUALIZATION
 
 
Critical for Java, where everything is virtual by default.

Without profiling, we must fall back to a virtual call:

switch (type(o))
case A: A::foo();
case B: B::foo();
default: o->foo();

With profiling, the untaken default is pruned to a deoptimization:

switch (type(o))
case A: A::foo();
case B: B::foo();
default: @deoptimize() ["deopt"(...)]
 
 
IMPLICIT NULL CHECKS
 
 
%is.null = icmp eq i8* %p, null
br i1 %is.null, label %handler, label %fallthrough, !make.implicit !{}

Explicit lowering:

test rax, rax
jz <handler>
rsi = ld [rax+8]

Implicit lowering (the load itself is the check):

rsi = ld [rax+8]   <- fault_pc

__llvm_faultmaps[fault_pc] -> handler

handler:
    call @__llvm_deoptimize
 
LOCAL CODE LAYOUT
 
 
branch_weights for code layout
 
Sources of slow paths:
GC barriers, safepoints
handlers for builtin exceptions
result of code versioning
 
Together, slow paths are 50%+ of total code size.
 
[Diagram: default layout interleaves blocks as hot, cold, hot, cold, hot, cold; with branch_weights the layout becomes hot, hot, hot, cold, cold, cold]
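
As a hedged sketch (illustrative, not our exact IR): a write-barrier slow path weighted as cold, so that block placement sinks it out of the hot sequence.

  ; GC card-mark check; the slow path is the write barrier itself
  %is.marked = icmp eq i8 %card, 1
  br i1 %is.marked, label %cont, label %slowpath, !prof !0
  ...
  !0 = !{!"branch_weights", i32 10000, i32 1}   ; cont is hot, slowpath is cold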
 
GLOBAL CODE LAYOUT
 
 
Best to put cold code into its own section.
 
 
See "LLVM back end for HHVM/PHP", LLVM Dev Conf 2015.
 
[Diagram: per-function layout (func1-hot, func1-cold, func2-hot, func2-cold, func3-hot, func3-cold) becomes grouped layout (func1-hot, func2-hot, func3-hot, func1-cold, func2-cold, func3-cold)]
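
A sketch of one way to get there (assuming the cold half has already been split into its own function; the section name and helper are illustrative):

; Cold half of func1, split out and pinned to a dedicated cold section.
define internal void @func1.cold() noinline cold section ".text.cold" {
entry:
  call void @throw_builtin_exception()   ; illustrative slow-path work
  unreachable
}
declare void @throw_builtin_exception()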
 
EXPLOITING SEMANTICS
 
 
LLVM supports a huge space of optional annotations
 
Both metadata and attributes
 
Exploiting them well was a ~6-12 month effort.
 
[Cycle: Performance Analysis -> Root Cause Issue -> Add Annotation -> Fix Uncovered Miscompiles -> repeat]
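
For example (a sketch; the particular loads and bounds are illustrative), the kind of standard annotations this enables:

; The receiver was null-checked above, so the loaded reference is nonnull;
; a Java array length is a non-negative i32.
%obj = load i8*, i8** %slot, !nonnull !0
%len = load i32, i32* %len.addr, !range !1
!0 = !{}
!1 = !{i32 0, i32 2147483647}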
 
 
DEFINING A CUSTOM PASS ORDER
 
 
 
 
 
 
Requires careful thought and experimentation.
MCJIT's OptLevel is not what you want.
PassManagerBuilder is tuned for C/C++!
 
May expose some pass-ordering-specific bugs.
 
 
legacy::PassManager PM;
PM.add(createEarlyCSEPass());   // a hand-picked pipeline, pass by pass
...
PM.run(M);                      // M is the llvm::Module being compiled
 
EXPECT TO BECOME AN LLVM DEVELOPER
 
 
You will uncover bugs, both performance and correctness.

You will need to fix them.

You will need a downstream process for incorporating and shipping fixes.
 
 
See backup slides for more specifics on this topic.
 
STATUS CHECK
 
We've got a reasonably good compiler for a C-like subset of our source language.
 
We're packaging a modified LLVM library.
 
This is further than most projects get.
 
 
QUICK EXAMPLE
 
public int intsMin() {
    int min = Integer.MAX_VALUE;
    for (int i = 0; i < a_I.length; i++) {
       min = min > a_I[i] ? a_I[i] : min;
    }
    return min;
}
 
4x faster than the competition on Intel Skylake.
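
A min reduction like this vectorizes; the inner loop looks roughly like the following sketch (8-wide and simplified; the real IR also carries range-check and safepoint structure):

vector.body:
  %wide = load <8 x i32>, <8 x i32>* %ptr, align 4
  %cmp = icmp slt <8 x i32> %wide, %vec.min
  %vec.min.next = select <8 x i1> %cmp, <8 x i32> %wide, <8 x i32> %vec.min
  ...
  ; after the loop, the 8 lanes are reduced to a single scalar minimum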
 
 
"LLVM GENERATES TOO MUCH CODE"
 
Objection 4 of 6
 
 
CALLING ALL LLVM DEVELOPERS...
 
 
We regularly see 3-5x larger code compared to C2.

Partly as a result of aggressive code versioning.

But also a failure to exploit:
the mix of hot and cold code (needs selective -Oz)
lots and lots of no-return paths
gc.statepoint lowering vs. register allocation
code versioning via deopt (unswitch, unroll, vectorizer)
 
LLVM IS A SLOW JIT
 
Objection 5 of 6
 
 
 
 
Absolutely. LLVM is not suitable for a first-tier JIT.
 
That's not what we have or need.
 
We use the term "in memory compiler" to avoid confusion.
 
SYSTEM INTEGRATION
 
 
VM infrastructure moderates impact
The tier 2 compiler sees a small fraction of the code.
Background compilation on multiple compiler threads.
Prioritized queueing of compilation work
Hotness driven fill-in of callee methods
 
This stuff is standard (for JVMs).  Nothing new here.
 
 
IN PRACTICE, MOSTLY IGNORABLE
 
 
Typical compile times around 100ms.
 
Extreme cases are in the seconds to low minutes range.
 
At the extreme, that's (serious) wasted CPU time, but nothing else.
 
 
WAIT, YOU DO WHAT?
 
 
There are cases where this isn't good enough.
a popular big-data analytics framework spawns a JVM per query
multi-tenancy environments can spawn 1000s of JVMs at once
 
Improving compile time is hard.  So what do we do?
 
 
CACHED COMPILES
 
 
Reuse compiled code across runs.
Profiling continues to apply.
No decrease in peak performance.
 
Planned to ship in a future version of Zing.
 
 
 
Credit for inspiration goes to the Pyston project and their article "Caching object code": https://blog.pyston.org/2015/07/14/caching-object-code
 
After all, our "in memory compiler" does produce normal object files.
 
MY LANGUAGE HAS FEATURE X
(WHICH REQUIRES A CUSTOM COMPILER)
 
Objection 6 of 6
 
 
LANGUAGE SPECIFIC DEFICIENCIES
 
 
LLVM has been tuned for certain languages
 
The more different your language, the more work needed.
 
For Java, our key performance problems were:
range checks
null checks
devirtualization & inlining
type based optimizations
deoptimization
 
 
DO YOU ACTUALLY HAVE A PROBLEM?
 
 
Try the naive version, seriously try it!
 
Very useful to try different C++ implementations
requires good knowledge of Clang
may find a viable implementation strategy
may provide insight into what optimizer finds hard
 
Check to see if existing metadata/attributes are relevant
 
 
CUSTOM ATTRIBUTES/METADATA
 
 
Is there a single key missing fact? Or a small set thereof?
 
A lot of advantages:
Factoring for long-term branching
May be upstreamable
Easy to functionally test
Serialization
 
 
Examples from our tree:
LICM isGuaranteedToExecute
"allocation-site"
"deopt-on-throw"
"known-value"
"java-type"="XYZ"
 
METADATA HEALING
 
 
The optimizer tends to strip metadata it doesn't understand.

Don't try to change this.

Instead, you need a mechanism to "heal" previously stripped metadata.
 
 
See backup slides for more on this topic
 
CALLBACKS
 
Narrow interface between the optimizer and the containing runtime:

Optional<FieldInfo>
getKnownValueKeyDependent(llvm::LLVMContext &C,
                          KnownValueID BaseObj,
                          int64_t Offset,
                          uint64_t Size);
 
CALLBACKS
 
 
Primarily used for metadata healing
But also useful for type-system-specific optimizations

Resist the temptation to add them freely.
Each one should be strongly and continuously justified.
We have 24 total.  And that's too many.

The optimizer can visit dead code (critically important), so callbacks must tolerate impossible queries.
Fuzz this interface.  Seriously, fuzz it!
 
 
IMPLICATIONS FOR REPLAY
 
 
Replay is an absolutely essential debugging and diagnostic tool
 
$ opt -O3 -S input.ll
 
To preserve replay w/callbacks, need a query-log mechanism.
 
$ opt -O3 -S input.ll --read-query-log-from-file=input.ll.queries
 
 
EMBEDDED HIGH LEVEL IR
 
 
"Abstraction" functions
Opaque by default to the optimizer
Can be first class citizens with builtin knowledge
Useful for building certain classes of high level optimizations
 
 
declare i8 addrspace(1)*
@azul.resolve_virtual(i8 addrspace(1)* %receiver,
                      i64 %vtable_index)
    nounwind noinline readonly
    "late-inline"="2" "gc-leaf-function"
 
EMBEDDED HIGH LEVEL IR
 
 
Enable high level transforms such as:
devirtualization
object elision
allocation sinking & partial escape analysis
lock coarsening & elision
 
Most of the above expressed as custom passes
 
 
See backup slides for discussion of the tradeoffs around custom passes
 
A (SLIGHT) CHANGE IN VIEW ON ABSTRACTIONS
 
 
Useful for rapid evolution and prototyping
 
But each is an optimization barrier until lowered
As few as you can reasonably manage long term
Lower early in the pipeline
Minimize number of unique lowering points
 
 
CONCLUSION
 
 
 
For the right use case, LLVM can be used to build a production-quality JIT that is performance-competitive with the state of the art.
 
 
QUESTIONS?
 
 
BACKUP SLIDES
 
 
ASIDE: STAFFING
 
 
Hiring a team from within the existing community is hard, slow, and expensive.
 
Expect to train most of your team, including yourself.
 
 
SO IT'S COMPLETE, IS IT CORRECT?
 
 
Design against an adversarial optimizer.
 
Don't fall for "no real compiler would do that".
 
You'll be wrong; if not now, later.
 
 
A RECOMMENDATION
 
 
Implement the cases claimed to be unreasonable.  :)
 
 
HIGH VALUE TESTING TECHNIQUES
 
 
Compile everything (not standard for high-tier compilers) in a very large corpus with a Release+Asserts build
1500+ JARs, every method, every class
Finds many subtle invariant violations
 
Randomly generated test cases (Fuzzing) works.
By far cheapest way to find miscompile bugs.
Good way to find regressions quickly.
 
 
 
"Be Wary of Over-Design" continued…
 
ONE WORD OF CAUTION
 
 
While ugly design is expected, your code should be production quality and well tested.

Surprisingly large amounts of our initial prototype code made it into the final product.
 
"of course we'll rewrite this" only works if you have perfect knowledge
 
Still not an excuse for over-design.
 
 
MANAGING BRANCHES
 
 
Daily integrations are strongly encouraged.
 
Key considerations:
LLVM evolves quickly, internal APIs are not stable
Isolating regressions
Revert-to-green culture *for recent changes*
 
Can be very time consuming if not carefully arranged.
 
 
ONE WORKING BRANCH DESIGN
 
 
pristine -- plain LLVM passing make check (with our build config)
 
merge-unstable - merged with local changes (conflict resolutions)
 
testing - VM merge and basic tests pass (catches all API changes)
 
merge-stable - last point known to pass all tests
 
master
 
 
FRONTEND INTRINSICS
 
Replace the body of A::foo with hand-written IR (sketch below)
 
Here be Dragons:
Major implications for correctness testing
Too easy to side-step hard issues
 
Hard Lessons:
Express all preconditions in source language
Intrinsify only the critical bit
Be willing to change core libraries early
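
A sketch of the shape (the method choice is hypothetical): only the critical primitive is intrinsified, and it maps directly onto an LLVM intrinsic.

; Hand-written IR installed in place of the method's bytecode.
define i32 @"Integer::numberOfLeadingZeros"(i32 %v) {
  %r = call i32 @llvm.ctlz.i32(i32 %v, i1 false)   ; i1 false: defined at zero
  ret i32 %r
}
declare i32 @llvm.ctlz.i32(i32, i1)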
 
 
METADATA HEALING TRADEOFFS
 
 
Pros:
Robust to upstream changes
Robust to pass order changes
Works for custom and upstream metadata
Testable
 
Cons:
Wastes compile time
Can't always recover lost information
 
 
Custom passes continued…
 
 
CUSTOM PASSES
 
 
Pros:
Easy to implement
Handle language specific IR patterns
Required for certain high level optimizations
 
Cons:
Simplicity is often deceptive
May cover a hard problem that needs to be solved
Not in the default pipeline, major testing implications
 
 
KEY LESSON: SHORT VS LONG TERM
 
 
Improving existing passes is the right long-term answer.

Not all problems need long-term solutions.

A specialized pass generally gets 80% of cases, much sooner.
 
 
KEY LESSON: TESTING
 
 
Code that runs rarely is often buggy.  Subtly so.
 
Unit testing only goes so far.
 
Our painful lessons:
Inductive Range Check Elimination (IRCE)
Loop Predication
 
(Only) three options:
Extensive, exhausting testing via fuzzers
Well factored shallow code
Minimize language specific code
 