Developing Pure Julia Codes for Elementary Functions in MIT's 18.337 Course


Mustafa Mohamad of MIT's Department of Mechanical Engineering presents this project from the Fall 2016 offering of 18.337: developing pure Julia codes for elementary functions, including trigonometric (sin, cos, tan, etc.), hyperbolic (sinh, cosh, tanh, etc.), exponential and logarithmic, and floating-point manipulation functions. The grand goal is to excise the dependency on openlibm. Julia's multiple dispatch, macros, and other features promise generic code with maximal performance. The presentation covers a Julia port of the SLEEF C library, which offers good accuracy but runs slowly on non-FMA systems, along with benchmarks highlighting the performance differences between SLEEF and Base.





Presentation Transcript


  1. Elementary Functions in Julia. Mustafa Mohamad, Department of Mechanical Engineering, MIT. 18.337, Fall 2016

  2. Problem: Develop pure Julia codes for elementary functions.
     Trigonometric functions: sin, cos, tan, sincos, asin, acos, atan, atan2
     Hyperbolic functions: sinh, cosh, tanh, asinh, acosh, atanh
     Exponential and logarithmic: log, log2, log10, log1p, ilog2, exp, exp2, exp10, expm1, cbrt, pow
     Floating-point manipulation: ldexp, ilog2, frexp

  3. Grand Goal: Excise the dependency on openlibm, which is basically a fork of FreeBSD's msun math library (originally developed by Sun Microsystems) plus patches. Upstream patches have accumulated over the years that have not been ported over to openlibm.

  4. The Julia Promise: Multiple dispatch, macros, and other features make it possible to write generic versions of algorithms and still get maximal performance. Generic versions are often easier to understand, and they yield efficient code for Float16 and Float128 (both forthcoming in Julia 0.6/1.0).
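To illustrate the point, a minimal sketch of a generic kernel (illustrative names, not from the talk): one method covers every AbstractFloat, and Julia compiles a specialized version per concrete type, so Float16 or Float128 inputs would take the same optimized path as Float64.

```julia
# One generic method; Julia specializes it for each concrete float type.
function horner(x::T, coeffs::NTuple{N,T}) where {T<:AbstractFloat,N}
    r = coeffs[end]
    for i in N-1:-1:1
        r = muladd(x, r, coeffs[i])   # arithmetic stays in T throughout
    end
    return r
end

horner(0.5, (1.0, 1.0, 0.5))                 # Float64 path
horner(Float16(0.5), Float16.((1, 1, 0.5)))  # Float16 path, same source code
```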

  5. Background: a Julia port of the SLEEF C library, which takes advantage of SIMD and minimal code branches. Look at the generated code with @code_llvm or @code_native; the Julia-ported code benefits from automatic SIMD vectorization. See: https://github.com/JuliaMath/Sleef.jl
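A quick sketch of that inspection workflow (the kernel here is a stand-in, not Sleef.jl code):

```julia
# Inspect the code Julia generates; on an AVX-capable CPU a branch-free
# kernel mapped over an array should show packed (vector) instructions.
f(x) = muladd(x, x, 1.0)        # stand-in for a branch-free kernel

@code_llvm f(1.0)               # LLVM IR for the Float64 method
@code_native f(1.0)             # native assembly

g(xs) = map(f, xs)
@code_llvm g(rand(1024))        # check whether the loop auto-vectorizes
```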

  6. SLEEF: good accuracy but too slow. Avoiding branches necessitates unnecessary computation on some paths. Double-double arithmetic is used to squeeze out extra bits of precision and avoid branching, but the library is too slow on non-FMA systems (~6-10x slower than Base).
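For context, a minimal sketch of the double-double idea (illustrative only, not SLEEF's actual code): a value is carried as an unevaluated sum hi + lo of two Float64s, and error-free transformations recover the rounding error of each operation.

```julia
# Error-free sum: a + b == s + e exactly (Knuth's two-sum, 6 flops).
function two_sum(a::Float64, b::Float64)
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e
end

# Error-free product: a*b == p + e exactly; a single fma on FMA hardware.
function two_prod(a::Float64, b::Float64)
    p = a * b
    e = fma(a, b, -p)
    return p, e
end

# Double-double addition of (ahi + alo) and (bhi + blo), renormalized.
function dd_add(ahi, alo, bhi, blo)
    s, e = two_sum(ahi, bhi)
    hi, lo = two_sum(s, e + alo + blo)
    return hi, lo
end
```

Note that two_prod is where FMA availability bites: without a hardware fma, recovering the exact product error requires Dekker splitting at many times the cost, which is consistent with the slowdown quoted above.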

  7. Benchmarks [chart: median time ratio, SLEEF/Base]. Performance critically depends on the availability of the fused multiply-add (FMA) instruction. sin/cos/tan have bugs, so the benchmark here is not fair; the same goes for the log functions. The exp10 function is very fast compared to Base's exp10, but openlibm's exp10 is naive: it just calls the pow function.

  8. Amal.jl https://github.com/JuliaMath/Amal.jl

  9. General lessons: Developing accurate and fast elementary functions is very hard and time-consuming. Strict accuracy constraints for a high-quality implementation require errors to be < 1 ulp. One must take advantage of the IEEE format of floating-point numbers to extract maximal speed.

  10. Important considerations: Ideally prefer polynomial approximants over rational approximants (division is much slower than multiplication or addition). Availability of the FMA instruction (fused multiply-add with no intermediate rounding) matters. Avoid algorithms that depend on table lookups: the throughput of CPU instructions is outpacing memory access speed, so table-based methods have fallen out of favor.
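A small sketch of the FMA point (illustrative): Julia exposes fma, which is a single correctly-rounded instruction on FMA hardware but a slow software fallback elsewhere, and muladd, which lets the compiler fuse only when it is free.

```julia
x, a, b = 1.1, 1.2, 1.3

fma(x, a, b)      # one rounding; fast only with a hardware FMA instruction
muladd(x, a, b)   # fuse-if-cheap: compiles to FMA when available, else
                  # mul + add; may differ from fma in the last bit

# Polynomial kernels are typically written with muladd so they stay fast
# on both FMA and non-FMA systems:
p(x) = muladd(x, muladd(x, 0.5, 1.0), 1.0)   # 1 + x + x^2/2
```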

  11. Testing is critical. It is possible to test all Float32 numbers, O(10^9) of them, in a couple of minutes. Unfortunately it's not possible to test all Float64 numbers, O(10^19). Benchmarking: @code_llvm/@code_native, @code_warntype, BenchmarkTools.jl.
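A minimal sketch of the exhaustive Float32 test (illustrative; myfunc stands in for the function under test): walk every UInt32 bit pattern, reinterpret it as a Float32, and measure the error in ulps against a higher-precision reference.

```julia
# Exhaustively compare a Float32 kernel against a reference computed in
# higher precision and rounded back; `myfunc` is a placeholder.
function max_ulp_error(myfunc, reference)
    worst = 0.0
    for bits in UInt32(0):typemax(UInt32)
        x = reinterpret(Float32, bits)
        isfinite(x) || continue
        want = Float32(reference(Float64(x)))   # correctly-rounded target
        isfinite(want) || continue
        err = abs(Float64(myfunc(x)) - Float64(want)) / eps(abs(want))
        worst = max(worst, err)
    end
    return worst
end

max_ulp_error(exp, exp)   # trivially small for Base's own exp
```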

  12. Developing approximants: 1. Range reduction of the argument to the smallest set S possible. 2. Compute a minimax approximation polynomial on S. 3. Scale back if necessary. A Remez-based algorithm computes the minimax polynomial, which minimizes the maximum (L-infinity) norm of the error. For rational approximations, use a Padé approximant.
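Spelled out, step 2 solves the standard minimax problem (a LaTeX rendering of the criterion the slide names):

```latex
% Best uniform (minimax) polynomial approximation of f on the reduced set S:
p^\ast \;=\; \operatorname*{arg\,min}_{p \in \mathcal{P}_n}\; \max_{x \in S}\, \lvert f(x) - p(x) \rvert
       \;=\; \operatorname*{arg\,min}_{p \in \mathcal{P}_n}\; \lVert f - p \rVert_{\infty}
```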

  13. Developed public functions: exp, exp2, exp10 (exp and exp2 have non-FMA versions, msun-based; exp10 uses a polynomial accurate to < 1 ulp for both FMA and non-FMA systems); log; and the floating-point manipulation functions frexp, ldexp, significand, exponent.

  14. Contributing to Base's math.jl

  15. Pure Julia ldexp: computes significand × 2^exponent without multiplication.
     regular branch: ~1.5x speedup
     subnormal input, subnormal output: ~3.6x speedup
     normal input, subnormal output: ~2.6x speedup
     subnormal input, normal output: ~13x speedup
     Wins on readability and speed! Generic over both the argument x and the exponent e (which may be Int128, BigInt, etc., without sacrificing Int64/Int32 performance).
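A simplified sketch of the bit-manipulation idea for the fast path (illustrative, not the actual Base implementation; it omits the subnormal and overflow branches):

```julia
# ldexp(x, e) = x * 2^e for Float64, by adding e directly to the stored
# exponent bits; no floating-point multiply on this path.
# Simplified: assumes x is normal and the result stays normal.
function ldexp_sketch(x::Float64, e::Int)
    u = reinterpret(UInt64, x)
    k = Int((u >> 52) & 0x7ff)        # biased exponent field
    k += e                            # adjust the exponent
    @assert 0 < k < 0x7ff "sketch handles only normal -> normal"
    u = (u & ~(UInt64(0x7ff) << 52)) | (UInt64(k) << 52)
    return reinterpret(Float64, u)
end

ldexp_sketch(1.5, 3)   # 12.0, same as ldexp(1.5, 3)
```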

  16. exp function example: https://github.com/JuliaMath/Amal.jl/blob/master/src/exp.jl
     Method: observe that exp(x) = exp(k ln(2) + r) = 2^k exp(r). We have taken advantage of the float format!
     1. Argument reduction: reduce x to an r such that |r| <= 0.5 ln(2), i.e. given x, find r and integer k with x = k ln(2) + r.
     2. Approximate exp(r) by a polynomial on the interval [-0.5 ln(2), 0.5 ln(2)]: exp(r) = 1.0 + r + polynomial(r). Note the special representation!
     3. Scale back: exp(x) = 2^k exp(r).
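A runnable sketch of that three-step recipe (toy Taylor coefficients for illustration, not Amal.jl's minimax coefficients):

```julia
# Toy exp(x) for Float64 following the slide's recipe:
# x = k*ln(2) + r, exp(x) = 2^k * exp(r), exp(r) ≈ 1 + r + p(r).
const LN2 = 0.6931471805599453

function exp_sketch(x::Float64)
    # 1. Argument reduction: |r| <= 0.5*ln(2)
    k = round(Int, x / LN2)
    r = muladd(-k, LN2, x)             # r = x - k*ln(2)
    # 2. Polynomial kernel on the reduced interval; a real kernel
    #    uses Remez-derived minimax coefficients instead of Taylor's.
    p = r*r * @evalpoly(r, 1/2, 1/6, 1/24, 1/120, 1/720, 1/5040)
    er = 1.0 + r + p                   # the "1 + r + polynomial" form
    # 3. Scale back by 2^k via the exponent bits.
    return ldexp(er, k)
end

exp_sketch(1.0)   # ≈ 2.718281828...
```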

  17. Median speed relative to Base: 0.95 (Amal/Base). Possible to make this a tad faster: a custom macro converts numeric coefficients to the same type as x with no overhead (no runtime dispatch).
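A sketch of such a macro (hypothetical name @coefpoly; Base.Math's @horner is the same idea): it unrolls a Horner chain and wraps every coefficient in a conversion to the argument's type, which constant-folds at compile time, so there is no runtime conversion or dispatch cost.

```julia
# Hypothetical macro: Horner evaluation with each numeric coefficient
# converted to the argument's type inside the generated expression.
macro coefpoly(x, cs...)
    ex = :(oftype(t, $(esc(cs[end]))))
    for i in length(cs)-1:-1:1
        ex = :(muladd(t, $ex, oftype(t, $(esc(cs[i])))))
    end
    return :(let t = $(esc(x)); $ex end)
end

kernel(x) = @coefpoly(x, 1.0, 1.0, 0.5, 1/6)   # 1 + x + x^2/2 + x^3/6
kernel(0.5f0)   # stays in Float32; no promotion to Float64
```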

  18. exp for non-FMA systems: a Padé approximation to e^x. A Remez algorithm is used to generate a polynomial approximating the auxiliary function, and the kernel is then written in rational form (it is actually better to then carry out another long-division step to minimize cancellation effects).
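To make the Padé idea concrete, a standard [2/2] Padé approximant of e^x (an illustration, not the coefficients of the actual non-FMA kernel):

```julia
# [2/2] Padé approximant: exp(r) ≈ (12 + 6r + r^2) / (12 - 6r + r^2).
# Rational forms reach a given accuracy at lower degree than polynomials,
# at the cost of one division -- a trade worth making when FMA is absent.
pade_exp(r) = (12 + r*(6 + r)) / (12 + r*(-6 + r))

pade_exp(0.1)   # ≈ 1.1051709, matching exp(0.1) to about 1e-8
```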

  19. Thank you for listening!
