Introduction to Floating Point Data Types and Operations


This presentation introduces floating-point data types on the ARM Cortex-M4, focusing on the single-precision float type and its excess-127 exponent encoding. It compares float with int32_t, shows how the same value is represented in each, and explains how to convert between them. It also covers the separate integer and floating-point processors and their registers, how floating-point constants are loaded, which immediate constants the VMOV instruction encoding allows, floating-point arithmetic and comparison, and instruction timing.


Uploaded on Aug 21, 2024 | 0 Views



Presentation Transcript


  1. Chapter 9 Getting Started with Floating Point Part 1: Introduction

  2. FLOATING-POINT DATA TYPE
     Single-precision floating-point (float), excess-127 format:
         Bit 31:      sign (0 if positive, 1 if negative)
         Bits 30-23:  exponent (8 bits, excess-127; 2^0 is stored as 01111111)
         Bits 22-0:   fractional bits of significand (23 bits)
     Normalized: 1.0 ≤ significand < 2.0. The MS-bit (integer bit) is always 1 and is never stored.
     Range: 2^±127 ≈ 10^±38
     Precision: 24 bits ≈ 7 decimal digits
     The Cortex-M4 FPU doesn't support data type double.

  3. float versus int32_t
     int32_t y = 1000 ;
         bits 31..0: 0000 0000 0000 0000 0000 0011 1110 1000
     float x = 1000. ;
         sign (bit 31):                      0
         excess-127 exponent (bits 30..23):  10001000 = 136 (decimal)
         significand (bits 22..0):           11110100000000000000000 = 0.953125
         value = +1.953125 × 2^(136 - 127)
     http://www.h-schmidt.net/FloatConverter/IEEE754.html
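
The field layout above can be checked on any machine whose float is IEEE 754 single precision. The following is a minimal sketch, not part of the original slides; the type float_fields and the function decode_float are hypothetical names introduced only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of a single-precision value. */
typedef struct {
    uint32_t sign;      /* bit 31: 0 if positive, 1 if negative */
    uint32_t exponent;  /* bits 30..23, stored in excess-127 form */
    uint32_t fraction;  /* bits 22..0, fractional bits of the significand */
} float_fields;

/* Copy the bits of x into an integer and split them into the three fields.
   Assumes float is a 32-bit IEEE 754 value. */
static float_fields decode_float(float x)
{
    uint32_t bits;
    float_fields f;
    memcpy(&bits, &x, sizeof bits);      /* copies bits, performs no conversion */
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For 1000.0f this yields sign 0, exponent 136 and fraction 0x7A0000; dividing 0x7A0000 by 2^23 gives the 0.953125 shown on the slide.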

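  4. Two Separate Processors
     Integer Processor: registers R0-R12, APSR flags
         Load a constant: LDR (pseudo-instruction)
         Integer data from main memory: LDR, LDRD, LDMIA, LDMDB
         Integer data to main memory: STR, STRD, STMIA, STMDB
     FP Processor: registers S0-S31, FPU flags
         Load a constant: VMOV (only a few values)
         Floating-point data from main memory: VLDR, VLDMIA, VLDMDB
         Floating-point data to main memory: VSTR, VSTMIA, VSTMDB
     Between the two processors:
         VMOV copies data between integer and FP registers
         VMRS APSR_nzcv,FPSCR copies the FPU flags to the APSR flags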

  5. FLOATING-POINT REGISTERS
     S0-S31 are 32 bits each. Pairs of them form D0-D15 (D0 = S0,S1 ... D7 = S14,S15; D8 = S16,S17 ... D15 = S30,S31).
     S0-S15 may be modified by functions.
     S16-S31 must be preserved by functions.
     D0-D15 may be used to hold 64-bit values, but all floating-point arithmetic is 32 bits.

  6. Chapter 9 Getting Started with Floating Point Part 2: Floating-Point Constants

  7. LOADING CONSTANTS
     Integer:         MOV   R0,5
     Floating point:  VMOV  S0,0.5
     Floating point:  VMOV  S0,3.14159     // not allowed
     VMOV only works with a limited set of constants.
     Integer:         LDR   R0,=5
     Floating point:  VLDR  S0,=3.14159    // not allowed
     VLDR cannot be used as a pseudo-instruction.
     Floating point:  LDR   R0,=pi
                      VLDR  S0,[R0]
                      pi: .float 3.14159

  8. VMOV Immediate Constants
     The encoding of the VMOV instruction only provides 8 bits for a floating-point constant: a sign bit (bit 7), a 3-bit field (bits 6-4) and a 4-bit field (bits 3-0).
     VMOV constants are limited to ±m ÷ 2^n, where 16 ≤ m ≤ 31 and 0 ≤ n ≤ 7.

  9. VMOV Immediate Constants 0.125 0.1328125 0.140625 0.1484375 0.15625 0.1640625 0.171875 0.1796875 0.1875 0.1953125 0.203125 0.2109375 0.21875 0.2265625 0.234375 0.2421875 0.25 0.265625 0.28125 0.296875 0.3125 0.328125 0.34375 0.359375 0.375 0.390625 0.40625 0.421875 0.4375 0.453125 0.46875 0.484375 0.5 0.53125 0.5625 0.59375 0.625 0.65625 0.6875 0.71875 0.75 0.78125 0.8125 0.84375 0.875 0.90625 0.9375 0.96875 1.0 1.0625 1.125 1.1875 1.25 1.3125 1.375 1.4375 1.5 1.5625 1.625 1.6875 1.75 1.8125 1.875 1.9375 2.0 2.125 2.25 2.375 2.5 2.625 2.75 2.875 3.0 3.125 3.25 3.375 3.5 3.625 3.75 3.875 4.0 4.25 4.5 4.75 5.0 5.25 5.5 5.75 6.0 6.25 6.5 6.75 7.0 7.25 7.5 7.75 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5 16.0 17.0 18.0 19.0 20.0 21.0 22.0 23.0 24.0 25.0 26.0 27.0 28.0 29.0 30.0 31.0
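
The table above can be regenerated from the rule that a VMOV magnitude is an integer from 16 to 31 divided by a power of two from 2^0 to 2^7. This is a minimal sketch under that assumption; vmov_magnitudes is a hypothetical name, not something defined in the slides.

```c
#include <stddef.h>

/* Fill out[] with the positive VMOV immediates m / 2^n for
   16 <= m <= 31 and 0 <= n <= 7, in ascending order.
   Returns the number of values written (always 128). */
static size_t vmov_magnitudes(float out[128])
{
    size_t count = 0;
    for (int n = 7; n >= 0; n--) {           /* largest divisor first: smallest values */
        for (int m = 16; m <= 31; m++) {
            out[count++] = (float)m / (float)(1 << n);
        }
    }
    return count;
}
```

The first value produced is 16/128 = 0.125 and the last is 31/1 = 31.0, matching the first and last entries of the table.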

  10. Common VMOV Immediates
      There are a total of 128 values (magnitudes) that can be used as VMOV immediates. The magnitudes of the most commonly used values are easily remembered as:
          The first 31 multiples of 1:    1.0, 2.0, 3.0, 4.0, ... 31.0
          The first 32 multiples of 1/2:  0.5, 1.0, 1.5, 2.0, ... 16.0
          The first 32 multiples of 1/4:  0.25, 0.5, 0.75, 1.0, ... 8.0
          The first 32 multiples of 1/8:  0.125, 0.25, 0.375, ... 4.0
      Not supported: VMOV S0,0.0
      Use this instead: VSUB.F32 S0,S0,S0

  11. Checking a VMOV Constant
      To see if VMOV can use a particular value, double it up to 7 times. If the original value or any of the products is an integer between 16 and 31, then the floating-point value may be used as a VMOV immediate.
      Is x a VMOV constant?
          x ← |x|
          for i = 0 to 7 repeat:
          {
              if (x is an integer and 16 ≤ x ≤ 31) Yes
              x ← 2x
          }
          No
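
The doubling test described above translates directly into C. This is a sketch only; is_vmov_constant is a hypothetical name, and the use of the magnitude of x is an assumption based on the encoding having a separate sign bit.

```c
/* Returns 1 if x could be written as a VMOV immediate, 0 otherwise.
   Checks the magnitude of x and each of its first 7 doublings for
   an integer value in the range 16..31. */
static int is_vmov_constant(float x)
{
    float v = (x < 0.0f) ? -x : x;           /* sign is encoded separately */
    for (int i = 0; i <= 7; i++) {
        /* The range check comes first, so the cast to int is always safe. */
        if (v >= 16.0f && v <= 31.0f && v == (float)(int)v) {
            return 1;
        }
        v *= 2.0f;                            /* doubling is exact in binary */
    }
    return 0;
}
```

For example, 0.5 passes after five doublings (it becomes 16), while 3.14159 and 0.0 never produce an integer between 16 and 31.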

  12. Using Expressions to Create Constants
          LDR    R0,=(3*4) << 2      // this works
          ADD    R0,R0,5+7           // this works
      But you can't use expressions for floating-point constants:
          VMOV   S0,0.5/2.0          // syntax error!!
          VMOV   S0,0.25             // this works

          LDR    R0,=factor
          VLDR   S0,[R0]             // S0 = (4.0/3.0)*3.14159
          factor: .float (4.0/3.0)*3.14159    // syntax error!!
          factor: .float 4.18879              // (4.0/3.0)*3.14159

  13. Chapter 9 Getting Started with Floating Point Part 3: Moving Data

  14. REGISTER TO REGISTER
      Instruction                              Syntax                Operation
      Move FP Register to FP Register          VMOV Sd,Sm            Sd ← Sm
      Move FP Register to Core Register        VMOV Rd,Sm            Rd ← Sm
      Move Core Register to FP Register        VMOV Sd,Rm            Sd ← Rm
      Move 2 FP Registers to Core Registers    VMOV Rt,Rt2,Sm,Sm1    Rt ← Sm ; Rt2 ← Sm1 (Note: m1 = m + 1)
      Move 2 Core Registers to FP Registers    VMOV Sm,Sm1,Rt,Rt2    Sm ← Rt ; Sm1 ← Rt2 (Note: m1 = m + 1)
      These only copy bits. They do NOT convert between integer and floating-point representations.
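
The difference between copying bits and converting a value can be seen in C as well. In this sketch, copy_bits plays the role of a VMOV from an FP register to a core register and convert_value plays the role of a truncating float-to-integer conversion; both names are hypothetical, and float is assumed to be IEEE 754 single precision.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of x as an unsigned integer (bit copy only). */
static uint32_t copy_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Convert the numeric value of x to an integer; a C cast truncates toward zero. */
static int32_t convert_value(float x)
{
    return (int32_t)x;
}
```

Copying the bits of 1000.0f gives 0x447A0000, which is not the integer 1000; only the conversion produces 1000.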

  15. CONVERTING BETWEEN INTEGER AND FLOATING POINT
      Instruction                                                                     Syntax                  Operation
      Convert Unsigned Integer to Floating-Point (float ← uint32_t)                   VCVT.F32.U32 Sd,Sm      Sd ← (float) Sm, where Sm is an unsigned integer
      Convert 2's complement Integer to Floating-Point (float ← int32_t)              VCVT.F32.S32 Sd,Sm      Sd ← (float) Sm, where Sm is a 2's complement integer
      Convert Floating-Point to Unsigned Integer (uint32_t ← float)                   VCVT.U32.F32 Sd,Sm      Sd ← (uint32_t) Sm, truncated
                                                                                      VCVTR.U32.F32 Sd,Sm     Sd ← (uint32_t) Sm, rounded
      Convert Floating-Point to 2's complement Integer (int32_t ← float)              VCVT.S32.F32 Sd,Sm      Sd ← (int32_t) Sm, truncated
                                                                                      VCVTR.S32.F32 Sd,Sm     Sd ← (int32_t) Sm, rounded

  16. INSTRUCTION SUFFIXES
      Used with VCVT when converting between floating-point and integer values to specify the data type of the source and destination:
          .F32    32-bit single-precision float
          .S32    32-bit signed (2's comp) integer
          .U32    32-bit unsigned integer
      NOTE: The two VCVT operands must both be floating-point registers (S0-S31).

  17. ROUNDING MODES
      Rounding Mode                      IEEE Abbrev.   FPSCR bits 23..22   Examples:  -2.5   -1.5   +1.5   +2.5
      Round to nearest even (default)    ToNEAR         00                             -2     -2     +2     +2
      Round towards positive infinity    ToPOSV         01                             -2     -1     +2     +3
      Round towards negative infinity    ToNEGV         10                             -3     -2     +1     +2
      Round towards zero (truncate)      ToZERO         11                             -2     -1     +1     +2
      Default: VCVTR uses ToNEAR; VCVT uses ToZERO.

  18. LOADING FLOATING-POINT DATA: Memory to (Single or Double) Register
      Instruction                                       Syntax                   Operation
      Load single-precision FPU Register from Memory    VLDR Sd,[Rn]             Sd ← mem32[Rn]
                                                        VLDR Sd,[Rn,constant]    Sd ← mem32[Rn + constant]
                                                        VLDR Sd,label            Sd ← mem32[adrs of label]
      Load double-precision FPU Register from Memory    VLDR Dd,[Rn]             Dd ← mem64[Rn]
                                                        VLDR Dd,[Rn,constant]    Dd ← mem64[Rn + constant]
                                                        VLDR Dd,label            Dd ← mem64[adrs of label]
      PC-relative addressing (the label form) can only be used to reference constants stored in the read-only code space (near the instruction).

  19. LOADING FLOATING-POINT DATA: Memory to Multiple Registers
      Load Multiple FPU Registers, Increment After
          Syntax:    VLDMIA Rn!,register list
          Operation: FP registers ← memory, 1st address in Rn; updates Rn only if write-back flag (!) is appended to Rn.
      Load Multiple FPU Registers, Decrement Before
          Syntax:    VLDMDB Rn!,register list
          Operation: FP registers ← memory, addresses end just before address in Rn; must append (!) and always updates Rn.
      Examples:
          VLDMIA R0!,{S0,S1,S2}    // Copy starting at mem[R0]
          VLDMDB R0!,{S0-S5}       // Copy ending before mem[R0]

  20. STORING FLOATING-POINT DATA: 1, 2 or Multiple Registers to Memory
      Instruction                                      Syntax                   Operation
      Store single-precision FPU Register to Memory    VSTR Sd,[Rn]             Sd → mem32[Rn]
                                                       VSTR Sd,[Rn,constant]    Sd → mem32[Rn + constant]
      Store double-precision FPU Register to Memory    VSTR Dd,[Rn]             Dd → mem64[Rn]
                                                       VSTR Dd,[Rn,constant]    Dd → mem64[Rn + constant]
      Store Multiple FPU Registers, Increment After
          Syntax:    VSTMIA Rn!,register list
          Operation: FP registers → memory, 1st address in Rn; updates Rn only if write-back flag (!) is appended to Rn.
      Store Multiple FPU Registers, Decrement Before
          Syntax:    VSTMDB Rn!,register list
          Operation: FP registers → memory, addresses end just before address in Rn; must append (!) and always updates Rn.

  21. Chapter 9 Getting Started with Floating Point Part 4: Parameters & Return Values

  22. FUNCTION PARAMETERS AND RETURN VALUES Parameters: Data of type float is passed in S0-S15 Left-most float in S0, next float in S1, etc. Pointer to float is passed in R0-R3 Return Value: Result of type float is returned in S0 Result of type float * is returned in R0

  23. EXAMPLE
          float foo(int, float, float *, float) ;
          float f1, f2 ;
          f2 = foo(5, 0.25, &f1, 1.0) ;

          LDR    R0,=5        // R0 = integer constant 5
          VMOV   S0,0.25      // S0 = fl-pt constant 0.25
          LDR    R1,=f1       // R1 = address of f1
          VMOV   S1,1.0       // S1 = fl-pt constant 1.0
          BL     foo          // call function foo
          LDR    R0,=f2       // R0 = address of f2
          VSTR   S0,[R0]      // store S0 in f2

  24. FLOATING-POINT PUSH & POP
      Instruction           Syntax                 Operation
      Push FPU Registers    VPUSH register list    SP ← SP - 4 × #registers, copy registers to mem[SP]
      Pop FPU Registers     VPOP register list     Copy mem[SP] to registers, SP ← SP + 4 × #registers
      IMPORTANT: VPUSH/VPOP only use floating-point registers (S0-S31). PUSH/POP only use integer registers (R0-R15).

  25. Preserving Floating-Point Registers
          float foo(float f)
          {
              float bar(void) ;
              return bar() + f ;
          }
      Preserving the parameter in a core register:
          foo:                        // S0 = f
              PUSH      {R4,LR}
              VMOV      R4,S0
              BL        bar
              VMOV      S1,R4
              VADD.F32  S0,S0,S1
              POP       {R4,PC}
      Faster to preserve FPU registers in the core registers.
      Preserving the parameter in an FPU register:
          foo:                        // S0 = f
              PUSH      {LR}          // preserve LR
              VPUSH     {S16}         // preserve S16
              VMOV      S16,S0        // keep parameter f in S16
              BL        bar           // may modify S0-S15 (and R0-R3, R12)
              VADD.F32  S0,S0,S16     // S16 = parameter f
              VPOP      {S16}         // restore S16
              POP       {PC}          // return (uses stack copy of LR)

  26. Chapter 9 Getting Started with Floating Point Part 5: Floating-Point Arithmetic

  27. INSTRUCTION SUFFIXES
      All floating-point arithmetic instructions (add, subtract, etc.) require a suffix to specify the operand format:
          .F16    16-bit half-precision float
          .F32    32-bit single-precision float (the only format supported by the STM32F429)
          .F64    64-bit double-precision float

  28. ARITHMETIC WITH REAL NUMBERS
      Instruction                             Syntax                 Operation
      Floating-point add                      VADD.F32 Sd,Sn,Sm      Sd ← Sn + Sm
      Floating-point subtract                 VSUB.F32 Sd,Sn,Sm      Sd ← Sn - Sm
      Floating-point negate                   VNEG.F32 Sd,Sm         Sd ← -Sm
      Floating-point abs value                VABS.F32 Sd,Sm         Sd ← | Sm | (clears FPU sign bit, N)
      Floating-point multiply                 VMUL.F32 Sd,Sn,Sm      Sd ← Sn × Sm
      Floating-point divide                   VDIV.F32 Sd,Sn,Sm      Sd ← Sn ÷ Sm
      Floating-point square root              VSQRT.F32 Sd,Sm        Sd ← √Sm
      Floating-point Multiply and Add         VMLA.F32 Sd,Sn,Sm      Sd ← Sd + Sn × Sm
      Floating-point Multiply and Subtract    VMLS.F32 Sd,Sn,Sm      Sd ← Sd - Sn × Sm
      In general, the floating-point arithmetic instructions do not set the flags.

  29. COMPARING REAL NUMBERS
      Instruction                                Syntax                  Operation
      Floating-Point Compare two Registers       VCMP.F32 Sd,Sm          Computes Sd - Sm and updates FPU flags in FPSCR
      Floating-Point Compare Register to Zero    VCMP.F32 Sd,0.0         Computes Sd - 0 and updates FPU flags in FPSCR
      Move Flags from FPU FPSCR to core APSR     VMRS APSR_nzcv,FPSCR    Core CPU Flags ← FPU Flags
      VMRS is required in order to test the value of a floating-point flag!
          VCMP.F32  S0,0.0
          VMRS      APSR_nzcv,FPSCR
          BEQ       IsZero

  30. INTERPRETING FLAGS AFTER VCMP
      Condition Code                            VCMP Meaning
      EQ (Equal)                                ==
      NE (Not Equal)                            != or unordered
      HS (Higher or Same) or CS (Carry Set)     >= or unordered
      LO (Lower) or CC (Carry Clear)            <
      HI (Higher)                               > or unordered
      LS (Lower or Same)                        <=
      GE (Greater Than or Equal)                >=
      LT (Less Than)                            < or unordered
      GT (Greater Than)                         >
      LE (Less Than or Equal)                   <= or unordered
      MI (Minus)                                <
      PL (Plus)                                 >= or unordered
      VS (Overflow Set)                         unordered
      VC (Overflow Clear)                       not unordered
      AL (Always)                               unconditional
      "unordered": One or both operands is a NaN (Not a Number), such as the result of dividing zero by zero, or the square root of a negative number.
      Good News: Not normally an issue.
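
The "unordered" case can be observed in C, where every ordered comparison involving a NaN is false and only != is true. This is a minimal sketch; is_unordered is a hypothetical helper, and NAN is the standard macro from <math.h> (no math library functions are called).

```c
#include <math.h>   /* for the NAN macro only */

/* Returns 1 when a and b are unordered, i.e. at least one of them is a NaN.
   A NaN is the only floating-point value that is not equal to itself. */
static int is_unordered(float a, float b)
{
    return (a != a) || (b != b);
}
```

With a NaN operand, both `x < 1.0f` and `x >= 1.0f` evaluate to false, which is why several condition codes in the table carry the extra "or unordered" meaning.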

  31. Floating-Point Compare & Flags
          int32_t ImaginaryRoots(float a, float b, float c)
          {
              return Discriminant(a, b, c) < 0.0 ;    // returns 0 or 1
          }
      After a VCMP, you have to copy the FPU flags before you can test them.
          ImaginaryRoots:                         // S0=a, S1=b, S2=c
              PUSH      {R3,LR}
              BL        Discriminant              // S0 = b*b - 4.0*a*c
              VCMP.F32  S0,0.0                    // S0 < 0.0 ?
              VMRS      APSR_nzcv,FPSCR           // Core Flags <-- FPU Flags
              ITE       LT
              MOVLT     R0,1                      // Discriminant < 0: return 1
              MOVGE     R0,0                      // Discriminant >= 0: return 0
              POP       {R3,PC}

  32. FPU Instructions in IT Blocks
          float LimitedIncrement(float a, float b)
          {
              if (a < b) a += 1.0 ;
              return a ;
          }

          LimitedIncrement:                       // S0 = a, S1 = b
              VCMP.F32    S0,S1                   // a < b ?
              VMRS        APSR_nzcv,FPSCR         // Core Flags <-- FPU Flags
              ITT         LT
              VMOVLT      S1,1.0                  // S1 = 1.0
              VADDLT.F32  S0,S0,S1                // S0 = a + 1.0
              BX          LR
      FPU instructions within IT blocks: Append condition before other modifiers.

  33. Floating-Point Equality Test
          int32_t CloseEnough(float x, float y, float threshold)
          {
              return fabsf(x - y) < threshold ;
          }
      An equality test with FP values is likely to fail. Use a proximity test.
          CloseEnough:                            // S0 = x, S1 = y, S2 = threshold
              VSUB.F32  S0,S0,S1                  // S0 = x - y
              VABS.F32  S0,S0                     // S0 = | x - y |
              VCMP.F32  S0,S2                     // | x - y | < threshold ?
              VMRS      APSR_nzcv,FPSCR           // Core Flags <-- FPU Flags
              ITE       LT
              MOVLT     R0,1                      // Return 1 if LT
              MOVGE     R0,0                      // Return 0 if GE
              BX        LR

  34. Chapter 9 Getting Started with Floating Point Part 6: Performance

  35. FPU Instruction Cycle Counts
      Instructions                                  Clock Cycles    Notes
      VADD, VSUB, VNEG, VMUL, VABS, VCVT            1               1
      VMLA, VMLS                                    3
      VDIV, VSQRT                                   14              1, 2
      VLDR, VSTR, VPUSH, VPOP, VLDMIA, VLDMDB       1+N             3
      VMRS, VMSR, VCMP                              1
      VMOV (register ← constant or register)        1
      VMOV (register pair ← register pair)          2
      Notes:
      1. Add 1 if the result is used by the next instruction.
      2. Execution may overlap the execution of any integer instructions that immediately follow.
      3. N is the number of 32-bit registers.

  36. Replacing VDIV by VMUL
          float VolumeOfCone(float radius, float height)
          {
              return AreaOfCircle(radius) * height / 3.0 ;
          }

          VolumeOfCone:                       // S0 = radius, S1 = height
              PUSH      {R4,LR}               // Preserve R4 and LR
              VMOV      R4,S1                 // R4 = height
              BL        AreaOfCircle          // S0 = area of base of cone
              VMOV      S1,R4                 // S1 = height
              VMUL.F32  S0,S0,S1              // S0 = height * (area of base)
      Original division (16 cycles):
              VMOV      S1,3.0                // S1 = 3.0
              VDIV.F32  S0,S0,S1              // S0 = (height * (area of base))/3.0
      Replacement multiplication (4 cycles):
              VLDR      S1,third              // Multiplication by 1/3 is
              VMUL.F32  S0,S0,S1              // faster than division by 3
      Common ending:
              POP       {R4,PC}               // Restore R4 and return
          third: .float 0.333333
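
The same substitution can be written at the C level. The sketch below is illustrative only; volume_by_division and volume_by_multiplication are hypothetical names, and the base area is passed in directly rather than computed by AreaOfCircle.

```c
/* Cone volume using a division by 3 (maps to the slower VDIV form). */
static float volume_by_division(float base_area, float height)
{
    return base_area * height / 3.0f;
}

/* Cone volume using a multiplication by the constant 1/3
   (maps to the faster VMUL form; 1.0f / 3.0f is folded at compile time). */
static float volume_by_multiplication(float base_area, float height)
{
    return base_area * height * (1.0f / 3.0f);
}
```

Because 1/3 is not exactly representable in binary, the two results may differ in the last bit, so the replacement trades a very small amount of accuracy for speed.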

  37. Simpler way to evaluate float < 0
          int32_t ImaginaryRoots(float a, float b, float c)
          {
              return Discriminant(a, b, c) < 0.0 ;    // returns 0 or 1
          }

          ImaginaryRoots:                         // S0=a, S1=b, S2=c
              PUSH      {R3,LR}
              BL        Discriminant              // S0 = b*b - 4.0*a*c
      Original test (5 cycles):
              VCMP.F32  S0,0.0                    // S0 < 0.0 ?
              VMRS      APSR_nzcv,FPSCR           // Core Flags <-- FPU Flags
              ITE       LT
              MOVLT     R0,1                      // Discriminant < 0: return 1
              MOVGE     R0,0                      // Discriminant >= 0: return 0
      Replacement (2 cycles):
              VMOV      R0,S0                     // Floating-point sign bit is
              LSR       R0,R0,31                  // in same position as integer
      Common ending:
              POP       {R3,PC}
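
The sign-bit shortcut can be mirrored in C to see where it agrees with a real comparison. This is a sketch with hypothetical function names, assuming float is IEEE 754 single precision.

```c
#include <stdint.h>
#include <string.h>

/* Ordinary comparison: 1 if x is less than zero, else 0. */
static int32_t negative_by_compare(float x)
{
    return x < 0.0f;
}

/* Shortcut: copy the bits (like VMOV R0,S0) and shift the sign bit
   down to bit 0 (like LSR R0,R0,31). */
static int32_t negative_by_sign_bit(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int32_t)(bits >> 31);
}
```

The two functions agree for ordinary nonzero values and for +0.0. They differ for -0.0, whose sign bit is set although it is not less than zero, and they may differ for a NaN with its sign bit set, so the shortcut is only safe when those inputs cannot occur or do not matter.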

  38. FPU Instruction Timing
      Time (Clock Cycles):    1       2        3         4         5         6
      VADD.F32 S0,S0,S1       Fetch   Decode   Execute   Execute
      VMUL.F32 S1,S1,S2               Fetch    Decode    Execute   Execute
      VSUB.F32 S2,S2,S3                        Fetch     Decode    Execute   Execute
      Many floating-point instructions are specified as taking 1 clock cycle. Even though they actually take 4, pipelining allows them to complete at a rate of 1 instruction per clock. However . . .

  39. FPU Instruction Stalls
      Time (Clock Cycles):    1       2        3         4         5         6
      VADD.F32 S0,S1,S2       Fetch   Decode   Execute   Execute
      VMUL.F32 S1,S1,S0               Fetch    Decode    Stall     Execute   Execute
      The result of VADD is not available until its second Execute cycle has finished.
      When the result of one floating-point instruction is an input operand to the next floating-point instruction, the second instruction stalls for 1 clock while waiting for the result.

  40. Overlapping FPU and Integer Ops
      C code:
          int f(int n, float *f)
          {
              *f = *f / 7.0 ;
              return n / 7 ;
          }
      ARM assembly, without overlap:
          f:  VLDR      S0,[R1]
              VMOV      S1,7.0
              VDIV.F32  S0,S0,S1
              VSTR      S0,[R1]
              LDR       R3,=153391683
              SMULL     R2,R3,R3,R0
              ADD       R3,R3,R0
              ASR       R0,R0,2
              SUB       R0,R3,R0,ASR 31
              BX        LR
      ARM assembly, with the integer instructions placed between VDIV and VSTR:
          f:  VLDR      S0,[R1]
              VMOV      S1,7.0
              VDIV.F32  S2,S0,S1
              LDR       R3,=153391683
              SMULL     R2,R3,R3,R0
              ADD       R3,R3,R0
              ASR       R0,R0,2
              SUB       R0,R3,R0,ASR 31
              VSTR      S2,[R1]
              BX        LR
      The integer and floating-point parts of the CPU operate independently so that integer instructions can be executing while waiting for a VDIV or VSQRT instruction to complete.
      Pipeline diagram (F = fetch, D = decode, E = execute, S = stall): VDIV stays in its execute stage for many cycles while LDR, SMULL, ADD, ASR and SUB execute. SMULL must wait on R3 from LDR, ADD must wait for R3 from SMULL, and VSTR must wait for the result of VDIV.

  41. Chapter 9 Getting Started with Floating Point Part 7: Summary

  42. Things to Remember
      1. Function parameters:
             Floats in S0-S15
             Integers and pointers in R0-R3 (including pointer to a float!)
      2. Function return value:
             Float in S0
             64-bit integer in R1.R0
             Anything else in R0 (including pointer to a float!)
      3. The only addressing modes for VLDR and VSTR are [Rn] and [Rn,constant].

  43. Things to Remember
      4. VPUSH and VPOP only work with FPU registers. Preserve FPU registers by copying S0-S15 into R4-R8 and PUSH/POP'ing R4-R8.
      Using VPUSH/VPOP:
          Foo:                    // S0 = parameter
              PUSH    {LR}
              VPUSH   {S16}
              VMOV    S16,S0
              BL      bar
              // use parameter (now in S16)
              VPOP    {S16}
              POP     {PC}
      Using core registers:
          Foo:                    // S0 = parameter
              PUSH    {R4,LR}
              VMOV    R4,S0
              BL      bar
              VMOV    S0,R4
              // use parameter (now in S0)
              POP     {R4,PC}

  44. Things to Remember
      5. VLDR allows "VLDR Sn,label", but VSTR does not. Load the destination address into Rn and then use VSTR Sn,[Rn].
      6. There is no VLDR pseudo-instruction, so you can't write "VLDR Sn,=3.14159". Use .float to create a constant in memory, add a label to it, and use VLDR Sn,label to load it.
      7. VMOV supports a very restricted set of immediate constants. Easiest to only use it to load small integers (like 4.0) and some simple fractions (like 0.5).

  45. Things to Remember
      8. VMOV can copy an integer register to an FPU register and vice versa, but does NOT convert the representation. That requires a combination of VMOV and VCVT.
      9. All instructions that perform arithmetic, data type conversion, or compares must specify the operand type, as in VADD.F32. VCVT requires two specifiers.
      10. Comparing two FPU values requires VCMP followed by VMRS APSR_nzcv,FPSCR before the conditional branch or IT block.

  46. Things to Remember
      11. In an IT block, append the condition code to an FPU instruction BEFORE appending the data type specifier, as in VADDLE.F32.
      12. Unlike the integer instructions, floating-point constants can't be written as expressions, even when all the operands are constants. This also applies to .float directives.
      13. VDIV and VSQRT are SLOW!
              Replace VDIV by a constant with VMUL by 1/constant.
              Overlap execution of VDIV and VSQRT with that of integer instructions.
