Understanding Integer and Floating Point Number Representations

Exploring the limitations and design decisions behind representing integers and floating point numbers in memory. Learn about unsigned and signed integers, two's complement, as well as key values and concepts to remember. Delve into the vision behind floating point numbers and their representation for real arithmetic operations.


Uploaded on Oct 07, 2024


Presentation Transcript


  1. Integers & Floating Point Numbers: Limits of Representation. CSE 351, Autumn 2016, Section 3

  2. Key Points. Remember that there are limitations! Memory is finite; numbers and data are not, so we can only represent so much: w bits give us $2^w$ distinct bit patterns. Design decisions: efficient, fast, and easy to implement; accuracy; range; precision.

  3. Unsigned Integers. Unsigned values follow the base 2 system. Converting from base 2 to base 10: $b_7 b_6 b_5 b_4 b_3 b_2 b_1 b_0 = b_7 2^7 + b_6 2^6 + \dots + b_1 2^1 + b_0 2^0$. Benefit: add and subtract using the normal carry and borrow rules, just in binary. Example: 63 + 8 = 71 is 00111111 + 00001000 = 01000111.

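  4. Signed Integers: Two's Complement. The most significant bit $b_{w-1}$ has weight $-2^{w-1}$; the other bits $b_i$ have their usual weights $+2^i$. Bit layout: $b_{w-1}\, b_{w-2} \dots b_0$. [Figure: the unsigned range (0, TMax, TMax + 1, UMax - 1, UMax) mapped onto the two's complement range (0, TMax, TMin, -2, -1).]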

  5. Values To Remember!
     Unsigned values: UMin = 0 (000...0), UMax = $2^w - 1$ (111...1).
     Two's complement values: TMin = $-2^{w-1}$ (100...0), TMax = $2^{w-1} - 1$ (011...1), negative one = 111...1 (0xF...F).
     Values for W = 32:
       Name   Decimal          Hex           Binary
       UMax   4,294,967,295    FF FF FF FF   11111111 11111111 11111111 11111111
       TMax   2,147,483,647    7F FF FF FF   01111111 11111111 11111111 11111111
       TMin   -2,147,483,648   80 00 00 00   10000000 00000000 00000000 00000000
       -1     -1               FF FF FF FF   11111111 11111111 11111111 11111111
       0      0                00 00 00 00   00000000 00000000 00000000 00000000
     Values for W = 64:
       LONG_MIN  = -9223372036854775808
       LONG_MAX  = 9223372036854775807
       ULONG_MAX = 18446744073709551615

  6. Floating Point Numbers: The Vision. What do we want? A large range of values, covering both large numbers and very small numbers; precise values; arithmetic that reflects real arithmetic; support for values such as +∞, -∞, and Not a Number (NaN); an encoding similar to two's complement.

  7. Floating Point Numbers. Numerical form: $V = (-1)^s \times M \times 2^E$. The sign bit s determines whether the number is negative or positive. The significand (mantissa) M is normally a fractional value in the range [1.0, 2.0). The exponent E weights the value by a (possibly negative) power of two. Representation in memory, with fields s | exp | frac: the MSB s is the sign bit; the exp field encodes E (but is not equal to E; remember the bias); the frac field encodes M (but is not equal to M).

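  8. Floating Point Numbers. Value: $\pm 1 \times \text{Mantissa} \times 2^{\text{Exponent}}$. Bit fields: $(-1)^S \times 1.M \times 2^{(E - \text{bias})}$, where E on this slide is the stored exponent field and M the stored fraction bits. Bias: read the exponent field as unsigned, but with a bias of $2^{w-1} - 1 = 127$ for the w = 8 exponent bits of single precision. Representable exponents are roughly 1/2 positive and 1/2 negative. Exponent 0 (Exp = 0) is represented as E = 0b 0111 1111. Why? It makes floating point arithmetic easier and is somewhat compatible with two's complement.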

  9. Floating Point Numbers: Denormalized. No leading 1. Remember! The implicit exponent is -126 (not -127) even though E = 0x00. Why? To represent really small numbers that are close to 0.

  10. Floating Point Representation Summary
        Exponent      Mantissa    Meaning
        0x00          0           ±0
        0x00          Non-zero    ± denorm num
        0x01 - 0xFE   Anything    ± norm num
        0xFF          0           ±∞
        0xFF          Non-zero    NaN

  11. Floating Point Limitations: Math Properties. Exponent overflow yields +∞ or -∞. Floats with value +∞, -∞, and NaN can be used in operations; the result is usually still +∞, -∞, or NaN, sometimes intuitively, sometimes not. Floating point ops do not work like real math, due to rounding! Not associative: (3.14 + 1e100) - 1e100 != 3.14 + (1e100 - 1e100). Not distributive: 100 * (0.1 + 0.2) != 100 * 0.1 + 100 * 0.2. Not cumulative: repeatedly adding a very small number to a large one may do nothing.

  12. Distribution of Values. What can't we get? Between the largest norm and infinity: overflow. Between zero and the smallest denorm: underflow. Between norm numbers: rounding. [Figure: number line from -15 to 15 marking the denormalized, normalized, and infinity regions.]

  13. Problems Problems Problems! Consider the decimal number 1.25. Give the IEEE-754 representation of this number as a 32-bit floating-point number. Convert $1.1 \times 2^{-128}$ to IEEE 754 single precision.

  14. Problems Problems Problems! If x and y have type float, give two different reasons that (x+2*y)-y == x+y might evaluate to 0 (i.e., false).

  15. Problems Problems Problems! What is the largest positive number we can represent with a 10-bit signed two's complement integer? Bit pattern? Decimal value?

  16. Problems Problems Problems! Assuming unsigned integers, what is the result when you compute UMAX+1? Assuming two's complement signed representation, what is the result when you compute TMAX+1?

  17. Problems Problems Problems! Is the == operator a good test of equality for floating point values? Why or why not?

  18. Problems Problems Problems! Give an example of three floating-point numbers x, y, and z, such that the distributive property x * (y + z) = x * y + x * z does not hold.

  19. How to use GDB
      Download calculator.c from the class webpage.
      For debugging, we need to compile the file with debugging symbols. This can be done using the -g flag in GCC:
          gcc -Wall -std=gnu99 -g calculator.c -o calculator
      To load the binary into GDB, use the following command:
          gdb calculator
      You should see a bunch of information, including version and license information.
      To run the binary in GDB, use the run command (type run or just r). This executes your program until it exits, hits a breakpoint, or an error occurs. If you want to start stepping through main(), use the start command.
      Passing command line arguments in GDB: the binary is already loaded, so give only the arguments after run:
          run 3 4 +
      Viewing source code while debugging: use the list command. To look at the main function, type list main. To list the content around line 45, type list 45. To display a range of line numbers such as lines 10-15, use list 10,15.

  20. How to use GDB (continued)
      Setting breakpoints: the break command creates a breakpoint (example: break main). Each breakpoint is associated with a number. To enable or disable a breakpoint, use the enable or disable command with that number. To see a summary of all breakpoints, use the info command (example: info break). To continue execution after a breakpoint, use the continue or c command.
      Stepping through source code in GDB: to step one line of source code at a time, use the next or n command, which steps over function calls. To step into functions, use the step or s command. To step out of the current function, use the finish command.
      Printing values while debugging: use the print command.
      Exiting GDB: press Ctrl-D, or type quit or q.
