Understanding Floating Point Representation in Binary Systems
In computer systems, real numbers (numbers with a fractional part) are stored in memory using a form of scientific notation. The point is moved and the number is stored as a mantissa and an exponent, which together determine its precision and range. In binary the same idea applies, but the mantissa is multiplied by a power of 2 instead of a power of 10. A table with fixed point and floating point columns is used to set out each conversion. Four examples cover moving the point to the left and to the right for positive and negative numbers, each using 16 bits for the mantissa (including the sign bit) and 8 bits for the exponent.
Floating Point Representation Higher Computing Science
Introduction
In computer systems, real numbers (numbers with a fractional part) are represented in memory using a form of scientific notation. This means that a number such as 53458.243 can be represented as 0.53458243 x 10^5.
To represent a number in this way, we move the decimal point to the start of the number and then multiply by 10 to the power of the number of places moved (which in this case is 5).
1234.56789 would become 0.123456789 x 10^4
3.424443 would become 0.3424443 x 10^1
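The point-moving step above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `to_scientific` is hypothetical, and it assumes a positive number of at least 1 written as a string with a decimal point.

```python
def to_scientific(number: str) -> tuple[str, int]:
    """Move the decimal point to the start of a positive decimal string.

    Returns the digits that follow the point and the power of 10.
    Assumption: the input is positive, at least 1, and has no leading zeros.
    """
    whole, _, fraction = number.partition(".")
    # The point moves left once for every digit in the whole part.
    exponent = len(whole)
    return whole + fraction, exponent


digits, power = to_scientific("53458.243")
print(f"0.{digits} x 10^{power}")  # 0.53458243 x 10^5
```

The same call on "1234.56789" and "3.424443" gives powers of 4 and 1, matching the two further examples on the slide.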
Representing in Binary
At Higher level, you need to know how to use this for numbers in binary. At National 5 level you will have already learned the terms mantissa and exponent.
The mantissa is used to store the precision of a number: the digits that come after the decimal point. For example, the mantissa for the number 0.53458243 x 10^5 would be 53458243.
The exponent is used to store the range of a number: the number used as the power. For example, the exponent for the number 0.53458243 x 10^5 would be 5.
Representing in Binary
The mantissa and exponent must be represented in binary. This representation is known as floating point.
Denary numbers make use of 10 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9.
Binary makes use of only two digits: 0 and 1.
Instead of multiplying by 10 to the power of the exponent, we will now be multiplying by 2 to the power of the exponent.
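To see that multiplying by a power of 2 really does move the binary point, the sketch below converts a binary string to denary and compares the two forms. The helper name `binary_to_denary` and the value 101.1 are illustrative choices, not taken from the slides.

```python
def binary_to_denary(bits: str) -> float:
    """Convert an unsigned binary string such as '101.1' to a denary value."""
    whole, _, fraction = bits.partition(".")
    value = float(int(whole, 2)) if whole else 0.0
    # Each bit after the point is worth 1/2, 1/4, 1/8, and so on.
    for position, bit in enumerate(fraction, start=1):
        value += int(bit) * 2 ** -position
    return value


print(binary_to_denary("101.1"))            # 5.5
print(binary_to_denary("0.1011") * 2 ** 3)  # 5.5
```

Both lines print the same value because 101.1 in binary is 0.1011 x 2^3: the point has moved three places to the left.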
Structure
To help with this, we will use a table with the following columns:
Fixed point: the original number
Floating point: the number after the point has been moved
Sign bit: one binary digit
Mantissa: the digits after the point
Exponent: the number used as the power
Examples
We will look at four different examples:
1. Using a positive number and moving the decimal point to the left
2. Using a positive number and moving the decimal point to the right
3. Using a negative number and moving the decimal point to the left
4. Using a negative number and moving the decimal point to the right
Example 1
How would 11011.0011 be represented in binary floating point representation using 16 bits for the mantissa (including the sign bit) and 8 bits for the exponent?
To begin with, represent this number using floating point: 0.110110011 x 2^5
As we are using binary, we cannot write the exponent as the denary number 5. The number 5 converted into binary is:
128 64 32 16 8 4 2 1
  0  0  0  0 0 1 0 1
Therefore, we can write this number as 0.110110011 x 2^101
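Moving the point can also be expressed as a short Python sketch. The function name `normalise` is hypothetical and the code assumes a non-zero binary string; the sign is ignored here because it is stored separately in the sign bit.

```python
def normalise(binary: str) -> tuple[str, int]:
    """Return (mantissa digits, exponent) for a non-zero binary string.

    The result reads as 0.<mantissa> x 2^<exponent>, with the exponent
    given in denary. A leading '-' is ignored.
    """
    whole, _, fraction = binary.lstrip("-").partition(".")
    whole = whole.lstrip("0")
    if whole:
        # Point moves left: one place for each digit in the whole part.
        return (whole + fraction).rstrip("0"), len(whole)
    # Point moves right: one place for each leading zero in the fraction.
    stripped = fraction.lstrip("0")
    return stripped.rstrip("0"), -(len(fraction) - len(stripped))


print(normalise("11011.0011"))  # ('110110011', 5)
```

A number smaller than one, such as 0.0001101, produces a negative exponent because the point moves to the right instead.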
Example 1 (cont.)
Next, we need to calculate the sign bit. The sign bit indicates whether a number is positive or negative.
If it is positive then it is represented with a 0.
If it is negative then it is represented with a 1.
In this case, 11011.0011 is a positive number so the sign bit is 0.
Example 1 (cont.)
Next, we need to calculate the mantissa. We already know that this is the number after the point in the floating point representation (0.110110011 x 2^101), so the mantissa is 110110011.
There are a total of 9 digits used here (known as bits), but the question states we must use 16 bits for the mantissa including the sign bit. As we have already used a bit for the sign, we have 15 bits left.
We now need to add 0s at the end of the mantissa until we have used 15 bits. This gives us 110110011000000: we have added 6 bits at the end of the mantissa to give 15 bits in total.
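Padding the mantissa is a one-line operation in Python. In this sketch the names `MANTISSA_BITS` and `pad_mantissa` are hypothetical, and raising an error when the mantissa is too long is an assumption: the slides do not say what should happen in that case.

```python
MANTISSA_BITS = 15  # 16 bits for the mantissa minus 1 bit for the sign


def pad_mantissa(mantissa: str, width: int = MANTISSA_BITS) -> str:
    """Add 0s to the end of the mantissa until it is `width` bits long."""
    if len(mantissa) > width:
        # Assumption: refuse rather than silently drop bits.
        raise ValueError("mantissa does not fit in the available bits")
    return mantissa.ljust(width, "0")


print(pad_mantissa("110110011"))  # 110110011000000
```

The 0s go at the end, not the start, because extra 0s after the point do not change the value of the number.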
Example 1 (cont.)
Next, we need to calculate the exponent. We already know that we are moving 5 places (101 in binary, which uses 3 bits). As we are using 8 bits, we need to add five 0s at the start of the exponent. This is 00000101.
Fixed point: 11011.0011
Floating point: 0.110110011 x 2^101
Sign (1 bit): 0
Mantissa (15 bits): 110110011000000
Exponent (8 bits): 00000101
Example 2
How would 0.0001101 be represented in binary floating point representation using 16 bits for the mantissa (including the sign bit) and 8 bits for the exponent?
To begin with, represent this number using floating point: 0.1101 x 2^-3
Notice that we use -3 (this is because we are moving the point in the opposite direction, to the right). The number 3 converted into binary is:
128 64 32 16 8 4 2 1
  0  0  0  0 0 0 1 1
Therefore, we can write this number as 0.1101 x 2^-11
Example 2 (cont.)
Next, we need to calculate the sign bit. The sign bit indicates whether a number is positive or negative.
If it is positive then it is represented with a 0.
If it is negative then it is represented with a 1.
In this case, 0.0001101 is a positive number so the sign bit is 0.
Example 2 (cont.)
Next, we need to calculate the mantissa. We already know that this is the number after the point in the floating point representation (0.1101 x 2^-11), so the mantissa is 1101.
There are a total of 4 bits used, but the question states we must use 16 bits for the mantissa including the sign bit. As we have already used a bit for the sign, we have 15 bits left.
We now need to add 0s at the end of the mantissa until we have used 15 bits. This gives us 110100000000000: we have added 11 bits at the end of the mantissa to give 15 bits in total.
Example 2 (cont.)
Next, we need to calculate the exponent. We already know that we are moving -3 places. As we are using a negative number, this has to be represented using two's complement:
-128 64 32 16 8 4 2 1
   1  1  1  1 1 1 0 1
This is 11111101 (-128 + 64 + 32 + 16 + 8 + 4 + 1 = -3).
Fixed point: 0.0001101
Floating point: 0.1101 x 2^-11
Sign (1 bit): 0
Mantissa (15 bits): 110100000000000
Exponent (8 bits): 11111101
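The two's complement step can be checked with a small Python sketch. The function name `exponent_bits` is hypothetical, and the range check is an added assumption; the slides only deal with exponents that fit in 8 bits.

```python
def exponent_bits(exponent: int, width: int = 8) -> str:
    """Return the exponent as a two's complement bit string of `width` bits."""
    lowest, highest = -(2 ** (width - 1)), 2 ** (width - 1) - 1
    if not lowest <= exponent <= highest:
        raise ValueError("exponent does not fit in the available bits")
    # Masking keeps the low `width` bits; for a negative number this is
    # exactly its two's complement pattern.
    return format(exponent & (2 ** width - 1), f"0{width}b")


print(exponent_bits(-3))  # 11111101
print(exponent_bits(5))   # 00000101
```

Positive exponents are simply padded with 0s at the start, so the same function covers both directions of movement.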
Example 3
How would -111.00011 be represented in binary floating point representation using 16 bits for the mantissa (including the sign bit) and 8 bits for the exponent?
To begin with, represent this number using floating point: -0.11100011 x 2^3
As we are using binary, we cannot write the exponent as the denary number 3. The number 3 converted into binary is:
128 64 32 16 8 4 2 1
  0  0  0  0 0 0 1 1
Therefore, we can write this number as -0.11100011 x 2^11
Example 3 (cont.)
Next, we need to calculate the sign bit. The sign bit indicates whether a number is positive or negative.
If it is positive then it is represented with a 0.
If it is negative then it is represented with a 1.
In this case, -111.00011 is a negative number so the sign bit is 1.
Example 3 (cont.)
Next, we need to calculate the mantissa. We already know that this is the number after the point in the floating point representation (-0.11100011 x 2^11), so the mantissa is 11100011.
There are a total of 8 bits used, but the question states we must use 16 bits for the mantissa including the sign bit. As we have already used a bit for the sign, we have 15 bits left.
We now need to add 0s at the end of the mantissa until we have used 15 bits. This gives us 111000110000000: we have added 7 bits at the end of the mantissa to give 15 bits in total.
Example 3 (cont.)
Next, we need to calculate the exponent. We already know that we are moving 3 places (11 in binary, which uses 2 bits). As we are using 8 bits, we need to add six 0s at the start of the exponent. This is 00000011.
Fixed point: -111.00011
Floating point: -0.11100011 x 2^11
Sign (1 bit): 1
Mantissa (15 bits): 111000110000000
Exponent (8 bits): 00000011
Example 4
How would -0.000000101 be represented in binary floating point representation using 16 bits for the mantissa (including the sign bit) and 8 bits for the exponent?
To begin with, represent this number using floating point: -0.101 x 2^-6
Notice that we use -6 (this is because we are moving the point in the opposite direction, to the right). The number 6 converted into binary is:
128 64 32 16 8 4 2 1
  0  0  0  0 0 1 1 0
Therefore, we can write this number as -0.101 x 2^-110
Example 4 (cont.)
Next, we need to calculate the sign bit. The sign bit indicates whether a number is positive or negative.
If it is positive then it is represented with a 0.
If it is negative then it is represented with a 1.
In this case, -0.000000101 is a negative number so the sign bit is 1.
Example 4 (cont.)
Next, we need to calculate the mantissa. We already know that this is the number after the point in the floating point representation (-0.101 x 2^-110), so the mantissa is 101.
There are a total of 3 bits used, but the question states we must use 16 bits for the mantissa including the sign bit. As we have already used a bit for the sign, we have 15 bits left.
We now need to add 0s at the end of the mantissa until we have used 15 bits. This gives us 101000000000000: we have added 12 bits at the end of the mantissa to give 15 bits in total.
Example 4 (cont.)
Next, we need to calculate the exponent. We already know that we are moving -6 places. As we are using a negative number, this has to be represented using two's complement:
-128 64 32 16 8 4 2 1
   1  1  1  1 1 0 1 0
This is 11111010 (-128 + 64 + 32 + 16 + 8 + 2 = -6).
Fixed point: -0.000000101
Floating point: -0.101 x 2^-110
Sign (1 bit): 1
Mantissa (15 bits): 101000000000000
Exponent (8 bits): 11111010
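One way to check the four answers is to work backwards from the stored bits to a denary value. The sketch below is an illustration only: the function name `decode` is hypothetical, and it assumes the layout used in these examples (a sign bit, a mantissa that follows the point, and a two's complement exponent).

```python
def decode(sign: str, mantissa: str, exponent: str) -> float:
    """Turn sign, mantissa and exponent bit strings back into a denary value."""
    # The mantissa bits follow the point: 1/2, 1/4, 1/8, and so on.
    fraction = int(mantissa, 2) / 2 ** len(mantissa)
    # Two's complement: the leading exponent bit has a negative place value.
    power = int(exponent, 2)
    if exponent[0] == "1":
        power -= 2 ** len(exponent)
    value = fraction * 2 ** power
    return -value if sign == "1" else value


print(decode("0", "110110011000000", "00000101"))  # 27.1875
print(decode("1", "111000110000000", "00000011"))  # -7.09375
```

These match the fixed point values in Examples 1 and 3: 11011.0011 is 27.1875 in denary and -111.00011 is -7.09375.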