Understanding ASCII Characters and Bit Representation

cs252 fall 2024 l.w

Explore the evolution of ASCII characters, from 7-bit ASCII to 8-bit ASCII, and the use of bits for text and numbers. Learn how characters and numbers are represented in bytes and delve into basic data formats. Discover the significance of control characters and printable characters in ASCII encoding.

  • ASCII
  • Characters
  • Bit Representation
  • Data Formats
  • Byte

Presentation Transcript


  1. cs252 fall 2024: symbols, characters, digitized data, numbers, images, video, 3D models, audio

  2. We use bits for everything: text, numbers, everything else. Everything is created with bits. Since we are going to represent everything using just bits, we should spend some time getting comfortable with them and recognizing some common patterns of usage.

  3. For starters, though, let's look at two of the most common uses: a byte holds a character (a glyph you see on a keyboard), and a byte holds a number (or a set of bytes holds a number).

  4. Bytes holding text (characters). Basic data formats: in the beginning there was ASCII, and it was good. 7-bit ASCII uses one byte per character, with the top bit unused or used for error checking (a parity bit). The ASCII chart maps 32-126 onto all the symbols you see on the keyboard (32 is the space). The values 0-31, plus 127 (DEL), are unprintable 'control' characters, like tab, cr, lf, and other characters meant for device control.

  5. 7-bit ASCII characters. 7 bits range from 0 to 127:
       000 0000 - 0
       000 0001 - 1
       000 0010 - 2
       000 0011 - 3
       000 0100 - 4
       . . .
       111 1110 - 126
       111 1111 - 127
     The low end of the range (0-31) holds the control characters; the rest, apart from 127 (DEL), are printable characters.

  6. Then came 8-bit ASCII. 8 bits range from 0 to 255; the patterns with the top bit set cover 128-255:
       1000 0000 - 128
       1000 0001 - 129
       1000 0010 - 130
       1000 0011 - 131
       1000 0100 - 132
       . . .
       1111 1110 - 254
       1111 1111 - 255
     The extra 128 characters gained by also using the high bit were used for umlauts, graphical symbols, proprietary characters, line drawing, double-height/width characters, and so on. ISO 8859 tried to get a handle on the growing number of ASCII extensions.

  7. There are several options for the set of extended ASCII characters from 1000 0000 (128) to 1111 1111 (255). This table is one option: [Table: one set of extended ASCII characters, 128-255]

  8. Here is a different option: [Table: another set of extended ASCII characters, 128-255]

  9. Current scene for representing text. Computers spread and began being sold in countries with writing systems that had more than 256 symbols. At first the answer was 'deal with it', but competitive pressure soon required local extensions. Today there are multi-byte character schemes and schemes with a variable number of bytes per character. We'll discuss those more later, but even with them, ASCII still applies. Unless you are involved in a project's internationalization efforts, ASCII is most likely all you need for character representation.

  10. Here are the bytes we use to represent everything. There are 256 of them, all the possible patterns of 1s and 0s in 8 bits, from 0000 0000 to 1111 1111. [Table: all 256 byte patterns, grouped into control characters, printable characters, and extended characters]

  11. A super-expensive computer shared by many users (1981: VAX-11/780, 1 MIPS, $1.3M), accessed through cheap terminals.

  12. Remind you of anything?

  13. Each terminal screen is divided into a number of rows and columns. Modern software terminals can simply be resized; old physical terminals were often 24x80, and some could shift into a 132-column mode using a smaller font.

  14. Each terminal has a hot spot, the location where the next character will be written; the user sees a cursor at that spot. Bytes flow to the terminal: 0100 0001. This byte holds 65, so an A is drawn at the cursor, and the cursor moves one spot to the right.

  15. At first it worked strictly like a typewriter: left to right, top to bottom, scrolling when you hit the bottom. The cursor/hot spot moves one box to the right after each character is printed, and moves to column 0 of the next row when it gets to the end of a line (<cr><lf>). Having the HID imitate a typewriter constrained software quite a bit; editing used to be line-oriented. Check out ed(1).

  16. Byte values below 32 are control characters. When a terminal receives one, it doesn't draw anything; instead, the byte is meant to control the terminal/device.
       Send this byte value to a terminal    See this on the screen
       0100 0001 (65)                        A
       0011 0110 (54)                        6
       0010 0100 (36)                        $
       0000 1010 (10)                        nothing; the cursor moves to column 0 of the next line (a Unix terminal driver normally turns lf into cr lf)

  17. The escape character, 0001 1011 (27), extends the set of special characters beyond 32. If you see a byte with a 27 (an escape character), keep reading more characters until you recognize the sequence, then do what it says:
       esc[2J        clear the screen
       esc[12;12H    jump the cursor to row 12, column 12

  19. There are many escape sequences, in several categories (https://www2.ccs.neu.edu/research/gpc/VonaUtils/vona/terminal/vtansi.htm). Meanwhile the terminal market was huge, with many different terminals available, and they did not all use the same escape sequences. There is software to help normalize them and support device independence. The VT100 was probably the most successful terminal. See also https://espterm.github.io/docs/VT100%20escape%20codes.html

  20. ASCII encoded vs binary.
       binary: write out the bytes just as they are in the underlying storage
       ASCII encoded: write one byte for each digit, using the ASCII value for that digit
       int x = 300;
       0000 0000   // this is the underlying storage, i.e. what
       0000 0000   // you will find in the 4 bytes of the int named x
       0000 0001   //
       0010 1100   // 0000 0000 0000 0000 0000 0001 0010 1100 is 300
     If you print out x in binary, you will output the four bytes 0, 0, 1, 44 (listed here most significant byte first; a little-endian machine such as x86 stores, and writes, them in the reverse order). If you print out x ASCII encoded, you will output three bytes with values 51, 48, 48, the ASCII values for the characters '3' and '0'. HIDs that display text expect everything to be ASCII encoded. When something is ASCII encoded, the number of bytes output depends on the value stored; when this int is output in binary, it is always 4 bytes, regardless of value.

  21. ASCII encoded vs binary; see the directory named esc on the code page of the class website.
       byte i;
       i = 47;
       System.out.printf("%d", i);   // this print statement results in two bytes
                                     // being sent to the terminal, 0x34 and 0x37 (ASCII encoded)
       i = 47;
       System.out.printf("%c", i);   // this one sends 1 byte, 0x2F;
                                     // you will see a forward slash /
       i = 10;
       System.out.printf("%c", i);   // this one sends 1 byte, 0x0A;
                                     // you will see a blank line (lf)

  22. Using escape sequences: the nearly identical program in C.
       #include <stdio.h>

       int main(void)
       {
           // printf("?[2J");
           // I want this to print the escape control character instead of a question mark,
           // but when I press the escape key, vi exits input mode; it doesn't insert an
           // escape character (byte = 27 = 0001 1011) into the file.
           // You cannot press the escape key here.

           printf("%c[2J", 27);
           // Instead, we do this. When typing this in, I did not need to type any control
           // characters. However, when it runs, it will output a control character
           // (escape, in this case).
           return 0;
       }
     Note that you can't type a control character into the text file that holds the source code, hence the 27 and %c. Embedding control characters into ASCII-encoded files is difficult; few editors support control characters other than carriage return, line feed, and horizontal tab.
     To compile the C code, e.g. if the code above is in a file named x.c, do this:
       cc -o x x.c
     This creates a new file named x, which you can then run (./x).

  23. The first window systems used escape sequences to turn this: [image] into this: [image]. And Windows was born.

  24. Many programs, yesterday, today, and tomorrow, use escape sequences, for purposes ranging from slightly smarter output, such as scp's progress line:
       $ scp lecture2-control*.pptx root@68.183.19.99:/var/www/html/classes/itec252/lectures
       lecture2-controlCharacterEscapeSequences.pptx    100%  119KB   1.1MB/s   00:00
     to graphics and data entry: [images]

  25. Current scene for representing text Computers spread and began being sold in countries with writing systems which had more than 256 symbols. At first it was 'deal with it', but soon competitive pressure required local extensions

  26. Unicode. "Unicode could be roughly described as 'wide-body ASCII' that has been stretched to 16 bits to encompass the characters of all the world's living languages. In a properly engineered design, 16 bits per character are more than sufficient for this purpose." (Joseph Becker, 1988)

  27. Character encoding and UTF: an ugly, distasteful, sad world of pain. Underestimating (or overestimating) growth (or collapse) rates is notoriously easy, and like the famous 'world-wide need for 5 computers', 16 bits did not cut it either. Also, a blind doubling of all text sizes, in memory, on disk, and over the network, was not acceptable to the places where ASCII-works-just-fine-thank-you.

  28. The horror of today. From the Java Character class documentation: "A char value, therefore, represents Basic Multilingual Plane (BMP) code points, including the surrogate code points, or code units of the UTF-16 encoding. An int value represents all Unicode code points, including supplementary code points." Got that?

  29. What else can bits encode, other than numbers and text? Everything

  30. Organizational checkpoint. You should understand:
       . how bytes are used to represent writing symbols
       . 7-bit vs 8-bit ASCII
       . the ASCII chart, and the 3 sections it contains:
           . control characters
           . printable characters
           . extended ASCII
       . the existence of extended schemes (not the details)
       . the difference between '0' and 0 (the character zero, byte value 48, vs the number zero)
       . all about the control character esc (27)
