LING 388: Computers and Language

 
LING 388: Computers and
Language
 
Lecture 4
 
Administrivia
 
Quick Homework 3 graded
Homework 3 Review
What's underneath Python: a deep dive …
Homework 3 Review
 
Euler’s Identity:
 
Arbitrary precision:
>>> from decimal import *
>>> getcontext()
Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[],
traps=[InvalidOperation, DivisionByZero, Overflow])
>>> import cmath
>>> i = cmath.sqrt(-1)
>>> pi = Decimal('3.141592653589793238462643383279’)
>>> pi
Decimal('3.141592653589793238462643383279’)
>> pi / 2
Decimal('1.570796326794896619231321692')
>>> i*pi
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for *: 'complex' and 'decimal.Decimal'
 
The underlying architecture
Python is a high-level language
Beneath Python lies binary…
 
interesting use of binary notation:
6 bit Leica camera lens encoding
The 6-bit code shown on the right
 is 101011 (1 = black, 0 = white),
read starting from the philips
screw head
Introduction
 
Memory Representation
binary: zeros and ones (1 bit)
organized into bytes (8 bits)
memory is byte-addressable
word (32 bits)
e.g. integer
(64 bits: floating point number)
big-endian/little-endian
most significant byte first or least significant byte
communication …
addressable
Memory
(RAM)
 
0
 
FFFFFFFF
your Intel and
ARM CPUs
array
a[23]
 
Introduction
 
A typical notebook computer
Example
: a 2013 Macbook Air
CPU: Core i5-4250U
1.3 billion transistors
built-in GPU
TDP: 15W (1.3 GHz)
Dual core (Turbo: 2.6 GHz)
Hyper-Threaded (4 logical CPUs, 2
physical)
64 bit
64 KB (32 KB Instruction + 32 KB Data)
L1 cache
256 KB L2 cache per core
12MB L3 cache shared
16GB max RAM
Increased
address space
and 64-bit
registers
 
anandtech.com
 
A 4 core machine: 8 virtual
Introduction
 
Machine Language
A CPU understands only one language: machine language
all
 
other
 
languages
 must be translated into machine language
Primitive instructions include:
MOV
PUSH
POP
ADD / SUB
INC / DEC
IMUL / IDIV
AND / OR / XOR / NOT
NEG
SHL / SHR
JMP
CMP
JE / JNE / JZ / JG / JGE / JL / JLE
CALL / RET
Assembly Language: (this notation)
by definition, nothing built on it is more powerful
 
http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
Introduction
 
Not all the machine instructions are conceptually necessary
many provided for speed/efficiency
Theoretical Computer Science
All mechanical computation can be carried out using a 
TURING MACHINE
Finite state table + (infinite) tape
Tape instructions:
at the tape head: Erase, Write, Move (Left/Right/NoMove)
Finite state table:
Current state x Tape symbol --> new state x New Tape symbol x Move
Introduction
 
Storage:
based on digital logic
binary (base 2) – everything is a power of 2
Byte: 8 bits
01011011
= 2
6
+2
4
+2
3
+2
1
+2
0
= 64 + 16 + 8 + 2 + 1
= 91 (in decimal)
Hexadecimal (base 16)
0-9,A,B,C,D,E,F (need 4 bits)
5B (= 1 byte)
= 5*16
1
 + 11
= 80 + 11
= 91
5
B
Introduction: data types
 
Integers
In one byte (= 8 bits), what’s the largest and smallest
number, we can represent?
Answer
: -128 .. 0 .. 127
Why? -2
8-1
.. 0 .. 2
8-1
 – 1
00000000 = 0
01111111 = 127
10000000 = -128
11111111 = -1
 
2
7
 + 2
6 
+ 2
5
 + 2
4
 + 2
3
 + 2
2
 + 2
1
 + 2
0
 
2
8
 – 1 = 255
 
2
7
 – 1 = 127
Introduction: data types
Integers
In one byte (= 8 bits), what’s the largest and smallest
number, we can represent?
00000000 = 0
01111111 = 127
10000000 = -128
11111111 = -1
00000000
11111111
2’s complement representation
Introduction: data types
 
Integers
In one byte, what’s the largest and smallest number, we can
represent?
Answer
: -128 .. 0 .. 127 using the 
2’s complement
representation
Why? super-convenient for arithmetic operations
“to convert a positive integer X to its negative counterpart, flip
all the bits, and add 1”
Example
:
00001010 = 2
3
 + 2
1
 = 10 (decimal)
11110101 + 1 = 11110110 = -10 (decimal)
11110110 flip + 1 = 00001001 + 1 = 00001010
Addition
:
-10 + 10
= 11110110 
+ 00001010 = 0 (ignore overflow)  
Introduction: data types
 
Typically 32 bits (4 bytes) are used to store an integer
range: -2,147,483,648 (2
31-1
 -1) to 2,147,483,647 (2
32-1
 -1)
 
 
 
 
what if you want to store even larger numbers?
Binary Coded Decimal (BCD)
code each decimal digit separately, use a string (sequence) of decimal digits …
C:
int
Introduction: data types
 
what if you want to store even larger numbers?
Binary Coded Decimal (BCD)
1 byte can code two digits (0-9 requires 4 bits)
1 nibble (4 bits) codes the sign (+/-), e.g. hex C/D
0
1
9
 
2 bytes (= 4 nibbles)
 
2.5 bytes (= 5 nibbles)
C
D
 
credit (+)
 
debit (-)
Introduction: data types
 
Typically, 64 bits (8 bytes) are used to represent floating point
numbers (double precision)
c
 = 2.99792458 x 10
8 
(m/s)
coefficient: 52 bits (implied 1, therefore treat as 53)
exponent: 11 bits (not 2’s complement, unsigned with bias)
sign: 1 bit (+/-)
C:
float
double
 
wikipedia
x86 CPUs have a built-in 
floating point coprocessor (x87)
80 bit long registers
Introduction: data types
 
How about letters, punctuation, etc.?
ASCII
American Standard Code for Information Interchange
Based on English alphabet (upper and lower case) + space + digits +
punctuation + control (Teletype Model 33)
Question
: how many bits do we need?
7 bits + 1 bit parity
Remember everything is in binary …
C:
char
 
Teletype Model 33 ASR
Teleprinter (Wikipedia)
Introduction: data types
order is important in sorting!
0-9: there’s a connection with BCD. 
Notice: 
code 30 (hex) through 39 (hex) 
Introduction: data types
 
Parity bit:
transmission can be noisy
parity bit can be added to ASCII code
can spot single bit transmission errors
even/odd parity:
receiver understands each byte should be even/odd
Example:
0 (zero) is ASCII 30 (hex) = 011000
even parity: 011000
0
, odd parity: 011000
1
Checking parity:
Exclusive or (XOR): basic machine instruction
A xor B true if either A or B true but not both
Example:
(even parity 0) 011000
0 
xor bit by bit
0
 xor 
1
 = 1 xor 
1
 = 0 xor 
0
 = 0 xor 
0
 = 0 xor 
0
 = 0 xor 
0
 = 0 xor 
0
 = 0
x86 assemby language
:
1.
PF: even parity flag set by
      arithmetic ops.
2.
TEST: AND (don’t store
result), sets PF
3.
JP: jump if PF set
 
Example
:
MOV al,<char>
TEST al, al
JP <location if even>
<go here if odd>
Introduction: data types
 
UTF-8
standard in the post-ASCII world
backwards compatible with ASCII
(
previously, different languages had multi-byte character sets
that clashed
)
Universal Character Set (UCS) Transformation Format 8-bits
 
(Wikipedia)
Introduction: data types
 
Example:
あ Hiragana letter A: UTF-8: E38182
Byte 1: E = 
1110
, 3 = 0011
Byte 2: 8 = 
10
00, 1 = 0001
Byte 3: 8 = 
10
00, 2 = 0010
い Hiragana letter I: UTF-8: E38184
 
 
Shift-JIS (Hex): 
: 
82A0
: 
82A2
Introduction: data types
 
How can you tell what encoding your file is using?
Detecting UTF-8
Microsoft:
1
st
 three bytes in the file is EF BB BF
(
not all software understands this; not everybody uses it
)
HTML:
<
meta
 http-equiv="Content-Type" content="text/html;charset=UTF-8" >
(
not always present
)
Analyze the file:
Find non-valid UTF-8 sequences: if found, not UTF-8…
Interesting paper:
http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html
Introduction: data types
 
Text files:
text files have lines: 
how do we mark the end of a line
?
End of line (EOL) control character(s):
LF 
 
0x0A 
 
(Mac/Linux),
CR 
 
0x0D 
 
(Old Macs),
CR+LF  0x0D0A (Windows)
End of file (EOF) control character:
(EOT) 0x04 (
aka
 Control-D)
binaryvision.nl
programming languages:
NUL used to mark
the end of a string
Slide Note
Embed
Share

The lecture explores topics such as Python's underlying architecture, memory representation, increased address space, 64-bit registers, and machine language instructions. It delves into binary notation, memory organization, CPU architecture specifics, and the translation of programming languages into machine code. Dive into the complexities of computer systems and their language processing capabilities.

  • Computers
  • Language
  • Architecture
  • Machine Language
  • Binary Notation

Uploaded on Feb 22, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. LING 388: Computers and Language Lecture 4

  2. Administrivia Quick Homework 3 graded Homework 3 Review What's underneath Python: a deep dive

  3. Homework 3 Review Euler s Identity: Arbitrary precision: >>> from decimal import * >>> getcontext() Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[], traps=[InvalidOperation, DivisionByZero, Overflow]) >>> import cmath >>> i = cmath.sqrt(-1) >>> pi = Decimal('3.141592653589793238462643383279 ) >>> pi Decimal('3.141592653589793238462643383279 ) >> pi / 2 Decimal('1.570796326794896619231321692') >>> i*pi Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unsupported operand type(s) for *: 'complex' and 'decimal.Decimal'

  4. The underlying architecture Python is a high-level language Beneath Python lies binary interesting use of binary notation: 6 bit Leica camera lens encoding The 6-bit code shown on the right is 101011 (1 = black, 0 = white), read starting from the philips screw head

  5. Introduction array a[23] 0 Memory Representation binary: zeros and ones (1 bit) organized into bytes (8 bits) memory is byte-addressable word (32 bits) e.g. integer (64 bits: floating point number) big-endian/little-endian most significant byte first or least significant byte communication addressable Memory (RAM) your Intel and ARM CPUs FFFFFFFF

  6. Introduction Increased address space and 64-bit registers A typical notebook computer Example: a 2013 Macbook Air CPU: Core i5-4250U 1.3 billion transistors built-in GPU TDP: 15W (1.3 GHz) Dual core (Turbo: 2.6 GHz) Hyper-Threaded (4 logical CPUs, 2 physical) 64 bit 64 KB (32 KB Instruction + 32 KB Data) L1 cache 256 KB L2 cache per core 12MB L3 cache shared 16GB max RAM anandtech.com A 4 core machine: 8 virtual

  7. Introduction Machine Language A CPU understands only one language: machine language allotherlanguages must be translated into machine language Primitive instructions include: MOV PUSH POP ADD / SUB INC / DEC IMUL / IDIV AND / OR / XOR / NOT NEG SHL / SHR JMP CMP JE / JNE / JZ / JG / JGE / JL / JLE CALL / RET Assembly Language: (this notation) by definition, nothing built on it is more powerful http://www.cs.virginia.edu/~evans/cs216/guides/x86.html

  8. Introduction Not all the machine instructions are conceptually necessary many provided for speed/efficiency Theoretical Computer Science All mechanical computation can be carried out using a TURING MACHINE Finite state table + (infinite) tape Tape instructions: at the tape head: Erase, Write, Move (Left/Right/NoMove) Finite state table: Current state x Tape symbol --> new state x New Tape symbol x Move

  9. Introduction Storage: based on digital logic binary (base 2) everything is a power of 2 Byte: 8 bits 01011011 = 26+24+23+21+20 = 64 + 16 + 8 + 2 + 1 = 91 (in decimal) Hexadecimal (base 16) 0-9,A,B,C,D,E,F (need 4 bits) 5B (= 1 byte) = 5*161 + 11 = 80 + 11 = 91 27 27 26 26 25 25 24 24 23 23 22 22 21 21 20 20 0 1 0 1 1 0 1 1 23 23 23 22 22 22 21 21 21 20 20 20 23 23 23 22 22 22 21 21 21 20 20 20 0 0 1 1 0 0 1 1 1 0 1 1 B 5 161160 161160 5 B

  10. Introduction: data types Integers In one byte (= 8 bits), what s the largest and smallest number, we can represent? Answer: -128 .. 0 .. 127 Why? -28-1.. 0 .. 28-1 1 00000000 = 0 01111111 = 127 10000000 = -128 11111111 = -1 27 27 27 26 26 26 25 25 25 24 24 24 23 23 23 22 22 22 21 21 21 20 20 20 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 27 + 26 + 25 + 24 + 23 + 22 + 21 + 20 28 1 = 255 27 27 26 26 25 25 24 24 23 23 22 22 21 21 20 20 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 27 1 = 127 27 26 25 24 23 22 21 20 1 0 0 0 0 0 0 0

  11. Introduction: data types Integers In one byte (= 8 bits), what s the largest and smallest number, we can represent? 00000000 = 0 01111111 = 127 10000000 = -128 11111111 = -1 00000000 11111111 0 127 -128 -127 -1 2 s complement representation

  12. Introduction: data types Integers In one byte, what s the largest and smallest number, we can represent? Answer: -128 .. 0 .. 127 using the 2 s complement representation Why? super-convenient for arithmetic operations to convert a positive integer X to its negative counterpart, flip all the bits, and add 1 Example: 00001010 = 23 + 21 = 10 (decimal) 11110101 + 1 = 11110110 = -10 (decimal) 11110110 flip + 1 = 00001001 + 1 = 00001010 Addition: -10 + 10 = 11110110 + 00001010 = 0 (ignore overflow)

  13. Introduction: data types Typically 32 bits (4 bytes) are used to store an integer range: -2,147,483,648 (231-1 -1) to 2,147,483,647 (232-1 -1) 231 230 229 228 227 226 225 224 27 26 25 24 23 22 21 20 byte 3 byte 2 byte 1 byte 0 C: int what if you want to store even larger numbers? Binary Coded Decimal (BCD) code each decimal digit separately, use a string (sequence) of decimal digits

  14. Introduction: data types what if you want to store even larger numbers? Binary Coded Decimal (BCD) 1 byte can code two digits (0-9 requires 4 bits) 1 nibble (4 bits) codes the sign (+/-), e.g. hex C/D 23 22 21 20 2 0 1 4 0 0 0 0 0 2 bytes (= 4 nibbles) 23 22 21 20 1 + 2 0 1 4 0 0 0 1 2.5 bytes (= 5 nibbles) 23 22 21 20 9 debit (-) credit (+) 1 0 0 1 23 22 21 20 23 22 21 20 C D 1 1 0 0 1 1 0 1

  15. Introduction: data types Typically, 64 bits (8 bytes) are used to represent floating point numbers (double precision) c = 2.99792458 x 108 (m/s) coefficient: 52 bits (implied 1, therefore treat as 53) exponent: 11 bits (not 2 s complement, unsigned with bias) sign: 1 bit (+/-) C: float double x86 CPUs have a built-in floating point coprocessor (x87) 80 bit long registers wikipedia

  16. Introduction: data types C: char How about letters, punctuation, etc.? ASCII American Standard Code for Information Interchange Based on English alphabet (upper and lower case) + space + digits + punctuation + control (Teletype Model 33) Question: how many bits do we need? 7 bits + 1 bit parity Remember everything is in binary Teletype Model 33 ASR Teleprinter (Wikipedia)

  17. Introduction: data types order is important in sorting! 0-9: there s a connection with BCD. Notice: code 30 (hex) through 39 (hex)

  18. Introduction: data types x86 assemby language: 1. PF: even parity flag set by arithmetic ops. 2. TEST: AND (don t store result), sets PF 3. JP: jump if PF set Parity bit: transmission can be noisy parity bit can be added to ASCII code can spot single bit transmission errors even/odd parity: receiver understands each byte should be even/odd Example: 0 (zero) is ASCII 30 (hex) = 011000 even parity: 0110000, odd parity: 0110001 Checking parity: Exclusive or (XOR): basic machine instruction A xor B true if either A or B true but not both Example: (even parity 0) 0110000 xor bit by bit 0 xor 1 = 1 xor 1 = 0 xor 0 = 0 xor 0 = 0 xor 0 = 0 xor 0 = 0 xor 0 = 0 Example: MOV al,<char> TEST al, al JP <location if even> <go here if odd>

  19. Introduction: data types UTF-8 standard in the post-ASCII world backwards compatible with ASCII (previously, different languages had multi-byte character sets that clashed) Universal Character Set (UCS) Transformation Format 8-bits (Wikipedia)

  20. Introduction: data types Example: Hiragana letter A: UTF-8: E38182 Byte 1: E = 1110, 3 = 0011 Byte 2: 8 = 1000, 1 = 0001 Byte 3: 8 = 1000, 2 = 0010 Hiragana letter I: UTF-8: E38184 Shift-JIS (Hex): : 82A0 : 82A2

  21. Introduction: data types How can you tell what encoding your file is using? Detecting UTF-8 Microsoft: 1st three bytes in the file is EF BB BF (not all software understands this; not everybody uses it) HTML: <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" > (not always present) Analyze the file: Find non-valid UTF-8 sequences: if found, not UTF-8 Interesting paper: http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html

  22. Introduction: data types Text files: text files have lines: how do we mark the end of a line? End of line (EOL) control character(s): LF 0x0A (Mac/Linux), CR 0x0D (Old Macs), CR+LF 0x0D0A (Windows) End of file (EOF) control character: (EOT) 0x04 (aka Control-D) programming languages: NUL used to mark the end of a string binaryvision.nl

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#