Evolution of Integer Sizes in C Programming

 
15-122: Principles of
Imperative Computation
 
L
e
c
t
u
r
e
 
2
0
:
 
T
y
p
e
s
 
i
n
 
C
 
A
p
r
i
l
 
0
3
,
 
2
0
2
3
 
Today…
 
Last session
:
Midterm 2 exam
 
Today’s lecture
:
Types in C
Integer Types, Casting Integers, Fixed-size Numbers, Floating Point
Numbers, and Union and Enum Types
 
Announcements
:
Midterm 2 grades are out
Written assignment 10 is due today by 9PM
 
 
 
1
 
Balance Sheet … So far
 
 
2
 
Undefined Behavior
 
T
o
d
a
y
 
3
 
T
h
e
 
t
y
p
e
 
i
n
t
 
4
int
 Sizes
 
In C0/C1, the size of values of type 
int
 is 32 bits
o
And pointers are 64 bits
 
In C, the size of an 
int
 has evolved over time
o
And pointers too
Typical
Typical
5
int
 Sizes
 
In C, the size of an 
int
 has evolved over time
o
And pointers too
 
 
 
 
 
 
E
a
r
l
y
 
c
o
m
p
u
t
e
r
s
 
h
a
d
 
8
-
b
i
t
a
d
d
r
e
s
s
e
s
o
2
5
6
 
b
y
t
e
s
 
o
f
 
m
e
m
o
r
y
RAM was very expensive
int
s ranged from -128 to 127
The computer that
sent Apollo 11 to the moon
 
‘60s
HP 9830A
6
int
 Sizes
 
In C, the size of an 
int
 has evolved over time
o
And pointers too
 
 
 
 
 
 
16-bit addresses
o
(Up to) 64 kilobytes of memory
The Commodore 64
int
s ranged from -32768 to 32767
Apple II
Commodore 64
7
int
 Sizes
 
In C, the size of an 
int
 has evolved over time
o
And pointers too
 
 
 
 
 
 
32-bit addresses
o
(Up to) 4 gigabytes of memory
int
s ranged in the billions
PC
iMac
8
int
 Sizes
 
In C, the size of an 
int
 has evolved over time
o
And pointers too
 
 
 
 
 
 
64-bit addresses
o
Nobody has 2
64
 bytes memory
Billions are still Ok for 
int
s
9
Implementation-defined Behavior
 
The C standard says that it is for the compiler to define the
size of an 
int
With some constraints
 
I
t
 
i
s
 
i
m
p
l
e
m
e
n
t
a
t
i
o
n
-
d
e
f
i
n
e
d
;
 
t
h
e
 
c
o
m
p
i
l
e
r
 
d
e
c
i
d
e
s
,
 
b
u
t
:
o
It remains fixed
o
The programmer can find out how big an 
int
 is
The file 
<limits.h>
 defines the values of 
INT_MIN
 and 
INT_MAX
And therefore the size of an 
int
 
U
n
d
e
f
i
n
e
d
 
b
e
h
a
v
i
o
r
 
 
i
m
p
l
e
m
e
n
t
a
t
i
o
n
-
d
e
f
i
n
e
d
 
b
e
h
a
v
i
o
r
o
Undefined behavior does not have to be consistent
o
The programmer has no way to find out from inside the program
10
Implementation-defined Behavior
 
Most programmers don’t need to know how big an 
int
 is
o
Just write code normally, possibly using 
INT_MIN
 and 
INT_MAX
o
The compiler will use whatever internal size it has chosen
 
 
 
Same thing for pointers
 
Code written in the 1970s still works on today’s computers
o
As long as the code doesn’t depend on the size of an 
int
o
And the programmer used 
sizeof
 inside 
malloc
 
 
This is not true of code that uses
the bits of an 
int
 to encode data: 
bit patterns (e.g., pixels)
11
int
’s Undefined Behaviors
 
S
a
f
e
t
y
 
v
i
o
l
a
t
i
o
n
s
 
i
n
 
C
0
 
a
r
e
 
u
n
d
e
f
i
n
e
d
 
b
e
h
a
v
i
o
r
 
i
n
 
C
o
Division/modulus by 0, or 
INT_MIN
 divided/mod’ed by -1
o
Shifting by more than the size of an 
int
 
O
v
e
r
f
l
o
w
!
o
C programs do not necessarily use two’s complement
This makes it essentially
impossible to reason
about 
int
s in a C program
n + n - n 
and 
n 
may produce different results
o
g
c
c
 
p
r
o
v
i
d
e
s
 
t
h
e
 
f
l
a
g
 
-
f
w
r
a
p
v
 
t
o
 
f
o
r
c
e
 
t
h
e
 
u
s
e
 
o
f
 
t
w
o
s
c
o
m
p
l
e
m
e
n
t
 
f
o
r
 
i
n
t
s
 
And a few more
o
E.g., Left-shifting a negative value
12
In 1972, a lot of computers
didn’t use 2’s complement
 
O
t
h
e
r
 
I
n
t
e
g
e
r
 
T
y
p
e
s
 
13
Signed Integer Types
 
C0 has a single type of integers: 
int
 
C has many more
o
long
: Integers that are larger than 
int
64 bits nowadays
o
short
: Integers that are smaller than 
int
16 bits nowadays
o
char
: Integers that are smaller than 
short
8 bits nowadays
But always 1 byte
 
 
 
 
o
… And there are more
char
 is a number!
'a' 
is convenience syntax
The placeholder %c in 
printf
displays it as a character
C99 defines a byte as 
at least
 8 bit
14
Unsigned Integer Types
 
Lots of code doesn’t use negative numbers
 
C
 
p
r
o
v
i
d
e
s
 
u
n
s
i
g
n
e
d
 
v
a
r
i
a
n
t
s
 
o
f
 
e
a
c
h
 
i
n
t
e
g
e
r
 
t
y
p
e
Same number of bits but sign bit can be used to represent more numbers
Twice as many numbers
o
unsigned long
o
unsigned int
o
unsigned short
o
unsigned char
 
O
v
e
r
f
l
o
w
 
o
n
 
u
n
s
i
g
n
e
d
 
n
u
m
b
e
r
s
 
i
s
 
d
e
f
i
n
e
d
 
t
o
 
w
r
a
p
 
a
r
o
u
n
d
o
Unsigned numbers do follow the laws of modular arithmetic
15
The most significant bit
is not special for them
Unsigned Integer Types
 
size_t
 is used to hold pointer and offsets
o
The argument of 
malloc
 and 
calloc
o
Array indices
o
Return type of 
sizeof
o
Etc.,
 
The size of 
size_t
 is the size of a memory address
16
Implementation-defined Integers
And there are several more …
Whether 
char
 is signed or unsigned
is implementation-defined
17
 
C
a
s
t
i
n
g
 
I
n
t
e
g
e
r
s
 
18
Integer Casts
 
W
e
 
g
o
 
b
a
c
k
 
a
n
d
 
f
o
r
t
h
 
b
e
t
w
e
e
n
 
d
i
f
f
e
r
e
n
t
 
n
u
m
b
e
r
 
t
y
p
e
s
 
w
i
t
h
c
a
s
t
s
   
int
 
x
 = 3;
   
long
 
y
 = (
long
)x;
 
Literal numbers have always type 
int
   
3
o
T
h
e
 
c
o
m
p
i
l
e
r
 
i
n
t
r
o
d
u
c
e
s
 
i
m
p
l
i
c
i
t
 
c
a
s
t
s
 
a
s
 
n
e
e
d
e
d
   
 long 
x
 = 3;
Is implicitly turned into:
   
 long 
x
 = (
long
)3;
x is 0x00000003
y is 0x0000000000000003
This is an 
int
19
Integer Casts
 
This can lead to unexpected outcomes
   
 long 
x
 = 1 << 40;
 
is 
undefined behavior
o
This is implicitly turned into:
   
 long 
x
 = (
long
)(1 << 40);
 
 
 
 
Fix:
 
 
long 
x
 = ((
long
)1) << 40;
1 is an 
int
This shift 1 by 40 positions
But 1 has only 32 bits! 
20
Casting Rules
 
R
u
l
e
 
1
:
 
I
f
 
t
h
e
 
n
e
w
 
t
y
p
e
 
c
a
n
 
r
e
p
r
e
s
e
n
t
 
t
h
e
 
v
a
l
u
e
,
 
t
h
e
 
v
a
l
u
e
 
i
s
p
r
e
s
e
r
v
e
d
o
signed char 
x
 = 3;
 
// x is 3 (= 0x03)
 
unsigned char 
y
 = (
unsigned char
)x; 
 
// y is 3 (= 0x03)
 
o
signed char 
x
 = 3;
 
// x is 3 (= 0x03)
 
unsigned int 
y
 = (
unsigned int
)x; 
 
// y is 3 (= 0x00000003)
 
o
signed char 
x
 = -3;
 
// x is -3 (= 0xFD)
 
int 
y
 = (
int
)x; 
 
// y is -3 (= 0xFFFFFFFD)
 
o
unsigned char 
x
 = 253;
 
// x is 253 (= 0xFD)
 
unsigned int 
y
 = (
unsigned int
)x; 
 
// y is 253 (= 0x0000000FD)
 
o
int 
x
 = -3;
 
// x is -3 (= 0xFFFFFFFD)
 
signed char 
y
 = (
signed char
)x; 
 
// y is -3 (= 0xFD)
21
Casting Rules
 
R
u
l
e
 
2
:
 
I
f
 
t
h
e
 
n
e
w
 
t
y
p
e
 
c
a
n
t
 
r
e
p
r
e
s
e
n
t
 
t
h
e
 
v
a
l
u
e
,
 
b
u
t
 
i
s
u
n
s
i
g
n
e
d
:
o
C
a
s
e
 
1
:
 
T
h
e
 
n
e
w
 
t
y
p
e
 
i
s
 
s
m
a
l
l
e
r
 
o
r
 
t
h
e
 
s
a
m
e
,
 
t
h
e
 
l
e
a
s
t
 
s
i
g
n
i
f
i
c
a
n
t
b
i
t
s
 
a
r
e
 
r
e
t
a
i
n
e
d
int 
x
 = 
INT_MAX
;
 
// x is 2147483647
 
(= 0x7FFFFFFF)
 
unsigned char 
y
 = (
unsigned char
)x;
 
// y is 255
 
(= 0xFF)
 
signed char 
x
 = -3;
 
// x is -3    (= 0xFD)
 
unsigned char 
y
 = (
unsigned char
)x; 
 
// y is 253 (= 0xFD)
 
 
o
C
a
s
e
 
2
:
 
T
h
e
 
n
e
w
 
t
y
p
e
 
i
s
 
b
i
g
g
e
r
,
t
h
e
 
b
i
t
s
 
a
r
e
 
s
i
g
n
-
e
x
t
e
n
d
e
d
signed char 
x
 = -3;
 
// x is -3
 
(= 0xFD)
 
unsigned int 
y
 = (
unsigned int
)x; 
 
// y is 4294967293
 
(= 0xFFFFFFFD)
22
An unsigned type
can’t represent
negative numbers
An unsigned type
can’t represent
negative numbers
INT_MAX
 doesn’t fit into a 
char
Casting Rules
 
R
u
l
e
 
3
:
 
I
f
 
t
h
e
 
n
e
w
 
t
y
p
e
 
c
a
n
t
 
r
e
p
r
e
s
e
n
t
 
t
h
e
 
v
a
l
u
e
,
 
b
u
t
 
i
s
s
i
g
n
e
d
,
 
t
h
e
 
r
e
s
u
l
t
 
i
s
 
i
m
p
l
e
m
e
n
t
a
t
i
o
n
-
d
e
f
i
n
e
d
 
 
o
int 
x
 = 
INT_MAX
;
 
 
    
// x is 2147483647 (= 0x7FFFFFFF)
 
signed char 
y
 = (
signed char
)x; 
 
// y is 
??
 
 
o
int 
x
 = 
-241
;
 
 
     
// x is -241(= 0xFFFFFF0F)
 
signed char 
y
 = (
signed char
)x; 
 
// y is 
??
23
Many compilers discard
the most significant bits
 … often -1= (0xFF)
 … often 15= (0x0F)
Casting Summary
24
n
e
w
_
t
y
p
e
 
c
a
n
r
e
p
r
e
s
e
n
t
 
t
h
e
 
v
a
l
u
e
o
f
 
e
x
p
?
n
e
w
_
t
y
p
e
i
s
 
s
i
g
n
e
d
?
T
h
e
 
v
a
l
u
e
 
o
f
 
e
x
p
i
s
 
p
r
e
s
e
r
v
e
d
 
Yes
 
No
n
e
w
_
t
y
p
e
i
s
 
l
a
r
g
e
r
 
t
h
a
n
o
l
d
_
t
y
p
e
?
 
No
Implementation-defined
(often discard the most significant bits)
 
Yes
T
h
e
 
b
i
t
s
 
a
r
e
s
i
g
n
-
e
x
t
e
n
d
e
d
 
Yes
T
h
e
 
l
e
a
s
t
-
s
i
g
n
i
f
i
c
a
n
t
 
b
i
t
s
a
r
e
 
r
e
t
a
i
n
e
d
 
No
 
(
new_type
) 
exp
 
exp
 of type 
old_type
 
R
u
l
e
 
1
 
R
u
l
e
 
2
 
(
C
a
s
e
 
1
)
 
R
u
l
e
 
2
 
(
C
a
s
e
 
2
)
 
R
u
l
e
 
3
 
F
i
x
e
d
-
s
i
z
e
 
N
u
m
b
e
r
s
 
25
 
Fixed-size Integers
 
For bit patterns, the program needs the number of bits to
remain the same as C evolves
 
H
e
a
d
e
r
 
f
i
l
e
 
<
s
t
d
i
n
t
.
h
>
 
p
r
o
v
i
d
e
s
 
f
i
x
e
d
-
s
i
z
e
 
i
n
t
e
g
e
r
 
t
y
p
e
s
o
In signed and unsigned variants
That’s the number of bits
 
26
 
F
l
o
a
t
i
n
g
 
P
o
i
n
t
 
N
u
m
b
e
r
s
 
27
 
float
 
T
h
e
 
t
y
p
e
 
f
l
o
a
t
 
r
e
p
r
e
s
e
n
t
s
 
f
l
o
a
t
i
n
g
 
p
o
i
n
t
 
n
u
m
b
e
r
s
Nowadays 32 bits
float
 
x
 = 0.1;
float
 
y
 = 2.0235E-27;
 
float
 and 
int
 use the same number of bits,
but 
float
 has a much larger range
o
Some numbers with a decimal point are not representable
o
T
h
e
 
l
a
r
g
e
r
 
r
a
n
g
e
 
c
o
m
e
s
 
a
t
 
t
h
e
 
c
o
s
t
 
o
f
 
p
r
e
c
i
s
i
o
n
O
p
e
r
a
t
i
o
n
s
 
o
n
 
f
l
o
a
t
s
 
m
a
y
 
c
a
u
s
e
 
r
o
u
n
d
i
n
g
 
e
r
r
o
r
s
Numbers with a decimal point
That’s 2.0235 * 10
-27
 
28
 
float
 
O
p
e
r
a
t
i
o
n
s
 
o
n
 
f
l
o
a
t
s
 
m
a
y
 
c
a
u
s
e
 
r
o
u
n
d
i
n
g
 
e
r
r
o
r
s
 
o
Example 1
#
i
n
c
l
u
d
e
 
<
m
a
t
h
.
h
>
#
d
e
f
i
n
e
 
P
I
 
3
.
1
4
1
5
9
2
6
5
  
float
 
x
 = sin(PI);
 
o
Example 2
  
float
 
y
 = (10E20 / 10E10) * 10E10;
We expect y to be equal to 10E20
But it isn’t always
It depends on the compiler
Defines 
sin
, 
cos
, 
log
, …
Any more decimals
would be ignored
In math, 
sin(
)
 is 
0
but sin(PI) is not 0.0
That’s (10
20
/10
10
) * 10
10
 
29
 
float
 
O
p
e
r
a
t
i
o
n
s
 
o
n
 
f
l
o
a
t
s
 
m
a
y
 
c
a
u
s
e
 
r
o
u
n
d
i
n
g
 
e
r
r
o
r
s
 
o
Example 3
  
for
 (
float
 
res
 = 0.0; res != 5.0; res += 0.1)
   
printf(
"res = %f\n"
, res);
We expect the loop to terminate after 50 iterations
Instead it runs for ever
T
h
a
t
s
 
b
e
c
a
u
s
e
 
0
.
1
 
d
e
c
i
m
a
l
 
i
s
 
a
 
p
e
r
i
o
d
i
c
 
n
u
m
b
e
r
 
i
n
 
b
i
n
a
r
y
:
 
0
.
0
0
0
1
1
 
This is how we
convert 0.1 to binary
At this point, it repeats
 
30
 
float
 
O
p
e
r
a
t
i
o
n
s
 
o
n
 
f
l
o
a
t
s
 
m
a
y
 
c
a
u
s
e
 
r
o
u
n
d
i
n
g
 
e
r
r
o
r
s
 
This makes it impossible to reason about programs
o
This is why there are no 
float
s in C0
 
 
Adding more bits does not solve the problem
o
T
h
e
 
t
y
p
e
 
d
o
u
b
l
e
 
o
f
 
d
o
u
b
l
e
-
p
r
e
c
i
s
i
o
n
 
f
l
o
a
t
i
n
g
 
p
o
i
n
t
 
n
u
m
b
e
r
s
 
h
a
s
t
y
p
i
c
a
l
l
y
 
6
4
 
b
i
t
s
 
n
o
w
a
d
a
y
s
Similar issues
 
31
 
U
n
i
o
n
 
a
n
d
 
E
n
u
m
 
T
y
p
e
s
 
32
 
Sample Problem
 
Print a message based on the season
 
How to encode seasons?
o
Use strings …
Testing which season we are in is costly
o
Use integers
 
Drawbacks
o
The encoding is not mnemonic
We will make mistakes
o
A whole 
int
 for 4 values seems wasteful
// 0 = Winter
// 1 = Spring
// 2 = Summer
// 3 = Fall
 
int
 today = 3;
if
 (today == 0)
  printf(
"snow!\n"
);
else if 
(today == 3)
  printf(
"leaves!\n"
);
else
  printf(
"sun!\n"
);
 
33
 
Enum Types
 
o
The encoding is not mnemonic
o
A whole 
int
 for 4 values seems wasteful
 
A
n
 
e
n
u
m
 
t
y
p
e
 
l
e
t
s
o
The programmer choose mnemonic values
no need to remember the encoding – just use the names
o
The compiler decide how to implement them
What actual type to map them to
What values to use
 
 
 
 
The compiler optimizes
space usage
enum season
 { WINTER, SPRING, SUMMER, FALL };
 
enum season
 today = FALL;
if
 (today == WINTER)
  printf(
"snow!\n"
);
else if 
(today == FALL)
  printf(
"leaves!\n"
);
else
  printf(
"sun!\n"
);
By convention, enum
values are written in
all caps
The compiler maps enum
names to some numerical values
 
34
 
Switch Statements
 
A
 
s
w
i
t
c
h
 
s
t
a
t
e
m
e
n
t
 
i
s
 
a
n
 
a
l
t
e
r
n
a
t
i
v
e
 
t
o
 
c
a
s
c
a
d
e
d
 
i
f
-
e
l
s
e
s
f
o
r
 
n
u
m
e
r
i
c
a
l
 
v
a
l
u
e
s
Including union types
o
They make the code
more readable
 
E
a
c
h
 
v
a
l
u
e
 
c
o
n
s
i
d
e
r
e
d
 
i
s
h
a
n
d
l
e
d
 
b
y
 
a
 
c
a
s
e
o
T
h
e
 
e
x
e
c
u
t
i
o
n
 
o
f
 
a
 
c
a
s
e
c
o
n
t
i
n
u
e
s
 
t
i
l
l
 
t
h
e
 
n
e
x
t
 
b
r
e
a
k
o
r
 
t
h
e
 
e
n
d
 
o
f
 
t
h
e
 
s
w
i
t
c
h
s
t
a
t
e
m
e
n
t
It exits the switch statement
o
The 
default
 case handles any remaining value
enum season
 { WINTER, SPRING, SUMMER, FALL };
 
enum season
 today = FALL;
 
switch 
(today) {
  
case
 WINTER:
     printf(
"snow!\n"
);
     
break
;
 
  case 
FALL:
     printf(
"leaves!\n"
);
     
break
;
 
  default
:
     printf(
"sun!\n"
);
}
 
a case
 
another case
 
the default case
 
35
 
Switch Statements
 
 
 
 
 
 
If a 
break
 is missing,
the execution continues
with the next 
case
T
h
i
s
 
t
h
e
 
s
o
u
r
c
e
 
o
f
 
m
a
n
y
 
b
u
g
s
!
Recent versions of gcc
issue a warning
when this happens
enum season
 { WINTER, SPRING, SUMMER, FALL };
 
enum season
 today = FALL;
 
switch 
(today) {
  
case
 WINTER:
     printf(
"snow!\n"
);
     
break
;
 
  case 
FALL:
     printf(
"leaves!\n"
);
     
break
;
 
  default
:
     printf(
"sun!\n"
);
}
 
a case
 
another case
 
the default case
 
36
 
Another Sample Problem
 
Define a type for binary trees with 
int
 data only in their
leaves
A
n
d
 
w
h
e
r
e
 
t
h
e
 
e
m
p
t
y
 
t
r
e
e
 
i
s
 
n
o
t
 
r
e
p
r
e
s
e
n
t
e
d
 
a
s
 
N
U
L
L
 
o
A
 
l
e
a
f
y
 
t
r
e
e
 
c
o
u
l
d
 
b
e
An inner node with pointers to two children
A leaf with 
int
 data
An empty tree
 
o
Then:
enum
 
nodekind
 = { INNER, LEAF, EMPTY };
 
struct ltree 
{
  enum nodekind 
kind;
  
int
 data;
  
leafytree *
left;
  
leafytree *
right;
};
typedef
 
struct ltree leafytree
;
We now know about
enum types!
We now know about
enum types!
42
 
e
m
p
t
y
A leaf
An inner node
The empty tree
 
37
 
Sample Problem
 
This representation wastes memory
 
o
The compiler will pick a small
numerical type for kind
Probably a 
char
 
 
but
o
The remaining 3 fields are never fully utilized for any node type
Inner nodes do not make use of the data field
Leaves do not use left and right
The empty tree does not need any
enum
 
nodekind
 = { INNER, LEAF, EMPTY };
 
struct ltree 
{
  enum nodekind 
kind;
  
int
 data;
  
leafytree *
left;
  
leafytree *
right;
};
typedef
 
struct ltree leafytree
;
 
 
 
38
 
Union Types
 
A
 
u
n
i
o
n
 
t
y
p
e
 
a
l
l
o
w
s
 
u
s
i
n
g
 
t
h
e
 
s
a
m
e
 
s
p
a
c
e
 
i
n
 
d
i
f
f
e
r
e
n
t
 
w
a
y
s
 
Consider the space needed for a node, aside from its type
 
s
p
a
c
e
An inner node
uses the space
to store two pointers
A leaf uses
part of the space
to store an 
int
The empty tree
does not use
any space
 
39
 
Union Types
 
A
 
u
n
i
o
n
 
t
y
p
e
 
a
l
l
o
w
s
 
u
s
i
n
g
 
t
h
e
 
s
a
m
e
 
s
p
a
c
e
 
i
n
 
d
i
f
f
e
r
e
n
t
 
w
a
y
s
 
s
p
a
c
e
enum
 
nodekind
 { INNER, LEAF, EMPTY };
 
struct innernode 
{
    
leafytree *
left;
    
leafytree *
right;
  };
 
union nodecontent 
{
    int
 data;
    
struct innernode 
node;
  };
 
struct ltree 
{
    
enum nodekind 
kind;
    
union nodecontent 
content;
};
typedef
 
struct ltree leafytree
;
An inner node
consists of two pointers
The content of a generic node is
e
i
t
h
e
r
 
a
n
 
i
n
t
 
(
t
h
e
 
d
a
t
a
 
o
f
 
a
 
l
e
a
f
)
o
r
 
a
n
 
i
n
n
e
r
 
n
o
d
e
There is no need to
have an option for
the empty tree since
it uses no space
C11 supports a much more compact syntax
 
40
 
Building a Tree
 
Let’s write code that creates this tree
enum
 
nodekind
 { INNER, LEAF, EMPTY };
 
struct innernode 
{
    
leafytree *
left;
    
leafytree *
right;
  };
 
union nodecontent 
{
    int
 data;
    
struct innernode 
node;
  };
 
struct ltree 
{
    
enum nodekind 
kind;
    
union nodecontent 
content;
};
typedef
 
struct ltree leafytree
;
leafytree *
T = malloc(sizeof(
leafytree
));
T->kind = INNER;
T->content.node.left = malloc(sizeof(
leafytree
));
T->content.node.left->kind = EMPTY;
T->content.node.right = malloc(sizeof(
leafytree
));
T->content.node.right->kind = LEAF;
T->content.node.right->content.data = 42;
42
 
e
m
p
t
y
A leaf
An inner node
The empty tree
Whenever not following a pointer,
we must use the dot notation
 
 
 
 
 
41
 
Adding up a Leafy Tree
 
We use a 
switch
 statement to write clear code
o
We discriminate on T->kind
o
It has three possible values
INNER, LEAF and EMPTY
int
 
add_tree
(
leafytree *
T
) {
  
int
 
n
 = 0;
 
  
switch
 (T->kind) {
    
case
 INNER:
       n += add_tree(T->content.node.left);
       n += add_tree(T->content.node.right);
       
break
;
 
    
case
 LEAF:
       n = T->content.data;
       
break
;
 
    
default
:
       n = 0;
   }
 
   
return
 n;
}
 
42
 
S
u
m
m
a
r
y
 
43
 
Undefined Behavior
 
 
44
 
A
l
t
e
r
n
a
t
i
v
e
 
p
r
e
s
e
n
t
a
t
i
o
n
 
o
f
t
h
e
 
c
a
s
t
i
n
g
 
r
u
l
e
s
 
 
45
 
Casting Rules
 
W
h
e
n
 
c
a
s
t
i
n
g
 
b
e
t
w
e
e
n
 
s
i
g
n
e
d
 
a
n
d
 
u
n
s
i
g
n
e
d
 
i
n
t
e
g
e
r
s
 
o
f
t
h
e
 
s
a
m
e
 
s
i
z
e
,
 
t
h
e
 
b
i
t
 
p
a
t
t
e
r
n
 
i
s
 
p
r
e
s
e
r
v
e
d
 
 
o
Example 1
  
signed char 
x
 = 3;
       
// x is 3 (= 0x03)
  
unsigned char 
y
 = (
unsigned char
)x; 
  
// y is 3 (= 0x03)
 
o
Example 2
  
signed char 
x
 = -3;
      
// x is -3    (= 0xFD)
  
unsigned char 
y
 = (
unsigned char
)x; 
  
// y is 253 (= 0xFD)
This is actually implementation-defined
(but commonplace)
 
46
 
Casting Rules
 
W
h
e
n
 
c
a
s
t
i
n
g
 
s
m
a
l
l
 
t
o
 
b
i
g
 
i
n
t
e
g
e
r
s
 
o
f
 
t
h
e
 
s
a
m
e
s
i
g
n
e
d
n
e
s
s
,
 
t
h
e
 
v
a
l
u
e
 
i
s
 
p
r
e
s
e
r
v
e
d
 
o
signed char 
x
 = 3;
 
// x is 3 (= 0x03)
 
int 
y
 = (
int
)x; 
 
// y is 3 (= 0x00000003)
 
o
signed char 
x
 = -3;
 
// x is -3 (= 0xFD)
 
int 
y
 = (
int
)x; 
 
// y is -3 (= 0xFFFFFFFD)
 
o
unsigned char 
x
 = 253;
 
// x is 253 (= 0xFD)
 
unsigned int 
y
 = (
unsigned int
)x; 
 
// y is 253 (= 0x0000000FD)
 
47
I
t
 
d
o
e
s
 
s
i
g
n
 
e
x
t
e
n
s
i
o
n
o
n
 
s
i
g
n
e
d
 
t
y
p
e
s
I
t
 
d
o
e
s
 
s
i
g
n
 
e
x
t
e
n
s
i
o
n
o
n
 
s
i
g
n
e
d
 
t
y
p
e
s
 
Casting Rules
 
W
h
e
n
 
c
a
s
t
i
n
g
 
b
i
g
 
t
o
 
s
m
a
l
l
 
u
n
s
i
g
n
e
d
 
i
n
t
e
g
e
r
s
,
 
t
h
e
 
m
o
s
t
s
i
g
n
i
f
i
c
a
n
t
 
b
i
t
s
 
a
r
e
 
d
i
s
c
a
r
d
e
d
o
casting unsigned integers leverages modular arithmetic
 
o
Example 1
 
unsigned int 
x
 = 3;
 
// x is 3
 
(= 0x00000003)
unsigned char 
y
 = (
unsigned char
)x;
 
// y is 3
 
(= 0x03)
 
o
Example 2
 
unsigned int 
x
 = 
UINT_MAX
;
 
// x is 4294967295
  
(= 0xFFFFFFFF)
unsigned char 
y
 = (
unsigned char
)x; 
 
// y is 255
 
(=0xFF)
 
48
 
Casting Rules
 
W
h
e
n
 
c
a
s
t
i
n
g
 
b
i
g
 
t
o
 
s
m
a
l
l
 
s
i
g
n
e
d
 
i
n
t
e
g
e
r
s
,
 
t
h
e
 
v
a
l
u
e
 
i
s
p
r
e
s
e
r
v
e
d
 
 
i
f
 
i
t
 
i
s
 
r
e
p
r
e
s
e
n
t
a
b
l
e
o
implementation-defined otherwise
 
o
int 
x
 = 3;
 
// x is 3 (= 0x00000003)
signed char 
y
 = (
signed char
)x; 
 
// y is 3 (= 0x03)
 
o
int 
x
 = -3;
 
// x is -3 (= 0xFFFFFFFD)
 
signed char 
y
 = (
signed char
)x; 
 
// y is -3 (= 0xFD)
 
o
int 
x
 = 
INT_MAX
;
 
// x is 2147483647 (= 0x7FFFFFFF)
 
signed char 
y
 = (
signed char
)x; 
 
// y is 
??
 
49
Many compilers discard
the most significant bits
 … often -1= (0xFF)
 
Casting across Signedness and Size
 
How may the compiler apply the rules?
  
unsigned char 
x
 = 0xFD; 
  
 
 
// x is 253
  
int 
y
 = (
int
)x; 
      
// y is …
 
 
 
 
 
 
 
 
 
is y 253 or -3?
 
0xFD
 
0xFD
 
0x000000FD
 
0xFFFFFFFD
 
0x000000FD
 
c
a
s
t
 
t
o
 
s
i
g
n
e
d
 
c
h
a
r
p
r
e
s
e
r
v
e
s
 
b
i
t
 
p
a
t
t
e
r
n
 
c
a
s
t
 
t
o
 
(
s
i
g
n
e
d
)
 
i
n
t
p
r
e
s
e
r
v
e
s
 
v
a
l
u
e
 
c
a
s
t
 
t
o
 
u
n
s
i
g
n
e
d
 
i
n
t
p
r
e
s
e
r
v
e
s
 
v
a
l
u
e
 
c
a
s
t
 
t
o
 
(
s
i
g
n
e
d
)
 
i
n
t
p
r
e
s
e
r
v
e
s
 
b
i
t
 
p
a
t
t
e
r
n
253
-3
-3
253
253
 
50
 
Casting across Signedness and Size
 
How may the compiler apply the rules?
  
unsigned char 
x
 = 0xFD; 
  
 
 
// x is 253
  
int 
y
 = (
int
)x; 
      
// y is …
o
Is y -3 or 253?
the order of casts is actually defined
but who remembers it?
 
S
o
l
u
t
i
o
n
:
 
b
e
 
e
x
p
l
i
c
i
t
o
Write either
  
int 
y
 = (
int
)(
unsigned int
)x; 
  
// y is 253
 
to change first the size and then the signedness
o
or
  
int 
y
 = (
int
)(
signed char
)x; 
  
// y is -3
 
to change first the signedness and then the size
 
51
Slide Note
Embed
Share

The evolution of integer sizes in C programming is explored, from early computers with 8-bit addresses to modern systems with 64-bit pointers. The variations in integer sizes, pointer sizes, and memory capacities over decades are highlighted, showcasing the advancements in computing technology.

  • Integer Sizes
  • C Programming
  • Evolution
  • Computing Technology
  • Memory Capacities

Uploaded on Sep 06, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. 15-122: Principles of Imperative Computation Lecture 20: Types in C April 03, 2023

  2. Today Last session: Midterm 2 exam Today s lecture: Types in C Integer Types, Casting Integers, Fixed-size Numbers, Floating Point Numbers, and Union and Enum Types Announcements: Midterm 2 grades are out Written assignment 10 is due today by 9PM 1

  3. Balance Sheet So far Lost Gained Contracts Safety Garbage collection Memory initialization Well-behaved arrays Fully-defined language Strings Preprocessor Undefined behavior Explicit memory management Separate compilation Pointer arithmetic Stack-allocated arrays and structs Generalized address-of 2

  4. Undefined Behavior Reading/writing to non-allocated memory Reading uninitialized memory even if correctly allocated Use after free Double free Freeing memory not returned by malloc/calloc Writing to read-only memory Memory Numbers Today 3

  5. The type int 4

  6. int Sizes In C0/C1, the size of values of type int is 32 bits o And pointers are 64 bits In C, the size of an int has evolved over time o And pointers too Pointer size 8 16 32 64 int size 8 16 32 32 Typical Typical Today 70s 80s 90s 5

  7. int Sizes In C, the size of an int has evolved over time o And pointers too Pointer size 16 32 64 8 int size 16 32 32 8 Today 70s 80s 90s HP 9830A Early computers had 8-bit addresses o 256 bytes of memory RAM was very expensive ints ranged from -128 to 127 60s The computer that sent Apollo 11 to the moon 6

  8. int Sizes In C, the size of an int has evolved over time o And pointers too Pointer size 8 32 64 16 int size 8 32 32 16 Today 70s 80s 90s Commodore 64 16-bit addresses o (Up to) 64 kilobytes of memory The Commodore 64 ints ranged from -32768 to 32767 Apple II 7

  9. int Sizes In C, the size of an int has evolved over time o And pointers too Pointer size 8 16 64 32 int size 8 16 32 32 Today 70s 80s 90s iMac 32-bit addresses o (Up to) 4 gigabytes of memory ints ranged in the billions PC 8

  10. int Sizes In C, the size of an int has evolved over time o And pointers too Pointer size 8 16 32 64 int size 8 16 32 32 Today 70s 80s 90s 64-bit addresses o Nobody has 264 bytes memory Billions are still Ok for ints 9

  11. Implementation-defined Behavior The C standard says that it is for the compiler to define the size of an int With some constraints It is implementation-defined; the compiler decides, but: o It remains fixed o The programmer can find out how big an int is The file <limits.h> defines the values of INT_MIN and INT_MAX And therefore the size of an int Undefined behavior implementation-defined behavior o Undefined behavior does not have to be consistent o The programmer has no way to find out from inside the program 10

  12. Implementation-defined Behavior Most programmers don t need to know how big an int is o Just write code normally, possibly using INT_MIN and INT_MAX o The compiler will use whatever internal size it has chosen This is not true of code that uses the bits of an int to encode data: bit patterns (e.g., pixels) Same thing for pointers Code written in the 1970s still works on today s computers o As long as the code doesn t depend on the size of an int o And the programmer used sizeof inside malloc 11

  13. ints Undefined Behaviors Safety violations in C0 are undefined behavior in C o Division/modulus by 0, or INT_MIN divided/mod ed by -1 o Shifting by more than the size of an int Overflow! o C programs do not necessarily use two s complement This makes it essentially impossible to reason about ints in a C program n + n - n and n may produce different results o gcc provides the flag -fwrapvto force the use of two s complement for ints In 1972, a lot of computers didn t use 2 s complement And a few more o E.g., Left-shifting a negative value 12

  14. Other Integer Types 13

  15. Signed Integer Types C0 has a single type of integers: int C has many more o long: Integers that are larger than int 64 bits nowadays o short: Integers that are smaller than int 16 bits nowadays o char: Integers that are smaller than short 8 bits nowadays But always 1 byte char is a number! 'a' is convenience syntax The placeholder %c in printf displays it as a character C99 defines a byte as at least 8 bit o And there are more 14

  16. Unsigned Integer Types Lots of code doesn t use negative numbers C provides unsigned variants of each integer type Same number of bits but sign bit can be used to represent more numbers Twice as many numbers o unsigned long o unsigned int o unsigned short o unsigned char The most significant bit is not special for them Overflow on unsigned numbers is defined to wrap around o Unsigned numbers do follow the laws of modular arithmetic 15

  17. Unsigned Integer Types size_t is used to hold pointer and offsets o The argument of malloc and calloc o Array indices o Return type of sizeof o Etc., The size of size_t is the size of a memory address 16

  18. Implementation-defined Integers Whether char is signed or unsigned is implementation-defined signed unsigned C99 constraints Today s size signed char unsigned char exactly 1 byte 8 bits short unsigned short range at least (-215, 215) 16 bits int unsigned int range at least (-215, 215) 32 bits long unsigned long range at least (-231, 231) 64 bits size_t 64 bits And there are several more 17

  19. Casting Integers 18

  20. Integer Casts We go back and forth between different number types with casts int x = 3; long y = (long)x; x is 0x00000003 y is 0x0000000000000003 Literal numbers have always type int 3 o The compiler introduces implicit casts as needed long x = 3; Is implicitly turned into: long x = (long)3; This is an int 19

  21. Integer Casts This can lead to unexpected outcomes long x = 1 << 40; is undefined behavior o This is implicitly turned into: long x = (long)(1 << 40); 1 is an int This shift 1 by 40 positions But 1 has only 32 bits! Fix: long x = ((long)1) << 40; 20

  22. Casting Rules Rule 1: If the new type can represent the value, the value is preserved o signed char x = 3; unsigned char y = (unsigned char)x; // y is 3 (= 0x03) // x is 3 (= 0x03) o signed char x = 3; unsigned int y = (unsigned int)x; // x is 3 (= 0x03) // y is 3 (= 0x00000003) o signed char x = -3; int y = (int)x; // x is -3 (= 0xFD) // y is -3 (= 0xFFFFFFFD) o unsigned char x = 253; unsigned int y = (unsigned int)x; // x is 253 (= 0xFD) // y is 253 (= 0x0000000FD) o int x = -3; signed char y = (signed char)x; // x is -3 (= 0xFFFFFFFD) // y is -3 (= 0xFD) 21

  23. Casting Rules Rule 2: If the new type can t represent the value, but is unsigned: o Case 1: The new type is smaller or the same, the least significant bits are retained int x = INT_MAX; unsigned char y = (unsigned char)x; // x is 2147483647 (= 0x7FFFFFFF) (= 0xFF) // y is 255 INT_MAX doesn t fit into a char signed char x = -3; unsigned char y = (unsigned char)x; // x is -3 (= 0xFD) // y is 253 (= 0xFD) An unsigned type can t represent negative numbers negative numbers An unsigned type can t represent o Case 2: The new typeis bigger, the bits are sign-extended signed char x = -3; unsigned int y = (unsigned int)x; // x is -3 // y is 4294967293 (= 0xFD) (= 0xFFFFFFFD) 22

  24. Casting Rules Rule 3: If the new type can t represent the value, but is signed, the result is implementation-defined Many compilers discard the most significant bits o int x = INT_MAX; signed char y = (signed char)x; // y is ?? // x is 2147483647 (= 0x7FFFFFFF) often -1= (0xFF) o int x = -241; signed char y = (signed char)x; // y is ?? // x is -241(= 0xFFFFFF0F) often 15= (0x0F) 23

  25. exp of type old_type Casting Summary (new_type) exp Rule 1 new_type can represent the value of exp? The value of exp is preserved Yes No Rule 3 new_type is signed? Implementation-defined (often discard the most significant bits) Yes No Rule 2 (Case 2) new_type is larger than old_type? The bits are sign-extended Yes Rule 2 (Case 1) No The least-significantbits are retained 24

  26. Fixed-size Numbers 25

  27. Fixed-size Integers For bit patterns, the program needs the number of bits to remain the same as C evolves Header file <stdint.h> provides fixed-size integer types o In signed and unsigned variants Fixed-size signed int8_t Fixed-size unsigned uint8_t Today s signed equivalent signed char Today s unsigned equivalent unsigned char int16_t short unsigned short uint16_t int32_t int unsigned int uint32_t int64_t long unsigned long uint64_t That s the number of bits 26

  28. Floating Point Numbers 27

  29. float The type float represents floating point numbers Nowadays 32 bits float x = 0.1; float y = 2.0235E-27; That s 2.0235 * 10-27 Numbers with a decimal point float and int use the same number of bits, but float has a much larger range o Some numbers with a decimal point are not representable o The larger range comes at the cost of precision Operations on floats may cause rounding errors 28

  30. float Danger Operations on floats may cause rounding errors Defines sin, cos, log, o Example 1 #include <math.h> #define PI 3.14159265 float x = sin(PI); Any more decimals would be ignored In math, sin( ) is 0 but sin(PI) is not 0.0 o Example 2 float y = (10E20 / 10E10) * 10E10; We expect y to be equal to 10E20 But it isn t always It depends on the compiler That s (1020/1010) * 1010 29

  31. float Danger Operations on floats may cause rounding errors o Example 3 for (float res = 0.0; res != 5.0; res += 0.1) printf("res = %f\n", res); We expect the loop to terminate after 50 iterations Instead it runs for ever That s because 0.1 decimal is a periodic number in binary: 0.00011 This is how we convert 0.1 to binary 0.1 * 2 = 0.2 * 2 = 0.4 * 2 = 0.8 * 2 = 0.6 * 2 = 0.2 0.2 0.4 0.8 1.6 1.2 At this point, it repeats 30

  32. float Operations on floats may cause rounding errors This makes it impossible to reason about programs o This is why there are no floats in C0 Adding more bits does not solve the problem o The type double of double-precision floating point numbers has typically 64 bits nowadays Similar issues 31

  33. Union and Enum Types 32

  34. Sample Problem Print a message based on the season // 0 = Winter // 1 = Spring // 2 = Summer // 3 = Fall How to encode seasons? o Use strings Testing which season we are in is costly o Use integers int today = 3; if (today == 0) printf("snow!\n"); else if (today == 3) printf("leaves!\n"); else printf("sun!\n"); Drawbacks o The encoding is not mnemonic We will make mistakes o A whole int for 4 values seems wasteful 33

  35. Enum Types o The encoding is not mnemonic o A whole int for 4 values seems wasteful An enum type lets o The programmer choose mnemonic values no need to remember the encoding just use the names o The compiler decide how to implement them What actual type to map them to What values to use By convention, enum values are written in all caps enum season { WINTER, SPRING, SUMMER, FALL }; enum season today = FALL; if (today == WINTER) printf("snow!\n"); else if (today == FALL) printf("leaves!\n"); else printf("sun!\n"); The compiler maps enum names to some numerical values The compiler optimizes space usage 34

  36. Switch Statements A switch statement is an alternative to cascaded if-elses for numerical values Including union types o They make the code more readable enum season { WINTER, SPRING, SUMMER, FALL }; enum season today = FALL; switch (today) { case WINTER: printf("snow!\n"); break; a case Each value considered is handled by a case o The execution of a case continues till the next break or the end of the switch statement It exits the switch statement o The default case handles any remaining value case FALL: printf("leaves!\n"); break; another case default: printf("sun!\n"); } the default case 35

  37. Switch Statements enum season { WINTER, SPRING, SUMMER, FALL }; Danger enum season today = FALL; switch (today) { case WINTER: printf("snow!\n"); break; a case If a break is missing, the execution continues with the next case case FALL: printf("leaves!\n"); break; another case default: printf("sun!\n"); } the default case This the source of many bugs! Recent versions of gcc issue a warning when this happens 36

  38. Another Sample Problem Define a type for binary trees with int data only in their leaves And where the empty tree is not represented as NULL o A leafy tree could be An inner node with pointers to two children A leaf with int data An empty tree An inner node The empty tree A leaf empty 42 o Then: enum nodekind = { INNER, LEAF, EMPTY }; struct ltree { enum nodekind kind; int data; leafytree *left; leafytree *right; }; typedef struct ltree leafytree; We now know about enum types! enum types! We now know about 37

  39. Sample Problem This representation wastes memory enum nodekind = { INNER, LEAF, EMPTY }; o The compiler will pick a small numerical type for kind Probably a char struct ltree { enum nodekind kind; int data; leafytree *left; leafytree *right; }; typedef struct ltree leafytree; but o The remaining 3 fields are never fully utilized for any node type Inner nodes do not make use of the data field Leaves do not use left and right The empty tree does not need any 38

  40. Union Types A union type allows using the same space in different ways Consider the space needed for a node, aside from its type data left space right An inner node uses the space to store two pointers A leaf uses part of the space to store an int The empty tree does not use any space 39

  41. data left space Union Types right A union type allows using the same space in different ways enum nodekind { INNER, LEAF, EMPTY }; struct innernode { leafytree *left; leafytree *right; }; An inner node consists of two pointers union nodecontent { int data; struct innernode node; }; The content of a generic node is either an int (the data of a leaf) or an inner node struct ltree { enum nodekind kind; union nodecontent content; }; typedef struct ltree leafytree; There is no need to have an option for the empty tree since it uses no space C11 supports a much more compact syntax 40

  42. enum nodekind { INNER, LEAF, EMPTY }; Building a Tree struct innernode { leafytree *left; leafytree *right; }; union nodecontent { int data; struct innernode node; }; Let s write code that creates this tree An inner node struct ltree { enum nodekind kind; union nodecontent content; }; typedef struct ltree leafytree; The empty tree A leaf empty 42 INNER leafytree *T = malloc(sizeof(leafytree)); T->kind = INNER; T->content.node.left = malloc(sizeof(leafytree)); T->content.node.left->kind = EMPTY; T->content.node.right = malloc(sizeof(leafytree)); T->content.node.right->kind = LEAF; T->content.node.right->content.data = 42; LEAF EMPTY 42 Whenever not following a pointer, we must use the dot notation 41

  43. Adding up a Leafy Tree We use a switch statement to write clear code o We discriminate on T->kind o It has three possible values INNER, LEAF and EMPTY int add_tree(leafytree *T) { int n = 0; switch (T->kind) { case INNER: n += add_tree(T->content.node.left); n += add_tree(T->content.node.right); break; case LEAF: n = T->content.data; break; default: n = 0; } return n; } 42

  44. Summary 43

  45. Undefined Behavior Reading/writing to non-allocated memory Reading uninitialized memory even if correctly allocated Use after free Double free Freeing memory not returned by malloc/calloc Writing to read-only memory Division/mod by zero INT_MIN divided/mod ed by -1 Shift by more than the number of bits Signed overflow Memory Numbers 44

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#