Introduction to File Processing in Python Programming

Python Programming, 4/e
Python Programming:
An Introduction To
Computer Science
Chapter 10
Persistent Data
To understand basic file-processing concepts and
techniques for opening, reading, and writing files in
Python.
To understand the structure of text files and be able
to write programs that use them.
To become familiar with the basic organization of file
systems, including the role that absolute and relative paths
play in locating files, and be able to write Python
programs that process collections of files.
To understand binary data and the bytes data type
and be able to create programs that store and load
Python objects from files using the pickle library.
To recognize the similarity between working with local
files and working with network resources.
Text Files
In all of the examples so far, data has either been
embedded in the program code or entered by the user
when the program runs.
We lack a mechanism for entering data and having
that data persist from one run of the program to the
next.
Persistent data is a critical component of any modern
computing system.
Your word processor needs to save the paper you’re
working on.
Your programming environment needs to be able to save
and reload your Python code.
Typically, such information is stored in files.
A file is a sequence of data that is stored in secondary
memory (usually on a disk drive of some sort).
Files can contain any data type, but the easiest files to
work with are those that contain text.
Files of text have the advantage that they can be read and
understood by humans, and they are easily created and
edited using general purpose text editors, like IDLE.
Multi-line Strings
You can think of a text file as a (possibly long) string
that happens to be stored on disk.
A special character or sequence of characters is used
to mark the end of each line.
While this convention varies by operating system, Python
takes care of these different conventions for us and just
uses the regular newline character (\n) to mark line breaks.
Consider these four lines of text:
Hello
World

Goodbye 32
When stored to a file, you get this:
Hello\nWorld\n\nGoodbye 32\n
Notice that the blank line becomes a bare newline.
This is no different than when we embed newline
characters into output strings to produce multiple lines
of output with a single print statement:
print("Hello\nWorld\n\nGoodbye 32\n")
Remember, if you simply evaluate a string containing
newline characters in the shell, you will just get the
embedded newline representation back.
"Hello\nWorld\n\nGoodbye 32\n"
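The difference between printing a multi-line string and merely evaluating it can be sketched as follows. The variable names text and lines are chosen only for illustration.

```python
# A multi-line string with embedded newline characters.
text = "Hello\nWorld\n\nGoodbye 32\n"

# print interprets each \n as a line break.
print(text)

# repr shows the string the way the shell echoes it, with \n left visible.
print(repr(text))

# splitlines breaks the text into its lines; the blank line becomes "".
lines = text.splitlines()
print(lines)
```

The same characters are involved in both cases; only the way they are displayed differs.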
File Processing Outline
Virtually all programming languages share certain
underlying file manipulation concepts.
We need some way to associate a file on disk with an
object in a program – this is called opening a file.
We need a set of operations that can manipulate the file
object.
At the very least, we need to be able to read the information from
a file and to write new information to a file.
Lastly, when we are done we need to close the file.
This idea of opening and closing files is closely related
to how you might work with files in an application
program such as IDLE.
When you open a file for editing in IDLE, the file is actually
read from disk and stored in RAM.
At this point, the file is closed (in the programming sense).
As you edit the file, you are really making changes to the
data in memory, not the file itself.
Changes will not show up on disk until you “save” it.
The process of saving a file in IDLE is also a multi-step
process.
The original file on the disk is opened, this time in a mode
that allows it to store information (opened for writing).
Doing this actually erases the old contents of the file!
File writing operations are then used to copy the current
contents of the in-memory file into the new file on disk.
Working with text files in Python is easy!
Create a file object that corresponds to a file on disk:
<variable> = open(<path>, <mode>)
<path> is a string that provides the location of the file
on disk.
For a text file, <mode> is either "r" or "w" depending on
whether the file is intended to be read from or written to.
If the mode is omitted, the file is opened for reading.
#   Prints a file to the screen.
def main():
    fname = input("Enter a filename: ")
    infile = open(fname, "r")
    data = infile.read()
    infile.close()
    print(data)
The program first prompts the user for a file name
and then opens the file for reading through the
variable infile.
While any identifier works, here the name serves to remind
us that the object is a file and it is being used for input.
The entire contents of the file are then read as one
multi-line string and stored in the variable data.
Printing data causes the file contents to be displayed.
This process illustrates the basic three-step process
for working with a file:
Open the file.
Use file operations to read or write data.
Close the file.
Any file that is opened should be closed when the
program is done using it. Technically, all files get
closed when the program terminates, but doing it
explicitly is good programming style.
In order to make sure that necessary actions such as
closing a file occur, Python has a powerful feature
called a context manager.
#   Prints a file to the screen.
def main():
    fname = input("Enter a filename: ")
    with open(fname, "r") as infile:
        data = infile.read()
    print(data)
The with statement associates the variable with the
file object created by open.
The file object acts as a context manager for
executing the instructions in the indented body of the
with statement.
When the body has completed, the file will be closed
automatically, even if control leaves the body due to
an exception or a return statement.
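The automatic closing can be checked with a small sketch. It assumes a throwaway file created in a temporary directory; the file name demo.txt and the variable names are hypothetical.

```python
import os
import tempfile

# Create a small throwaway file to read (demo.txt is a made-up name).
tmpdir = tempfile.mkdtemp()
fname = os.path.join(tmpdir, "demo.txt")
with open(fname, "w") as outfile:
    print("one line of text", file=outfile)

# Normal case: the file is closed as soon as the with body finishes.
with open(fname, "r") as infile:
    data = infile.read()
closed_after_normal_exit = infile.closed

# Exceptional case: the file is still closed when the body raises an error.
try:
    with open(fname, "r") as infile2:
        raise ValueError("simulated problem while processing")
except ValueError:
    pass
closed_after_exception = infile2.closed

print(closed_after_normal_exit, closed_after_exception)
```

The closed attribute of a file object reports whether the file has been closed, so both flags should come out true.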
Reading from a File
The read method is just one of several options that can be used to
access the contents of a file.
<file>.read() – Returns the entire remaining contents of
the file as a single (potentially large, multi-line) string.
<file>.readline() – Returns the next line of the file, i.e.
all text up to and including the newline character.
<file>.readlines() – Returns a list of the remaining lines
in the file. Each list item is a string of a single line including
the newline character at the end.
Text files are read sequentially – the system keeps
track of what has been read since a file has been
opened, so that a later read will pick up where the
previous one left off.
If you want to read a previous line, you need to close
and reopen the file.
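Here is a minimal sketch of sequential reading, assuming a four-line file created just for the demonstration (lines.txt is a hypothetical name). Each read operation continues from where the previous one stopped.

```python
import os
import tempfile

# Build a small four-line file to experiment with.
tmpdir = tempfile.mkdtemp()
fname = os.path.join(tmpdir, "lines.txt")
with open(fname, "w") as outfile:
    outfile.write("first\nsecond\nthird\nfourth\n")

with open(fname, "r") as infile:
    first = infile.readline()    # the first line, newline included
    rest = infile.readlines()    # picks up after the first line
    at_end = infile.read()       # nothing is left to read

# To see the first line again, open the file again.
with open(fname, "r") as infile:
    first_again = infile.readline()

print(repr(first), rest, repr(at_end), repr(first_again))
```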
Successive calls to readline read successive lines
from the file.
The string returned by readline normally ends
with a newline character. (The last line of a file may lack one, and
at the end of the file readline returns an empty string.)
Use slicing to strip off the newline character at the
end of the line; otherwise the output will look double-spaced.
Alternatively, you can tell print not to add its own
newline, e.g.
print(line, end="")
with open(someFile, "r") as infile:
    for _ in range(5):
        line = infile.readline()
        print(line[:-1])
One way to loop through the entire contents of a file
is to read in all of the file using readlines, then loop
through the resulting list.
with open(someFile, "r") as infile:
    for line in infile.readlines():
        # process the line here
What happens if the file is too large to fit in your
computer’s memory?
Python treats a file as a sequence of lines, so looping
through the lines can be done directly:
with open(someFile, "r") as infile:
    for line in infile:
        # process the line here
Let’s improve our statistics library from last chapter.
One disadvantage of the previous version is that
it gets its numbers directly from the user.
What if you are trying to average one hundred
numbers and you make a mistake on number 98?
Doh! You’d need to start over again.
A better approach – type all the numbers into a file. We
can then edit the data before sending it to the program.
This file-oriented approach is typically used for data-
processing applications.
We can improve the usefulness of our library by adding a
getNumbersFromFile function that takes the name of a file
as a parameter and returns a list of numbers read from the
file.
Suppose our numbers are in a text file, with each line
containing a single number.
def getNumbersFromFile(fname):
    nums = []
    with open(fname, "r") as infile:
        for line in infile:
            # each line holds exactly one number
            nums.append(float(line))
    return nums
We could also do this more succinctly with a list
comprehension:
def getNumbersFromFile(fname):
    with open(fname, "r") as infile:
        nums = [float(line) for line in infile]
    return nums
Using this approach, we need to be very careful with the
format of the input file – there must be exactly one
number on each line.
A common error is to introduce an extra blank line at the
bottom that may go unnoticed. This would cause an error:
nums = [float(line) for line in infile]
ValueError: could not convert string to float: '\n'
We could make our function more flexible by having it
accept multiple numbers on the same line.
A single line can easily be turned into a list of numbers
using split in the list comprehension, similar to what we
did when we had multiple numbers on a single line of
interactive input:
nums = [float(x) for x in line.split()]
To get all the numbers across multiple lines, we simply wrap
this up in an accumulator loop that processes the lines of the
input file:
def getNumbersFromFile(fname):
    nums = []
    with open(fname, "r") as infile:
        for line in infile:
            newnums = [float(x) for x in line.split()]
            # add this line's numbers to the accumulator
            nums.extend(newnums)
    return nums
Here the accumulator is called nums and the list created
from each line is called newnums.
The final line in the loop body appends the numbers from
the current line to the end of the accumulator using the
extend method introduced in chapter 9.
This version of the stats program appears in
Using this approach has several benefits:
It allows you to create a data file with as many numbers on each
line as you want.
The program will also be more robust by handling accidental
blank lines (Do you see how?).
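One way to check the robustness claim is to run the split-based version on a deliberately messy file. The sketch below assumes a temporary file named nums.txt (a made-up name) containing blank lines and lines with several numbers.

```python
import os
import tempfile

def getNumbersFromFile(fname):
    nums = []
    with open(fname, "r") as infile:
        for line in infile:
            # split() on a blank line yields an empty list,
            # so blank lines contribute nothing to nums.
            newnums = [float(x) for x in line.split()]
            nums.extend(newnums)
    return nums

# A messy data file: several numbers per line plus blank lines.
tmpdir = tempfile.mkdtemp()
fname = os.path.join(tmpdir, "nums.txt")
with open(fname, "w") as outfile:
    outfile.write("3 1 4\n\n1.5\n   \n9 2\n\n")

nums = getNumbersFromFile(fname)
print(nums)
```

Because splitting a blank (or whitespace-only) line produces no items, the list comprehension never calls float on an empty string.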
Writing to a File
Opening a file for writing prepares that file to receive data.
If no file with the given name exists, a new file will be
created.
If a file with the given name does exist, Python will
delete it and create a new, empty file.
with open("mydata.out", "w") as outfile:
    # do things with outfile here
The easiest way to write information into a text file is to
use the print function.
To do this, simply add an extra keyword parameter that
specifies the file:
print(..., file=<outputfile>)
This behaves exactly like a normal print, except the result
is sent to <outputfile> rather than the screen.
Here’s a program to create a text file with a haiku about
Python:
def main():
    haiku = ["White space and syntax",
             "Python code flows like water",
             "Solutions emerge"]
    print("I have a haiku for you.")
    fname = input("Enter a file name to receive the haiku: ")
    with open(fname, "w") as haikufile:
        for line in haiku:
            print(line, file=haikufile)
    print(f"Look in {fname} to see your haiku")
Batch Processing
To see how these pieces fit together in a larger example,
let’s redo the username generation program from Chapter
Our previous version created usernames interactively by
having the user type in his or her name.
If we were setting up accounts for a large number of users,
this process would probably not be done interactively, but
in batch mode, where program input and output are done
through files.
Each line of the input file will contain the first and last
names of a new user separated by one or more spaces.
The program produces an output file containing a line for
each generated username.
# Program to create a file of usernames in batch mode.
def main():
    print("This program creates a file of usernames from a")
    print("file of names.")
    # get the file names
    infileName = input("What file are the names in? ")
    outfileName = input("What file should the usernames go in? ")
    # open the files
    with open(infileName, "r") as infile, open(outfileName, "w") as outfile:
        # process each line of the input file
        for line in infile:
            # get the first and last names from line
            first, last = line.split()
            # create the username
            uname = (first[0]+last[:7]).lower()
            # write it to the output file
            print(uname, file=outfile)
    print("Usernames have been written to", outfileName)
A couple of things worth noticing:
Two files are open at the same time, one for input (infile) and
one for output (outfile). This is accomplished in the with statement by
including two open(…) as <variable> clauses separated by a
comma. It’s not unusual for a program to act on multiple files.
When creating the username, the lower string method was used
to ensure that the username is all lowercase, even if the input
names are mixed case.
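The username rule can be tried on its own, without any files. In this sketch the helper makeUsername is a hypothetical name introduced only to isolate the per-line step of the program, and the sample names are made up.

```python
def makeUsername(line):
    # split() copes with one or more spaces and a trailing newline
    first, last = line.split()
    # first initial plus up to seven letters of the last name, lowercased
    return (first[0] + last[:7]).lower()

short = makeUsername("John   Zelle\n")
long_name = makeUsername("Alexandra Washington\n")
print(short, long_name)
```

Note that a line with more or fewer than two names would make the two-variable assignment fail with a ValueError, so the input file format matters.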
File Names and Paths
So far in our examples we’ve indicated the file to be
opened by supplying the name of the file as a string.
Using this approach, files end up in the folder where the
programs live.
This might be OK for assignments, but in the real world
we’d like users to be able to select files from anywhere in
secondary memory.
Absolute and Relative Paths
Way back in Chapter 1 we looked at how a computer’s
operating system generally organizes secondary memory
as a hierarchical collection of directories (also called
folders) that can contain files as well as other directories.
The directory at the top of this hierarchy is called the root
directory.
A file is located by specifying a path from the root
directory down through the hierarchy of directories.
E.g., the text of this chapter is in a file having the path
The top-level directory on Dr. Zelle’s computer is designated
with a slash (/). His computer’s root directory contains around 20
subdirectories, including one called 
A slash (/) is also used to separate the directory names along
the path.
You can think of the path from the root as representing
the “full name” of any given file.
The name has to be so complex because a typical
computer contains millions of files; there must be a way to
uniquely identify each of these files.
This complete path to a given directory or file is called the
absolute path.
Anywhere in Python where a file path is needed, an
absolute path can be used.
Working with absolute paths can be a pain!
They’re long.
Moving a file or folder changes the absolute paths of the affected
files and folders.
Any path that begins with something other than the root
directory is considered a relative path.
When we just use the name of a file in our examples,
those were relative paths.
Running programs always have an associated current working
directory, which is the directory that the program is currently working
in.
Typically, this is the directory where your program file is
located.
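As a rough sketch of how a relative path is interpreted, the following joins a relative path onto the current working directory. The path data/nums.txt is hypothetical and does not need to exist; pathlib's Path is used here only as a convenient way to inspect paths.

```python
from pathlib import Path

# The directory the running program is currently working in.
cwd = Path.cwd()

# A hypothetical relative path (the file need not exist for this demo).
relative = Path("data") / "nums.txt"

# A relative path is interpreted starting from the working directory.
absolute = cwd / relative

print(relative.is_absolute())
print(absolute.is_absolute())
print(absolute)
```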
Suppose we have a program
 stored in
When this program is run its working directory will be
path = input("What file should I analyze? ")
with open(path, "r") as infile:
    # process the file
If the user enters 
, the program will look for
Suppose the user instead enters 
Python will treat this as a path starting at the current
working directory: 
The characters “.” and “..” have special meanings for
relative paths.
“.” indicates the current working directory
“..” indicates the parent of the current working directory.
In our previous example, an equivalent would be
Dr. Zelle’s laptop is running Linux. While the ideas are the
same, the details differ among operating systems.
On macOS, a user’s home directory is in /Users.
On Windows, the path notation is a little different.
Each hard drive (e.g., C:) has its own file system with its own root.
Windows uses \ rather than / in paths.
Python always allows paths to be separated using a
regular slash (/) on any OS for interoperability.
It’s best practice to avoid “\” in Windows paths in Python
since the backslash is used in string literals to indicate
special characters, e.g. \n. To use an actual backslash
in a literal, you’d need to escape it (\\) or prefix the string
with r to indicate it is a “raw” string (don’t interpret).
Three ways to open the same file in Windows
with open("data/nums.txt") as infile:  # generic Python
                                       # notation
with open("data\\nums.txt") as infile: # Windows notation
                                       # using special char
with open(r"data\nums.txt") as infile: # Windows notation
                                       # using raw string
The best one? Number one – it will work on other operating
systems besides Windows.
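To see why an unescaped backslash causes trouble, compare the strings themselves. This sketch only inspects string literals and opens no files; the variable names are for illustration.

```python
# Here \n is interpreted as a single newline character.
plain = "data\nums.txt"

# Escaped backslash and raw string both hold a real backslash followed by n.
escaped = "data\\nums.txt"
raw = r"data\nums.txt"

# The forward-slash form sidesteps the issue entirely.
forward = "data/nums.txt"

print(len(plain), len(escaped), len(raw), len(forward))
print(escaped == raw)
```

The plain version is one character shorter than the others because the two characters \n in the source collapse into one newline.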
Using pathlib
Files are a ubiquitous part of the computing landscape, and
just about every program has to manipulate them in one
way or another.
Python provides a library called pathlib to help with some
of the common but tedious tasks.
The main tool is the Path class. A Path is a sort of
“wrapper” around a path string that gives it some
convenient superpowers.
Let’s improve our batch-oriented username program so
that it checks if the intended output file exists. If it does,
create a backup of that file so that the contents aren’t lost
when the new usernames are written.
from pathlib import Path
def main():
    print("This program creates a file of usernames from a")
    print("file of names.")
    # get the file names
    inPath = Path(input("What file are the names in? "))
    outPath = Path(input("What file should the usernames go in? "))
    # backup the output file if it already exists
    if outPath.exists():
        backupPath = outPath.with_suffix(".bak")
        print(f"Renaming existing {outPath} to {backupPath}")
        outPath.rename(backupPath)
    # open the files
    with open(inPath, "r") as infile, open(outPath, "w") as outfile:
        # process each line of the input file
        for line in infile:
            # get the first and last names from line
            first, last = line.split()
            # create the username
            uname = (first[0]+last[:7]).lower()
            # write it to the output file
            print(uname, file=outfile)
    print("Usernames have been written to", outPath)
You can extract different parts of a path using simple
attributes from a Path object.
>>> path = Path("/home/zelle/python/data.txt")
>>> path.stem
'data'
>>> path.suffix
'.txt'
We can create a slightly modified path by using
the Path with_ methods (such as with_suffix) to replace specific parts in an
existing path.
backupPath = outPath.with_suffix(".bak")
This creates a new Path that is just like outPath, except it has
the extension (suffix) “.bak” instead of its original extension.
Our program’s output will look something like
Renaming existing usernames.txt to usernames.bak
The actual renaming of the file happens with
outPath.rename(backupPath)
The rename method is one of a number of Path object
methods that can be used to make changes in the
underlying file system.
The necessary commands differ by operating system, but
the Path object handles the differences in a transparent
way.
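The backup step can be exercised in isolation. The following sketch assumes a scratch directory made with tempfile and a made-up existing output file; it uses only the Path methods exists, with_suffix, and rename, plus write_text and read_text for setting up and checking the files.

```python
import tempfile
from pathlib import Path

# Work in a scratch directory so no real files are touched.
workdir = Path(tempfile.mkdtemp())
outPath = workdir / "usernames.txt"
outPath.write_text("jzelle\n")          # pretend an earlier run left this file

# Back up the output file if it already exists.
if outPath.exists():
    backupPath = outPath.with_suffix(".bak")
    print(f"Renaming existing {outPath.name} to {backupPath.name}")
    outPath.rename(backupPath)

# Now it is safe to write new contents.
with open(outPath, "w") as outfile:
    print("newuser", file=outfile)

print(backupPath.read_text(), outPath.read_text())
```

One caveat worth knowing: the behavior of rename when the target already exists differs between operating systems, so a second backup into the same .bak name may need extra care.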
Iterating over Directories
Another task that programs often need to do is to process
a whole batch of files at a time.
For example, a photo management app might allow the
user to load all the images in a given directory.
If you have a Path object that points to a directory on
your hard disk, there are a couple of methods that allow you
to loop over the contents of that directory.
The simplest of these methods is iterdir.
It produces a sequence of Path objects, one for each file
or directory contained in the original directory.
>>> path = Path(".")
>>> for p in path.iterdir():
        print(p)
>>> list(path.iterdir())
[PosixPath(’names.txt’), PosixPath(’test.txt’),
PosixPath(’’), PosixPath(’data’),
PosixPath(’usernames.bak’), PosixPath(’nums2.txt’),
PosixPath(’’), PosixPath(’’)]
Notice that each item in the sequence produced by
iterdir is itself a Path.
It means we can make use of the various Path methods
on these items.
The is_file method returns True if the path is a file (as
opposed to a directory).
files = [p for p in path.iterdir() if p.is_file()]
If we wanted just the Python program files, we could grab just
the items that had a .py suffix.
python_files = [p for p in path.iterdir() if p.suffix == ".py"]
This last example could have been handled more simply using a
technique known as file globbing.
You can select a subset of files that match a pattern using the
glob method.
The pattern looks like a regular path string except that it
can contain certain “wildcard” characters.
“?” matches any single character
“*” matches any sequence of characters
python_files = list(path.glob("*.py"))
The glob pattern "*.py" will match any file whose name ends with .py
Our last addition was a getNumbersFromFile
function that can be used to get a data set from a specific
file.
Suppose we have a number of data sets, each stored in a
separate file in our data directory.
It would be handy to have a getNumbersFromFiles
function making use of file globbing to accumulate all the
data across the set of files.
Let’s write a function with two parameters.
basedir gives the directory containing the data
pattern is a pattern for which files to look in
To get the numbers from all the files in a data directory, we could
write
data = getNumbersFromFiles("data", "*")
To get data from all files having “exam” in the name,
data = getNumbersFromFiles("data", "*exam*")
To write this you need an accumulator to build a list of all the
numbers.
def getNumbersFromFiles(basedir, pattern):
    path = Path(basedir)
    nums = []
    for filepath in path.glob(pattern):
        newnums = getNumbersFromFile(filepath)
        # accumulate the numbers from this file
        nums.extend(newnums)
    return nums
Notice how basedir was turned into a Path object at the
start – that ensures that you can call glob in the loop heading.
This function will work when basedir is passed as either a
string or a Path.
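A self-contained way to try the globbing function is to build a small data directory first. In this sketch the file names exam1.txt, exam2.txt, and quiz1.txt and their contents are made up for illustration.

```python
import tempfile
from pathlib import Path

def getNumbersFromFile(fname):
    nums = []
    with open(fname, "r") as infile:
        for line in infile:
            nums.extend(float(x) for x in line.split())
    return nums

def getNumbersFromFiles(basedir, pattern):
    path = Path(basedir)          # accepts either a string or a Path
    nums = []
    for filepath in path.glob(pattern):
        nums.extend(getNumbersFromFile(filepath))
    return nums

# Build a scratch data directory with three small data files.
datadir = Path(tempfile.mkdtemp())
(datadir / "exam1.txt").write_text("90 85\n")
(datadir / "exam2.txt").write_text("70\n")
(datadir / "quiz1.txt").write_text("10 9 8\n")

exam_data = getNumbersFromFiles(datadir, "*exam*")
all_data = getNumbersFromFiles(str(datadir), "*")
print(sorted(exam_data), sorted(all_data))
```

The results are sorted before printing because glob does not promise any particular order for the files it finds.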
File Dialogs
Some operating systems (e.g., Windows and macOS) will, by
default, show only the main stem of the filename and
not the type suffix, making it hard to know the full
filename for performing file operations.
This situation is even more complicated when the file
exists somewhere other than the current working
directory. In order to operate on these far-flung files, we
need the complete path to them! Do you know how to find
the complete path to an arbitrary file on your computer?
One solution to this problem is to allow users to browse
the file system visually and navigate their way to particular
files.
The usual technique incorporates a dialog box that allows
a user to click around in the file system and either select
or type in the name of a file.
Fortunately for us, the tkinter GUI library included with
(most) standard Pythons has these kinds of functions!
To ask the user for the name of a file to open, you can use
the askopenfilename function found in the
tkinter.filedialog module.
from tkinter.filedialog import askopenfilename
The reason for the dot notation is that tkinter is a package
composed of multiple modules.
To get the name of the user names file:
infileName = askopenfilename()
The dialog box allows the user to either type in the name
of the file or to simply select it with the mouse.
When the user clicks the “Open” button, the complete path
name of the file is returned as a string and saved into the
variable infileName.
If the user clicks the “Cancel” button, the function will
simply return the empty string, "".
from tkinter.filedialog import asksaveasfilename
outfileName = asksaveasfilename()
You could, of course, import both at once:
from tkinter.filedialog import askopenfilename, asksaveasfilename
If you need to get a directory path from the user, there’s
also an askdirectory function.
All these functions have numerous optional parameters
that allow a program to customize the resulting dialogs.
Binary Files and Pickling
Files can store any kind of data, even though we’ve
focused on string data so far.
Files on disk are really just a sequence of bytes, so
arbitrary data can be encoded into the bytes stored in a
particular file.
You undoubtedly have files on your computer that store
images, audio, video, etc.
Strings and Bytes
There is a close correspondence between characters of a
string and bytes.
Before Unicode, each character in a string was treated as a
single byte of data.
When a string that contains only characters from the
original ASCII alphabet is encoded as bytes, each
character is stored as a single byte.
>>> s = "Hello, Bytes!"
>>> b = s.encode()
>>> type(b)
    <class 'bytes'>
Here, we created a string, s, then encoded it into bytes,
storing it into variable b.
A byte is 8 bits, which means there are 256 different byte
values.
Typically, bytes are stored as unsigned integers in the
range 0-255, inclusive.
>>> b[0]
    72
>>> b[1]
    101
The first byte of b is 72, because that is the Unicode value
of “H”. In other words, it is ord("H").
>>> len(s)
    13
>>> len(b)
    13
>>> b
    b'Hello, Bytes!'
s has 13 characters, and b has 13 bytes.
The last line shows a bytes literal prefaced with b (for
bytes), which is a compact way of showing the byte
sequence, exploiting the standard ASCII mapping of byte
values to characters.
What if our string contains non-ASCII characters?
Let’s concatenate some Unicode characters with values
greater than 127 to our string.
>>> sx = s + chr(128) + chr(256) + chr(512) + chr(1024)
>>> bx = sx.encode()
>>> bx
    b'Hello, Bytes!\xc2\x80\xc4\x80\xc8\x80\xd0\x80'
We added four characters, so the length of the string is
now 17 (characters).
The encoding of the string, though, is now 21 bytes. The
non-ASCII characters were each encoded as a pair of bytes,
and are displayed in hexadecimal (base 16) notation.
We can also convert a bytes object back into a string.
>>> b.decode()
    'Hello, Bytes!'
In fact, when we work with a text file in Python, this is
exactly what’s happening behind the scenes!
When reading from a file, Python reads in a sequence of
bytes from the file and decodes them into a string.
To write to a text file, Python encodes the string as a
sequence of bytes and streams the bytes into the file.
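The link between text mode and bytes can be checked directly. This sketch assumes UTF-8 as the encoding (stated explicitly rather than relying on a platform default) and uses a made-up sample string and file name.

```python
import os
import tempfile

text = "naïve café"                 # two non-ASCII characters
data = text.encode("utf-8")          # string -> bytes

print(len(text), len(data))
print(data.decode("utf-8") == text)  # bytes -> string round trip

# Writing in text mode performs the same encoding behind the scenes.
fname = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(fname, "w", encoding="utf-8") as outfile:
    outfile.write(text)

# Reading the same file in binary mode shows the raw encoded bytes.
with open(fname, "rb") as infile:
    raw = infile.read()

print(raw == data)
```

Each of the two accented characters takes two bytes in UTF-8, which is why the byte count is larger than the character count.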
Binary Mode and Pickling
Python also allows byte-level access to files.
We can read and write data as sequences of bytes rather
than strings.
Let’s assume the haiku we wrote earlier is stored in the file
haiku_out.txt.
>>> with open("haiku_out.txt", "r") as infile:
       data = infile.read()
>>> print(data)
White space and syntax
Python code flows like water
Solutions emerge
To treat the file as a sequence of bytes instead of text, we
just append a ‘b’ (for binary) to the mode string when
opening the file.
Notice the difference in our next interaction!
Using the mode ‘rb’ opens the file for reading in binary
mode.
Reading the file in this mode gets back a bytes object
instead of a string.
>>> with open("haiku_out.txt", "rb") as infile:
       data = infile.read()
>>> data
b'White space and syntax\nPython code flows like water\nSolutions emerge\n'
If we want a string back, we must explicitly decode it.
>>> with open("haiku_out.txt", "rb") as infile:
       data = infile.read()
>>> print(data.decode())
White space and syntax
Python code flows like water
Solutions emerge
We can also open a file for binary writing using the mode
‘wb’.
To write to a file in this mode, we must write bytes, not
strings.
with open("bytes.out", "wb") as outfile:
    outfile.write(b"Hello, Bytes!")
Notice we didn’t use print, since print turns its
arguments into strings.
To output bytes to a file, use the file method write.
The binary mode is really for manipulating non-text data.
Doing so requires some sort of binary encoding to
represent the data as a raw sequence of bytes.
Usually, we can use existing libraries that handle whatever
specialized data format we need.
One standard library that’s handy for storing binary data is
pickle. The purpose of the library is to preserve your
arbitrary Python objects as a sequence of bytes in a file.
The process of turning an object into a sequence of bytes
is called serialization.
Suppose we have created a data set and would like to
save it so that it can be loaded back up again later.
If we quit our program, our list of numbers will be lost
unless we somehow write it to a file!
We could do this with a text file (left as an exercise for you).
But what’s the fun of that?
Let’s have two functions – one that serializes the list into a
binary file and another that reads it back in again.
import pickle
def storeData(nums, path):
    with open(path, "wb") as outfile:
        pickle.dump(nums, outfile)
In this function, nums is the list of numbers that we want
to save and path is the path string (or Path object) for the
file to save the list into.
Our list is pickled for storage and later consumption with
no loops or futzing around.
Python uses its own binary format to do the serialization.
To load the list back in again will require another use of
pickle.
The inverse of dump is load.
All we need to do is open up the file for reading in binary
mode and call pickle.load.
Python will read in the bytes and decode them back into
whatever was pickled in the first place.
def loadData(path):
    with open(path, "rb") as infile:
        nums = pickle.load(infile)
    return nums
>>> storeData([3, 1, 4, 1, 5, 9], "test.pkl")
>>> nums = loadData("test.pkl")
>>> nums
[3, 1, 4, 1, 5, 9]
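Pickling is not limited to flat lists of numbers. As a small sketch, the in-memory functions pickle.dumps and pickle.loads (the bytes-based counterparts of dump and load) can round-trip a nested structure; the record shown here is made up.

```python
import pickle

# A made-up nested data structure mixing several built-in types.
record = {"name": "exam1", "scores": [88, 92.5, 79], "curved": False}

# dumps produces the pickled bytes directly instead of writing to a file.
blob = pickle.dumps(record)

# loads rebuilds an equal, but separate, object from those bytes.
restored = pickle.loads(blob)

print(type(blob))
print(restored == record, restored is record)
```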
You can use pickle to save the state of a game so that
users can pick up where they left off, or your AI
application might serialize a trained neural network so that
you can distribute it to thousands of users.
But there are some downsides:
The resulting file is binary and so it is not in a human readable
format. In many cases (like configuration files) it would be a
better idea to keep it human readable.
While pickle works for lots of objects and all Python’s built-in
types, it won’t work for all object types.
The process of loading a pickle file could cause the execution of
arbitrary (and potentially nefarious) Python code. Never load a pickle
from an untrusted source!
Remote Files
A lot of the data that our programs might use is not stored
on the local computer, but is accessed over the Internet.
Sometimes this is referred to as storing data “in the
cloud.”
The supporting web site for this textbook has all the code
and data files from the book. You can locate those files by
typing the Uniform Resource Locator (URL) into your favorite
browser.
Assuming you have an Internet connection, this will direct
your OS to send a request to another computer asking for
the specified data.
You’ll notice that this looks like a path…
You could use your browser to save this data to your
computer, but wouldn’t it be more convenient if we had a
program fetch the data directly off the web for us?
Let’s add one more data fetching function to our statistics
library.
Python provides a function that allows us to open a remote
file in a fashion analogous to opening a file on the local
computer.
from urllib.request import urlopen
def getNumbersFromURL(url):
    nums = []
    with urlopen(url) as infile:
        for line in infile:
            # lines arrive as bytes, so decode before splitting
            line = line.decode()
            newnums = [float(x) for x in line.split()]
            nums.extend(newnums)
    return nums
There are really only two slight changes from
getNumbersFromFile.
Instead of using the standard open function, it uses urlopen,
which is imported from the module urllib.request.
The urlopen function sends out a network request for the given URL and
provides a file-like object from which we can read the data coming back
over the network.
This object acts like a file that has been opened in ‘rb’ mode since the
URL may not point to textual data.
After opening the URL, we loop over the resulting data line-by-
line. Since this is binary data, the line is initially a bytes object.
The first line in the loop body decodes it into a string so that we can
then turn the string into a list of numbers, newnums, and accumulate those
numbers into the complete list, nums.
>>> data = getNumbersFromURL(" ... ")
>>> data
[26.0, 53.0, 5.0, 89.0, 79.0, 32.0, 38.0, 46.0]
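The function can be tried without a network connection, since urlopen also accepts file: URLs. The sketch below is an offline stand-in, not the book's data set: it writes a small made-up data file and builds its URL with Path.as_uri.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

def getNumbersFromURL(url):
    nums = []
    with urlopen(url) as infile:
        for line in infile:
            line = line.decode()      # each line arrives as bytes
            nums.extend(float(x) for x in line.split())
    return nums

# A local file standing in for a remote resource (contents are made up).
datafile = Path(tempfile.mkdtemp()) / "nums.txt"
datafile.write_text("26 53 5\n89 79\n")

url = datafile.as_uri()               # a file: URL for the local path
data = getNumbersFromURL(url)
print(url)
print(data)
```

This underlines the similarity the section describes: the reading loop is the same whether the bytes come from a disk or over a network.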
