Lab 11 - Text Files and Exam Practice
Part 1: Reading Text Files
Lines of a file
Text files are almost always organized into lines of text. A line of text is just any sequence of characters ending with a special character (sometimes a combination of characters) called a newline.
We have seen that any character is really just a number. A character set defines which numbers correspond to which characters. There are many character sets in use in the world, but most of them start out the same way: the first 128 characters are defined by the ASCII character set used in the US.
Some of the character codes aren’t actually visible characters; rather, they are used to control aspects of printing. The control characters ASCII 13 and ASCII 10 are used to indicate the end of a line of text. The exact combination will vary with your operating system (OS X and Linux files usually use ASCII 10, Windows files usually use ASCII 13 followed by ASCII 10), but we normally don’t care about the details, since the Python operations for reading text files will handle either form.
Some of the control characters you’ll encounter frequently have special representations that you can type as part of a Python string.
'\t' - the tab character (ASCII 9)
'\n' - the newline character (ASCII 10)
'\r' - the carriage return character (ASCII 13)
Those two-character pairs starting with the backslash are called escape sequences. You might wonder: what if I really just want to print the backslash character? Easy: just use two backslashes, as in '\\'.
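As a small illustration of these escape sequences, here is a sketch you can try. The strings are made up for this example. Notice that each escape sequence is typed as two characters but counts as a single character in the string.

```python
# Each escape sequence is typed as two characters but stored as one.
tabbed = "name\tscore"       # contains one tab character (ASCII 9)
two_lines = "first\nsecond"  # contains one newline character (ASCII 10)
backslash = "\\"             # contains one backslash character

print(tabbed)
print(two_lines)
print(backslash)

# len() counts characters, so each escape sequence counts once.
print(len(tabbed))     # 10
print(len(backslash))  # 1

# ord() gives the number behind a character.
print(ord("\t"), ord("\n"), ord("\r"))  # 9 10 13
```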
Basic file processing
Almost all processing of text files happens in a loop like this:
    open the file
    for each line in the file
        do something with the line
    close the file
The part that says “do something” can be simple or complicated.
For example, suppose we have a file jack.txt containing the text below:
My name is Jack and I live in the back of the Greta Garbo home for Wayward Boys and Girls

And the line above is blank!
Here is a loop that just prints out the lines, followed by the length of each line.
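A minimal sketch of such a loop is shown here. So that it can run anywhere, it first writes a shortened, made-up version of the sample text into a temporary directory. That setup step and the names sample_path and lengths are assumptions for this example, not part of the lab files.

```python
import os
import tempfile

# Setup for this sketch only: write a small sample file into a temporary directory.
sample_path = os.path.join(tempfile.mkdtemp(), "jack.txt")
out = open(sample_path, "w")
out.write("My name is Jack\n\nAnd the line above is blank!\n")
out.close()

lengths = []
f = open(sample_path)      # same as open(sample_path, "r")
for line in f:
    print(line)            # the line still ends with "\n", so a blank line follows
    print(len(line))       # the length includes that newline
    lengths.append(len(line))
f.close()
```

With this sample text the lengths are 16, 1, and 29. The blank line has length 1 because it consists of just the newline character.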
(Note that open("jack.txt") is the same as open("jack.txt", "r"), that is, we are opening the file for reading.)
Each line of a file ends with a newline, and print adds a newline of its own; that's why it looks like we have an extra blank line after each print. We often have to strip off the newline before processing the line.
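One common way to strip the newline is the string method rstrip. The sketch below uses a made-up line, written the way it would arrive from a file, rather than reading a real file.

```python
line = "My name is Jack\n"    # hypothetical line, as it would come from a file

stripped = line.rstrip("\n")  # remove the newline from the end of the line
print(len(line))      # 16
print(len(stripped))  # 15
print(stripped)       # no extra blank line after this print
```

Calling line.strip() with no argument would also work, but it removes spaces and tabs from both ends of the line as well, which is not always what you want.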
“Opening” and “closing” the file mean that the Python interpreter negotiates with the operating system to get access to the disk drive or other hardware resource.
Path to a file
When you run the examples above, we are cheating a little bit, because you don’t have to worry about where the file is located: it’s been built into this web page.
More realistically, we have to think about how the Python interpreter is going to find the file you are trying to read, and what to do if, in fact, the file isn’t there. You can see below the error that occurs if the file doesn’t exist (in this example, we misspelled its name).
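Here is one possible sketch of that situation. The misspelled name testfle.txt is made up and assumed not to exist in your working directory. Without the try/except, the program would stop with a FileNotFoundError; catching it, as done here, is just one way to respond.

```python
# The file name is deliberately misspelled and assumed not to exist.
try:
    f = open("testfle.txt")
    for line in f:
        print(line)
    f.close()
    found = True
except FileNotFoundError as err:
    # Python raises FileNotFoundError when the file cannot be found.
    found = False
    print("Could not open the file:", err)
```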
The working directory
Whenever your code runs, the interpreter has some notion of a working directory for your program. (Remember that the terms “directory” and “folder” are used interchangeably.) When you run a module from Wing 101, the working directory is by default just the directory containing the .py file for the module you are trying to run. When that module wants to import another module that you have written, Python looks in the directory containing the running module, which in this default setup is also the current working directory.
The same thing is true when you want to open a file: the interpreter looks in the current working directory for a file with the given name. So one simple solution when you want to open a file is to copy it into the directory containing your Python module.
You can get the interpreter’s idea of your current working directory using the function os.getcwd(). Create a short Python program in Wing 101 containing the following code and run it.
    import os
    print(os.getcwd())
The function getcwd() returns a string representing the current working directory for the program. This string will have a different form depending on your operating system. If you are running Windows on one of the lab computers, you’ll see a string that looks something like:
U:\cs127\labstuff
An expression like the one above is called an absolute path. It describes how to start at the “root” or base of the filesystem and find your way through the directories to get to a particular directory or file. Windows uses a letter like “U:” to represent the root or base directory of the filesystem on a device, and uses backslashes to separate directories in the path. The string above says “Start at the root of the device ‘U:’, go into the cs127 directory, and then go into the labstuff directory.”
The details will be different, depending on how you have your own files and folders organized.
If you are using a Mac, you’ll see a string that looks something like
/Users/username/Documents/cs127/labstuff
OS X uses a single forward slash “/” as the root directory and separates path elements with a forward slash.
Opening a file in the working directory
Start by downloading the sample file and saving it into the directory where you have your Python code for this lab.
Now, try reading the test file. Again create a new Python program with the code below.
    f = open("testfile.txt")
    for line in f:
        print(line)
    f.close()
If you can now open the file from your Python program, you’ve passed the major hurdle in file processing: actually finding the file!
Opening a file that is not in the working directory
Copying a file into the working directory for your program is not always convenient or possible. Fortunately you can also find a file using an absolute or relative path.
For this example, first use File Explorer or Finder to go into your working directory where you have your Python code for this lab. In it, create a subdirectory called, say, “foo”. Download the sample file and save it in the directory foo. Run the following code:
    f = open("testfile2.txt")
    for line in f:
        print(line)
    f.close()
You should get an error, because there is no file testfile2.txt in your working directory.
One solution is to use a relative path: describe how to get to the file from the current working directory. In this case you could say “first go down into the directory foo, and then find the file testfile2.txt.” This is represented by the string
foo/testfile2.txt
Try this:
    f = open("foo/testfile2.txt")
    for line in f:
        print(line)
    f.close()
(Aside: on Windows, you can use either backslashes or forward slashes when you describe a path with a Python string. But if you use a backslash in a string, remember you actually have to put two backslashes, like this: “foo\\testfile2.txt”.)
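To see why the doubled backslash matters, consider this small sketch. It only builds strings and does not open any file; the variable names are made up for the example. With a single backslash, Python reads the "\t" in the path as a tab character, so the string no longer names the file you meant.

```python
wrong = "foo\testfile2.txt"    # "\t" is read as a tab character
right = "foo\\testfile2.txt"   # "\\" is read as one backslash
also_ok = "foo/testfile2.txt"  # forward slashes need no escaping

print(wrong)   # shows a tab gap where the backslash and "t" were typed
print(right)   # shows foo\testfile2.txt
print(len(wrong), len(right), len(also_ok))  # 16 17 17
```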
A file can also be found via its absolute path. Look at the output you got from os.getcwd() above. Use that sequence of directories to describe how to get to the file starting from the root. For example, if your working directory looks like “U:\cs127\labstuff”, use the code:
    f = open("U:/cs127/labstuff/foo/testfile2.txt")
    for line in f:
        print(line)
    f.close()
You can think of it like this. If you want to explain to someone how to get to your friend’s house, you can do it in one of two ways. You could say “from my house, go two blocks over and take a left”. That’s a relative path. Or, you could pick a fixed reference point and give complete directions from there, like “starting at the highway 30 exit from I-35, follow highway 30 to University blvd...” and so on. That would be an absolute path (where we consider the Highway 30 exit to be the “root” for finding anything in Ames.)
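If you would like Python to work out the absolute path for you, one option is the standard library function os.path.abspath, sketched here. It was not used elsewhere in this lab, so treat it as an optional extra. It does not check whether the file exists; it only combines the relative path with the current working directory.

```python
import os

relative = "foo/testfile2.txt"

# os.path.abspath joins the relative path onto the current working directory.
absolute = os.path.abspath(relative)

print(os.getcwd())
print(absolute)
print(os.path.isabs(relative))  # False
print(os.path.isabs(absolute))  # True
```

The exact strings printed depend on your operating system and on where your program is running.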
There is more discussion on this topic here:
Checkpoint 2
Show the TA that you have the file testfile2.txt in a subdirectory foo and that you can open it using a relative or absolute path.
Part 2: Exam Review
The purpose of this section is to provide some practice in preparing for the exam. You will need a piece of paper and a pencil. You have to show what you’ve done to the TA by the end of the period. You will not be penalized if you don’t finish during the lab period.
You can download the problems here:
http://web.cs.iastate.edu/~smkautz/cs127f16/labs/lab11/problems.pdf