Chapter 14 - String operations and formatting¶
1.1 String methods¶
In the last chapter we learned how to use the bracket notation to access the individual characters of a string and to get a substring, or “slice”, from a string. In this chapter we want to begin exploring some of the other operations on strings that are available in the Python libraries.
Let’s start with an example. There is an operation called “upper” that takes a string and gives you a new string that is the same as the first one, but with all the lower-case characters converted to upper-case. Suppose we have the string “Steve” stored in a variable name
.
>>> name = "Steve"
Then we execute the statement:
>>> t = name.upper()
The variable “t” now contains the value “STEVE” in capital letters.
Notice that the string ``name`` is not changed.
Look at the peculiar notation we have used:
name.upper()
What’s with the dot? You might recognize this notation from your experience with turtles. When we want a turtle to turn left 90 degrees, we can’t just say left(90)
. We have to specify which turtle should turn left 90 degrees, e.g., alex.left(90)
. Here it is the same thing. We want the upper
function to operate on the particular string name
.
A function used this way to operate on a particular object is called a method. (In contrast, when we call a built-in function like len
, we put the string in the parentheses as an argument: len(s)
Most of the operations you can perform on strings are implemented as methods and are invoked using this “dot” notation. We sometimes refer to the string before the dot as the target of the method, because that’s the string the method is going to act upon.
The other thing to notice about this example is that the original string “name” is not modified. In fact, none of the string operations actually modify the target string - strings are immutable, remember?
Here is an overview of the most useful methods on strings. We have just seen the method upper()
. There is also a method lower()
that returns the target string with any upper-case letters converted to lower-case.
The method capitalize()
returns a string with just the first letter converted to uppercase and the rest in lowercase:
The method strip()
returns a new string with all “whitespace” trimmed from the beginning and end of your target string. Remember that “whitespace” includes space characters, tab characters, and newline characters.
These four methods, upper(), lower(), capitalize(), and strip(),
all return a new string. The next example is a method that returns an integer. The find()
method has one argument, which is a character or string, say t
. If t
is a substring of your target string, then the find()
method returns the first index at which t
occurs. If t
doesn?t occur at all, then find()
returns -1.
The startswith()
method is also related to substrings. Like find()
, it has one string argument t
, but it returns a boolean value. It returns True if the very beginning of your target string exactly matches t
.
Logically, you could say that
s.startswith(t)
is true if and only if
s.find(t) == 0
It will probably not surprise you to learn that there is a similar function endswith()
that returns True if the end of the target exactly matches the argument.
We’ll conclude this summary with one more method, called split()
, that is very useful for processing text. The return value of split()
is not a single value, but is a list of strings. The idea is that you take a string containing multiple items with whitespace between them, like words in a sentence:
s = "There are eels in my hovercraft!"
Then the result of invoking split() is a list of the individual words:
As we saw in the last chapter, you can use the bracket notation to get the individual elements of the sequence.
1.2. Summary of string methods¶
The box below contains a summary of the methods we have discussed. Note that there are many more which we have not discussed. You can see them all in section 5.6.1 of the Python Library Reference, available at
<http://docs.python.org/library/stdtypes.html#string-methods>
1.3. Chaining methods¶
Here is an example. In some of our interactive scripts we ask a user to enter “y” for yes and “n” for no, and then check whether the response is equal to “y”.
status = input("Are you an Iowa resident(y/n)? ") if status == "y": # add sales tax or whatever...
This works fine if the user is careful to enter exactly the single character, lower-case “y”. But what if they enter an upper-case “Y”, or if they type “yes” or “Yes”, or accidentally add an extra space before or after the response. We’d probably want to count all those responses as a “yes” answer too. So we can make our code more robust by stripping off the whitespace from the response, using the strip() method.
status = input("Are you an Iowa resident(y/n)? ") status2 = status.strip()
Then, we can convert to lowercase:
status3 = status2.lower()
Finally, we check whether the response starts with “y”:
if status3.startswith("y"):
The thing to notice is that since the value of status.strip()
is just a string, we can invoke the lower()
method on it directly, without the need for the variable status2.
status3 = status.strip().lower() if status3.startswith("y"):
But of course, status.strip().lower()
is also a string, and so we can invoke the startswith()
method directly on it.
if status.strip().lower().startswith("y"):
1.4. An example using the find()
method¶
Let’s write a function whose argument is a string of the form ‘last name, first name,’ and returns a string of the form ‘first name last name’. Assume that there may or may not be spaces after the comma. For instance, if the argument is “Python, Monty” the return value should be “Monty Python”. The function definition might start out like this:
def switch_name(name):
First we need to find the comma.
i = name.find(",")
The last name is the substring before the comma.
last = name[ :i]
The first name is everything after the comma, not including the position of the comma itself. We can strip off any extra spaces by applying the strip()
method:
first = name[i + 1: ].strip()
Finally, we put the first and last names together with a space in between.
1.5. Format strings¶
In our programs so far we have ignored the issue of how to format a number so it prints with exactly two decimal places, or how to line things up in columns. For these and related problems we need format strings
. The idea is just to create a string containing “placeholders”, and then supply the values to fill in. The placeholders can be used to help format the output the way you want it. The placeholders are called format specifiers
and always start with the “%” symbol. These are the most common forms:
%d
placeholder for an int%f
placeholder for a float%s
placeholder for a string
Try this:
Well, that’s not very interesting. As you can see, fmt
is just a string. In order to replace the placeholders with something meaningful, you follow the format string with another “%” symbol, and then a list of values in parentheses. These values are substituted for the placeholders. The result expression is a string that you can print or assign to a variable:
So as you might imagine the number of values you supply in the parentheses has to match the number of placeholders, and the values have to be the right type. (If there is only one value, the parentheses are optional.)
The amount of space taken up by a value is called the field width. You can specify the field width by putting a number directly after the “%” in the format specifier. The field automatically expands to fit the value you supply, so a field width of 1 just means “take up the minimum amount of space”, which is the default. For example, here we are allocating 10 spaces for the number 42.
For floats you can also specify the number of decimal places; for example %6.3f means “allow a total of 6 spaces and display 3 decimal places”. A specifier “%1.2f” will round the number to two decimal places and print it with the minimum amount of space:
This one will allocate 6 spaces for the number, with 3 after the decimal:
By repeating specifiers you can get things lined up in columns. Here we are printing everything in columns 12 characters wide.
Notice that the default is to print the value “right-justified” - that is, at the right side of its column. You can force the value to be “left-justified” by putting a minus sign in front of the field width:
One more useful trick: When you put the field width in a format specifier, you can put a zero first. This means that any extra spaces in the field will be filled with zeros: