Chapter 13 - Strings, brackets, and lists

1.1 The bracket notation

We have seen that two strings can be put together, or “concatenated”, into a single string by using the “+” operator. For example, we can take the three strings, “Monty”, “Python”, and the space character, and concatenate them together into a new string.
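As a quick reminder, the concatenation might look like the following sketch. The variable names first, last, and full_name are our own choices for illustration.

```python
# Concatenate three strings with the + operator.
first = "Monty"
last = "Python"
full_name = first + " " + last   # the middle string is a single space
print(full_name)  # Monty Python
```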

But many times we need to do the opposite. That is, we need to take apart a string to get to its parts, for example:

  • If we started with the string “Monty Python”, how would we extract just the last name, “Python”?
  • From a string such as “$4.25”, how do we extract just the number 4.25?
  • If we have a date in the form “04/15/11”, how would we rewrite it as “April 15, 2011”?

Our first step in seeing how to do these things is to learn how to access the individual characters of a string using the bracket notation. You can think of a string as a sequence of characters. Each character has an index, or position, within the string. The left-most character in the string has index 0. We can get the character at a given index by writing the index in brackets. Here, s[0] is the first character of a string named s.
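For instance, assuming s holds the string from our running example, a few lookups might go like this:

```python
s = "Monty Python"

# Indices start at 0, so s[0] is the left-most character.
print(s[0])  # M
print(s[1])  # o
print(s[6])  # P  (index 5 is the space)
```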

Remember that we have a built-in function len() that tells us the total number of characters in a string. Since we’re starting at zero, the index of the last character is always one less than the length.
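A small sketch of that relationship, again assuming s is "Monty Python" (the names length and last_index are just for illustration):

```python
s = "Monty Python"

length = len(s)           # 12 characters, counting the space
last_index = length - 1   # 11, because counting starts at 0
print(s[last_index])      # n
```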

You can abbreviate s[len(s) - 1] as s[-1], and in fact Python also allows you to keep counting backwards from the end of the string using negative numbers:
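With the same example string, counting backwards from the end could look like this:

```python
s = "Monty Python"

print(s[-1])  # n  (same as s[len(s) - 1])
print(s[-2])  # o
print(s[-6])  # P  (six characters from the end)
```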

What happens if you try to use an invalid index? It’s an error, since there is no character there.

This generates a new type of error, an IndexError. We have to get used to the fact that we start counting from zero!
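One way to see the error without stopping the program is to catch it, as in the sketch below. The try/except wrapper is our own addition for demonstration. In the shell you would simply see the error reported.

```python
s = "Monty Python"

try:
    ch = s[12]   # valid indices are 0 through 11, so this fails
except IndexError as err:
    message = str(err)
    print("IndexError:", message)
```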

1.2 Substrings

It isn’t very often that you want a single character from a string. Usually you want a whole section of the string. For instance, from the string “Monty Python” we might want just the first name, “Monty”. That’s an example of a substring. You can describe a substring by giving the range of indices for the section of the string you want. In this case, we want to start at index zero and include the characters up to, but not including, index 5.

_static/monty_substring.PNG

We can get a substring by putting that range of numbers into the bracket notation.
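Assuming s is "Monty Python" as before, extracting the first name might look like this:

```python
s = "Monty Python"

first_name = s[0:5]   # indices 0, 1, 2, 3, 4; index 5 is not included
print(first_name)     # Monty
```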

Using the bracket notation like this is often called slicing. You can also take a substring out of the middle. For example, what if we wanted just the substring “ty Py”?

_static/monty_substring_1.PNG

That would be the characters at indices 3 through 7. So the range we put into the brackets is 3 to 8, since the upper bound is never included.
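In code, that slice could be written as follows (the name middle is ours):

```python
s = "Monty Python"

middle = s[3:8]   # indices 3 through 7
print(middle)     # ty Py
```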

What if we want to extract just the name “Python”? We could count and notice that we want the characters at indices 6 through 11, so the range is 6 to 12:

Or, we could notice that we just want to start at 6 and include characters up to the length of the string:
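Both ways of writing the upper bound give the same result, as this short sketch shows:

```python
s = "Monty Python"

print(s[6:12])       # Python  (upper bound counted by hand)
print(s[6:len(s)])   # Python  (upper bound computed with len)
```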

But this is such a common situation that there is a shorthand for it. If you leave off the number after the colon, it means everything up to the end of the string.

As you might imagine there is a similar shorthand for starting at the beginning of the string. If you leave off the number before the colon, it means starting at index zero.
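Here is a brief illustration of both shorthands on our example string:

```python
s = "Monty Python"

print(s[6:])   # Python  (from index 6 to the end)
print(s[:5])   # Monty   (from the beginning up to, not including, index 5)
```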

To summarize: The notation s[x:y] selects a substring (or slice) of s starting at index x and including characters up to, but not including, index y.

_static/s_substring.PNG

It is worth noticing that if the range doesn’t describe any part of the string, for example because it starts past the end of the string or because its start is not before its end, it isn’t an error. You just get an empty string as the result.
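A few such cases, sketched with our example string. We use repr() here only so that the empty string is visible when printed.

```python
s = "Monty Python"

print(repr(s[20:30]))   # ''  (the range lies entirely past the end)
print(repr(s[8:3]))     # ''  (the start is after the end)
print(s[6:100])         # Python  (an upper bound past the end is simply cut off)
```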

Here’s another observation: suppose s is any string and x is any integer.

What is s[ :x] + s[x: ]?

It’s always the same as the original string s.
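We can check this claim for a range of values of x, including negative ones and ones past the end of the string. This is only a spot check on one example string, not a proof.

```python
s = "Monty Python"

# Split the string at every position x and glue the halves back together.
for x in range(-3, 16):
    rebuilt = s[:x] + s[x:]
    assert rebuilt == s

print("s[:x] + s[x:] matched s for every x tried")
```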

As an example, let’s write a function that takes any string and replaces its first letter with the letter “b”. For instance, “see” would become “bee”.

It is tempting to try it like this, where we just assign a new value to the first character.
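A sketch of that tempting attempt is shown here. The function name replace_first_broken is our own, and the function is intentionally wrong: defining it works fine, but calling it fails.

```python
def replace_first_broken(word):
    # Intentionally broken: strings do not allow assignment to an index.
    word[0] = "b"
    return word
```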

But when we try to use it, we get an error!

    word[0] = "b"
TypeError: 'str' object does not support item assignment

It turns out that once a string is created, it can’t be modified. We say that strings are immutable. What we have to do is just create the result we want as a brand new string. We take a substring of the given word that doesn’t include the first letter, and then concatenate a “b” at the beginning:
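A minimal version of that idea might look like this (the function name replace_first_with_b is hypothetical):

```python
def replace_first_with_b(word):
    # word[1:] is everything after the first letter; put "b" in front of it.
    return "b" + word[1:]

print(replace_first_with_b("see"))  # bee
```

Note that the original string is left untouched. The function builds and returns a new one.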

1.3 Other kinds of sequences

A string is a kind of a sequence. The individual elements of the sequence are characters. Think about some of the properties of a string we have just been using:

  1. It has a length, which we can determine with the len() function.
  2. Each element has a position, or index, in the sequence, from 0 through the length - 1.
  3. We can use the bracket notation to access an individual element according to its index, or to take a “slice” using a range of indices.

There are other types of sequences that have those same three properties. In Python, a list is a sequence of values of any type. You can create a list just by writing the values, separated by commas, and enclosed in square brackets. For example, here is a simple list of five numbers:
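The particular numbers and the name primes below are our own choices for illustration.

```python
# A list of five numbers, written with square brackets and commas.
primes = [2, 3, 5, 7, 11]
```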

If you examine a list in the shell or print it, it is displayed in the same way: the values are separated by commas and the whole thing is enclosed in square brackets.
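For example, printing a small list of numbers (the list itself is a made-up example):

```python
primes = [2, 3, 5, 7, 11]
print(primes)  # [2, 3, 5, 7, 11]
```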

We can use the len function to determine its length, and the bracket notation to examine its elements:
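Using a made-up list of five numbers, this might look like:

```python
primes = [2, 3, 5, 7, 11]

print(len(primes))   # 5
print(primes[0])     # 2   (first element)
print(primes[-1])    # 11  (last element, just as with strings)
```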

We can also take a “slice”:
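Slicing a list works the same way as slicing a string, except that the result is a new list. A short sketch with a made-up list:

```python
primes = [2, 3, 5, 7, 11]

print(primes[1:3])   # [3, 5]       (indices 1 and 2)
print(primes[:2])    # [2, 3]       (from the beginning)
print(primes[2:])    # [5, 7, 11]   (through the end)
```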

You can also create lists of other types. Here is an example of a list of strings.

ducks = ["Huey", "Louie", "Dewey"]

Now, let’s put together what we’ve seen so far to do something new. We’ll write a function that converts a date such as “04/15/11” into the form “April 15, 2011”. We’re assuming here that the month, day, and year are always exactly two digits.

We can use slicing to get the two digits corresponding to the day, which occupy indices 3 and 4.

day = date[3:5]

And we can do the same thing with the year, which is the last two digits, using a negative index of -2 to indicate the second to last character.

year = date[-2:]

And we can get the month number the same way. But how do we convert it to the name of the month? Well, we could use an if statement with a lot of elifs. But here is another idea. Let’s create a list of strings that has the month names in order:

names = ["January", "February", "March", "April", "May", "June",
         "July", "August", "September", "October", "November", "December"]

Now we know that names[0] is January, names[1] is February, and so on. So we just have to convert the month number into an index we can use in this list. When we write dates, we use numbers 1 through 12, so we’ll need to subtract 1 because the list indices go from 0 through 11. Remember to convert from string to int first.

month = date[0:2]
index = int(month) - 1
month_name = names[index]

Finally, we have to put the month name, the day, and the year together into a new string:

result = month_name + " " + day + ", " + "20" + year

The completed function looks like this:
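The following sketch assembles the steps above into one function. The function name format_date is our own choice. As stated earlier, it assumes the month, day, and year are each exactly two digits and that the year is in the 2000s.

```python
def format_date(date):
    # Month names in order, so that names[0] is January.
    names = ["January", "February", "March", "April", "May", "June",
             "July", "August", "September", "October", "November", "December"]

    day = date[3:5]     # the two digits at indices 3 and 4
    year = date[-2:]    # the last two characters
    month = date[0:2]   # the first two characters

    # Months are written 1 through 12, but list indices run 0 through 11.
    index = int(month) - 1
    month_name = names[index]

    result = month_name + " " + day + ", " + "20" + year
    return result

print(format_date("04/15/11"))  # April 15, 2011
```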