Lab 10 - List operations and algorithms

Part 1: List methods

Lists and strings are very similar. In both cases we can access individual elements using an index, and we can get a substring or sublist using slices. We can concatenate two lists together using the + operator.

The biggest difference is that, unlike strings, lists are mutable. You can use an index to assign a new value to an element of a list, you can add new elements to a list, and you can use the del keyword to remove an element.

You can find a summary of the most common list methods here http://interactivepython.org/runestone/static/thinkcspy/Lists/ListMethods.html

In addition to the methods, there are some keywords and operators that work with lists. We have seen del and + above. You can use the in keyword to determine whether an element occurs in a list.

And you can use the * operator to create a list of multiple copies of a given list. For example, here is an easy way to create a list containing 10 zeros:

However, the method you’ll use the most is append. As an example, recall that in class we saw how to read a bunch of numbers entered at the console and store them as a list. (That list could then be sorted to determine the median value, among other things.)

Checkpoint 1

  1. Write a function that, given a list of numbers, returns a new list with all the same numbers, but without duplicates. The original list is not modified. For example, given the list [2, 5, 4, 5, 6, 5, 4, 3], the function returns the list [2, 5, 4, 6]. (As you build the list to return, what’s an easy way to check whether a number is already in the list?)

Part 2: Pencil and paper are a programmer’s best friend

When we are programming simple things, we can often get away with just writing some code and tinkering with it until it works. When writing loops involving lists and strings, what we are doing can be complex enough that we have to be a bit more careful to figure out exactly what we want our code to do, and think of ways to check.

We have seen some problem-solving strategies already:

  • Write down concrete examples and do them by hand.
  • Identify a related, simpler problem and solve it first.
  • As you write code, think of ways to verify that it is correct, such as using codelens or printing out partial results.

Programmers use pseudocode to describe the basic steps of an algorithm, without getting into the details of variable names or correct Python syntax. What is pseudocode? Imagine you are explaining to your baby sister how to perform the steps of your solution. You should be able to explain the steps in a simple and direct enough way that your little sister can carry them out from your instructions, without understanding the problem you are trying to solve.

As an example, let’s work through solving the following problem: Is one list a sublist of another list? This is like a substring, but for lists. For example, using a list of integers, we might have:

haystack = [3, 2, 4, 3, 1, 2, 3]
needle = [3, 1, 2]

We would say that needle is a sublist of haystack, because there is a way to line them up so every element of needle is equal to the corresponding element of haystack

[3, 2, 4, 3, 1, 2, 3]
         [3, 1, 2]

On the other hand, what if

haystack = [3, 2, 4, 1, 3, 2, 3]
needle = [3, 2, 1]

We can try all possible ways to line them up by shifting the needle over to the right:

[3, 2, 4, 1, 3, 2, 3]
[3, 2, 1]

[3, 2, 4, 1, 3, 2, 3]
   [3, 2, 1]

[3, 2, 4, 1, 3, 2, 3]
      [3, 2, 1]

[3, 2, 4, 1, 3, 2, 3]
         [3, 2, 1]

[3, 2, 4, 1, 3, 2, 3]
            [3, 2, 1]

But no matter what, we can’t find a way to line them up so that all the corresponding elements match. So in this case needle is not a sublist of haystack.

This is looking complicated! Let’s try a simpler problem. What if we only had to check the first case? That is, we only worry about lining the two lists up at the beginning.

[3, 2, 4, 1, 3, 2, 3]
[3, 2, 1]

What do we have to do?

Check whether needle[0] is equal to haystack[0], ok, keep going.

Check whether needle[1] is equal to haystack[1], ok, keep going.

Check whether needle[2] is equal to haystack[2], nope, so the answer is no.

Describe this in pseudocode as

for each index in needle
    if the two elements at that index are different
        answer is False

And if we make it all the way through this loop, the answer must be true. Notice this is a search problem. We are “searching” for a pair of elements that don’t match.

What would be different if they are lined up some other way? Say we shift needle over by 4:

[3, 2, 4, 1, 3, 2, 3]
            [3, 2, 1]

Check whether needle[0] is equal to haystack[4], ok, keep going.

Check whether needle[1] is equal to haystack[5], ok, keep going.

Check whether needle[2] is equal to haystack[6], nope, so the answer is no.

Again in pseudocode,

for each index in needle
    if the needle[index] is different from haystack[index + shift]
        answer is False

Looks like we just have to add a amount to the haystack’s index to account for the amount shifted. Let’s try writing a real Python function that does just this much.

What’s needed to complete the algorithm? We just have to perform the steps above for each possible shift amount. If we find a shift amout for which everything matches, then we know the function returns True. If we don’t find one, the function returns False. (Another search problem!)

for each possible shift amount
    if everything matches using that shift (*)
        return True
return False

Checkpoint 2

1. What are the possible shift amounts for the two example lists n and h above? We can shift by 0, 1, 2, 3, or 4. We can’t shift any farther, because 4 plus the length of n would go past the end of h. What are the possible shift amounts if the needle has length 4 and the haystack has length 9? Write a formula for the maximum shift amount based on the lengths of the two lists.

  1. Finish implementing the is_sublist function below based on the pseudocode above that determines whether needle is a sublist of haystack. Use the function all_match_with_shift to implement the line marked (*) in the pseudocode above.

Checkpoint 3

This checkpoint is an exercise with pencil and paper. For the two problems below, do several examples by hand as shown, and write down the algorithm in pseudocode in a way that makes sense to you. Show it to the TA. She or he will try to follow the steps you’ve written.

Problem 1.

Arithmetic expressions sometimes have a lot of parentheses. Sometimes it is hard to tell whether you always have the correct number of left ones and right ones. We want an algorithm that will tell us whether the parentheses in an expression are balanced, that is, every right parentheses has a matching left one, which is somewhere on its left. For example, in

(()())()

everything is fine, but in

())()(()

that third character doesn’t have a matching left parenthesis, even though there is an equal number of left ones and right ones in the string. What is the problem? We read partway through the string, and when we get to the third character, there are too many rights to match the lefts that have been read so far.

Likewise

(((()))

has a different kind of problem.

Write down several examples like this on a piece of paper. Make sure to try some where they are balanced and some where they are not. Write two numbers under each parenthesis, working from the left: the number of lefts encountered so far, and the number of rights encountered so far. For example,

            (  (  )  (  )  )  (  )
lefts:      1  2  2  3  3  3  4  4
rights:     0  0  1  1  2  3  3  4

When you do this for a string with unbalanced parentheses, what do you notice about these numbers? Based on what you observe in your examples, write an algorithm in pseudocode that will determine whether the parentheses in a given string are balanced.

Problem 2.

Define a run in a list of numbers to be a group of consecutive numbers that are equal. We want to write an algorithm to determine the length of the longest run. For example, in the list

[5, 4, 4, 2, 3, 4, 4, 4, 2, 2, 3, 4]

the longest run is 3, consisting of the three 4’s at index 5. To begin, write out a list like the one above on paper. As you read from left to right, under each element write down two values

  • a count: how many of this value have I seen in the current group?
  • a max: what is the largest value of the count I have seen?

If the element is the same as the one before it, add one to the count, otherwise, restart the count at 1. For example, the list above would start out like this:

5            4            4           2           ...
count 1      count 1      count 2     count 1
max 1        max 1        max 2       max 2

When you reach the end of the list, the value of ‘max’ should be the length of the longest run you’ve found.

Write the steps of this algorithm in pseudocode.

Checkpoint 4

Write Python functions based on your pseudocode for the two problems from checkpoint 2. Test each function using the examples you wrote out on paper.