3-11 Searching and Sorting

We know that an algorithm is a step-by-step problem-solving procedure. Two of the most common problems solved by computer algorithms are (1) searching, and (2) sorting data. In today’s lesson, we’ll study the simplest search and sort algorithms in computer science.

The Linear Search

One of the benefits of having a set of related data in a list is that it is very easy to search the list for the existence of a particular value.  Earlier you saw how the in operator can be used to determine whether an item is in a list or not.  Sometimes you need to know not only whether an item is in a list, but at what index.  When we search a list from the first element (i.e., index 0) to the last element it is called a linear search in computer science (sometimes also called a sequential search).

Here is a sample program to demonstrate a linear search:

This program creates a list with 7 integer values and asks the user what number they’d like to search for in the list (search_num). The linear search algorithm is pretty simple. It uses a single for loop to compare each element in the list with the value being searched for. If there is a match, the index of the matching element is stored in found_index and the loop exits using a break statement. Using the linear search algorithm one can search a list for the existence of a particular value, even lists of strings and floats.

Note what will happen if there are duplicate values in the list (e.g., 12 in our example). Here the algorithm will find the index of the first occurrence and then stop searching due to the break statement on line 10.

The Bubble Sort

To sort data means to put it in order, either alphabetic or numeric. The most basic sorting algorithm is called the bubble sort. Here is a pseudo-code description of how the bubble sort works:

Here is an illustrated example of this algorithm in action:

After the first pass, we notice that the largest value (5) has “bubbled” its way to the end of the list; however, the list is still not in order. So, we must continue to repeat this process until we have completed the required # of passes (i.e., list length – 1). Only then will the list be in order. On each subsequent pass, the next largest value will “bubble” its way to its correct position near the end of the list.

Before presenting code for the bubble sort we need to first understand how to swap the contents of two variables. This procedure is at the heart of all sorting algorithms. Suppose we wish to interchange the value of integers a and b and that their original values are:

When finished with the swap, the value in a should be 79 and the value in b should be 5. This is accomplished with the following code where you will notice the presence of a variable called temp, which serves as a safe haven for the first variable while the swap is being made.

Below is Python code to demonstrate the bubble sort in action. Notice that we require two nested for loops.  The outer loop controls the # of passes through the array, and the inner loop controls the comparisons and swapping during each pass. Also notice that we are using the swapping technique explained above, except list elements are being used.

The linear search and bubble sort are the slowest and most inefficient of all the searching and sorting algorithms. They should really only be used if you have relatively small lists to search or sort (say, 50 items or less). If you had, for example, 1,000,000 items to search or sort, these techniques are dreadfully slow compared to other, more sophisticated, algorithms.

So, why learn these if they are so slow?  Of all the searching and sorting algorithms, these are the simplest to understand and enough for our purposes this year.  In ICS4U we’ll get into more advanced techniques – hope to see you next year!

The Easy Way!

Ok, here’s the full truth about searching and sorting lists in Python.  As you might imagine, because searching and sorting are such common problems in programming, Python comes with built-in methods to efficiently search and sort lists.

The index() method is useful for searching a list.  You pass an argument to the index() method and it returns the index of the first element in the list containing the item.  If the item is not found in the list, the method raises a ValueError exception.

Notice that I first check if the person we are looking for is in our list of heroes before attempting to get its index.  This is one way of avoiding a ValueError exception from occurring if the person is not found.  Alternatively, you could use exception handling to deal with this, but I think this is a more elegant approach.

The sort() method re-arranges the elements of a list so they appear in ascending order (from the lowest to the highest value).  The reverse() list method can be used to reorder a sorted list in descending order instead if desired.  For example:

Easy eh?  Most modern languages include functions like these to make your life easier, but as a budding computer scientist, it’s important to understand how these algorithms actually work and not just how to use them.

You Try!

    1. Start a new page in your Learning Journal titled “3-11 Searching and Sorting“.  Carefully read the notes above and in your own words summarize the key ideas from each section.
    2. Consider the Linear Search example code again.  How would the algorithm behave differently if we removed the break statement on line 10?  Does the break statement improve the efficiency of the algorithm? If so, explain how.
    3. Hacker Level Question: As mentioned, the Bubble Sort is known for its simplicity, but it’s not a very efficient sorting algorithm. As explained, the outer loop will iterate one time less than the total number of items to be sorted. That is, if you have a list of 10 items, the outer loop will repeat exactly 9 times. However, it’s possible that a list can become sorted after less than 9 passes of the outer loop. In fact, if only one number is out of place the list will be sorted after just one iteration of the outer loop, yet the algorithm we are using will repeat 8 more iterations of the outer loop!
      Using this knowledge, we can improve the performance of the Bubble Sort. To do this, you create a Boolean flag variable to keep track of whether a swap occurred in the inner loop. If the inner loop finishes and no swap was made, then you know that the list is sorted and the outer loop can be terminated. Modify the code from today’s lesson to demonstrate how this would work.