3-5 Searching Algorithms

You’ve seen that the Arrays class and ArrayList collection include handy methods to search for the index of an item within a 1D array or ArrayList object, respectively.  As Java programmers it’s important to know how to use these Java API methods. However, as computer scientists, it’s even more important that we understand how to write our own search methods from scratch.  That’s what we’ll tackle today!

Linear Search

A linear search works as its name implies, it searches for an item (called a key value) starting from the first element in a data structure and proceeds linearly (i.e., in order) through all the elements until the key value is either found or we reach the last element.  Linear search is the simplest, and most brute force, searching algorithm there is.  Here is a class to demonstrate:

The code above defines a LinearSearch class that encapsulates an array of random integer values (line 6) called data.  The constructor allows outside code to specify how many random elements should be in the array.  Lines 20 to 27 implement the linear search algorithm using an iterative (for loop) approach. This method takes a searchKey as a parameter and returns the index of the first occurrence of that key in the array, or -1 if a match is not found. Finally, the toString() method uses the static toString() method of the Arrays class to display the contents of the encapsulated data array.  Ignore the recursiveLinearSearch() methods for now, more on these in the exercises today.

Going back to the constructor, notice the use of the Random class (imported from java.util).  So far we’ve been using Math.random() to generate floating-point values and then simple arithmetic and casting to produce random integers within a particular range.  The Random class provides convenient methods to generate any random primitive data type; for example, nextInt(), nextDouble(), and nextBoolean() – please consult the Java API for full details.

The nextInt() method call on line 15 returns a random integer between 0 (inclusive) and the parameter 90 (exclusive). By adding 10 to this I’m effectively shifting the range of random values from 0 to 89 to 10 to 99.   This is just a design decision I’ve made to demonstrate how to shift the range.

Here is a test class to demonstrate the LinearSearch class:

[23, 17, 51, 45, 19, 74, 28, 28, 87, 42]
Please enter an integer search key (-1 to quit): 51
Integer 51 was found at index 2.
Please enter an integer search key (-1 to quit): 99
Integer 99 was not found.
Please enter an integer value (-1 to quit): -1

Binary Search

When a data set is in a random order the linear search is the only way to search for a key.  However, when data is in sorted order there is a much more efficient algorithm we can use called the binary search.

The binary search works by using a “divide and conquer” approach:

  1. Define two indexes, low = 0, and high = index of the last element of the data.
  2. Calculate the index of the middle element = (low + high) / 2.
  3. If the middle element matches the key, return the middle
  4. If the key is greater than the middle element, set low = middle + 1.
  5. If the key is less than the middle element, set high = middle – 1.
  6. If the low index becomes greater than the high index, return -1 because the search has failed to find a match.
  7. Go back to step 2.

Let’s say we are looking for the key value 10 in an array of 10 integers:

By changing the low and high indexes you can see that the binary search methodically cuts the number of elements to be searched in half.  For very large arrays this can make a huge difference over linear search.  Note that the calculation for middle uses integer division ( / ), so the remainder is simply truncated (not rounded).

In the case where the search key does not exist (for example, if the key was 11), we would have a situation where the low index exceeds the high index like so:

Let’s look at a BinarySearch class:

Because binary search requires sorted data to work, line 18 calls the Arrays.sort() method to sort the random integers.  The rest of the BinarySearch class is structured similarly to the LinearSearch class we discussed earlier. The searchProgress() helper method is used to display the range of elements in the array that are included in the current stage of the search with an asterisk (*) under the current middle element.

Remember when we talked about recursive methods I told you that some algorithms are naturally recursive?  Binary search is one of them so I’ve chosen to implement it using a recursive method!  You’ll note that there are two methods called recursiveBinarySearch().  The public one is actually not recursive at all. It allows outside code to start a binary search without having to pass in the index of the first element (0) – after all, this is an internal detail that a user should not have to worry about.  The public method calls the private one, which actually performs the binary search recursively and returns the index of the searchKey if found, or -1 otherwise.

There are two base cases: (lines 34 to 36) if the low index > high index then the key was not found, and (line 44 to 46) if the key == the middle element then the key was found.  There are two recursive cases as well, one that repeats the binary search on elements above the current middle element, and one that searches elements below it.

The BinarySearchTest class is nearly identical to LinearSearchTest:

Try the code above and experiment with different search keys.  Sometimes there can be duplicate data values in the search array — which one does binary search find?

Which Algorithm is Better?

It depends! The linear and binary search both produce the same result but they do so in very different ways.  Which algorithm is the “best” depends on several factors.  The advantages of the linear search are that it is very simple to code (and read!) and does not require the data being searched to be in order, which is often the case in the real world.  On the other hand, if the data is sorted and especially if there are many elements, the superior performance of the binary search is clear.

For example, let’s say we have an array of n = 1,000,000 elements.  Using linear search, if the key exists in the array, we will find it after ~500,000 comparisons (i.e., n/2) on average.  With binary search the key will be matched in ~20 comparisons (i.e., log2n) on average because half of the elements can be eliminated from the search on each pass.  Next time we’ll get into the details of how to compare the efficiencies of different algorithms.

You Try!

  1. Let’s say we have an ordered (sorted) array of many elements, but some of the elements repeat.  Using a linear search algorithm, if we searched for an element that appeared more than once in the array which duplicate value would be found first?  Now, consider the same question using a binary search.  Would it always find the first of the duplicate elements?  Explain.  Try to verify your hypothesis using the BinarySearchTest program.
  2. You’ll note in the LinearSearch class that I included an empty private recursiveLinearSearch() Complete the code for this method and revise the LinearSearchTest class to demonstrate that your method works.  Hint: There will be 2 base cases and one recursive case in your code.  Remove line 43: return -1; This was only included as a placeholder so that the code would compile.
  3. You’ll note in the BinarySearch class that I included an empty iterativeBinarySearch().  Complete the code for this method and revise the BinarySearchTest class to demonstrate that your method works.  Remove line 64: return -1; This was only included as a placeholder so that the code would compile.