String Basics
Python supports a special form of programming called Object-Oriented Programming (OOP). This kind of programming describes various things as objects. Later we’ll build objects as the primary actors in your video games.
A sequence is an object that contains a series of related data items. Python supports three basic sequence types: strings, lists, and tuples. We’ve already worked with strings, and you know that a string is simply a sequence of characters. We used lists when we learned about the for statement. Lists and tuples are sequences that can hold various types of data. We’ll cover strings today, and lists and tuples in future lessons.
At this point, we’ve used strings in a fairly limited way, but there are many interesting operations that we can perform with strings. Because a string is a sequence of characters, we can iterate over a string using a for loop:
    name = "Mr. Rao"

    for ch in name:
        print("The ASCII Value Of:", ch, "is", ord(ch))
This program iterates over each character in the name string variable and displays each character, along with its numeric value, on a separate line. The built-in ord() function returns the ASCII value of its character argument (more precisely, its Unicode code point, which is the same number for ASCII characters). The related chr() function takes such a value as a parameter and returns the character.
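To see how ord() and chr() undo each other, here is a small sketch that sends one character through both functions. The variable names are made up for this illustration.

```python
# Round-trip a character through ord() and chr().
letter = "R"
code = ord(letter)        # character -> integer value
same_letter = chr(code)   # integer value -> character

print(letter, "->", code, "->", same_letter)

# Adding 1 to the value gives the next character in the table.
next_letter = chr(code + 1)
print("The character after", letter, "is", next_letter)
```

Because the values are plain integers, you can do arithmetic on them before converting back to a character.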
It’s also possible to access individual characters of a string using what is called an index. Each string character has an index that specifies its position in the string. Indexing starts at zero (0).
Here’s some code to demonstrate:
    name = "Mr. Rao"
    print(name[0], name[2], name[6])
This would display “M . o”. You can also use negative numbers as indexes:
    name = "Mr. Rao"
    print(name[-1], name[-3], name[-7])
This would display “o R M”. Negative indexes are relative to the end of the string. Think of index [-1] as the last character, index [-2] as the second-to-last character, and so on.
An IndexError exception will occur if you try to access an index that is out of range for a particular string. For example, the following code will crash because there are only 7 characters in “blaster” (indexes 0 through 6). When index reaches 7, the attempt to print the character at that index fails.
    game = "blaster"
    index = 0

    while index < 10:
        print(game[index])
        index += 1
You could use exception handling to deal with this, but it’s more of a logic error than a run-time error. It’s better to make your code more dynamic so that it can handle strings of varying lengths. To do this we can use the built-in len() function which returns the number of characters in a string (not the index of the last character).
    game = "blaster"
    index = 0

    while index < len(game):
        print(game[index])
        index += 1
Now, no matter how many characters are referenced by the game string variable, the loop will work. Notice that the loop repeats as long as index is less than the length of the string. This is because the index of the last character of the string will always be 1 less than the length of the string (since indexing starts at 0). It’s easy to make indexing logic errors if you are not careful.
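The relationship between len() and the last index can be checked directly. The sketch below is only an illustration (last_index is a name invented here), and it uses try/except so the out-of-range access prints a message instead of crashing.

```python
game = "blaster"

last_index = len(game) - 1   # one less than the length, since indexing starts at 0
print(game[last_index])      # the last character
print(game[-1])              # the same character, counted from the end

# An index equal to the length is already out of range.
try:
    print(game[len(game)])
except IndexError as error:
    print("IndexError:", error)
```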
Strings Are Immutable
In Python, strings are immutable objects, which means that once they are created, they cannot be changed. This may seem puzzling since operations like string concatenation give the impression that they modify strings, but in reality, they do not. Consider the following:
    name = "Mr."
This assignment statement creates a string variable that references “Mr.” in memory.
Now, watch what happens if we append to this string:
    name = name + " Rao"
The original string “Mr.” is not modified. Instead, a new string containing “Mr. Rao” is created and assigned to the name variable. The original string “Mr.” is no longer accessible because no variable references it. The Python interpreter will eventually remove the unusable string from memory.
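One way to convince yourself that the original string is left alone is to keep a second variable pointing at it before you append. This is a sketch for illustration only; the variable original is not part of the lesson's example, and it keeps “Mr.” reachable so we can inspect it afterwards.

```python
name = "Mr."
original = name           # a second variable referring to the same string

name = name + " Rao"      # builds a new string and rebinds name to it

print(name)               # the new, longer string
print(original)           # the first string, unchanged
print(name is original)   # False: they are two different objects
```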
Need more proof? If you try to modify a specific character in a string you’ll get an error. Try this:
    name = "Mr. Rao"
    name[4] = 'W'
This will raise an exception because it is trying to change the value of an immutable object (the name string).
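If you really do want “Mr. Wao”, you have to build a new string rather than edit the old one. The sketch below shows one possible approach using only indexing and concatenation; new_name is a name made up for this example, and the try/except is there just so the failed assignment prints a message instead of stopping the program.

```python
name = "Mr. Rao"

# Item assignment is not allowed on a string.
try:
    name[4] = "W"
except TypeError as error:
    print("TypeError:", error)

# Build a new string one character at a time, swapping in "W" at index 4.
new_name = ""
index = 0
while index < len(name):
    if index == 4:
        new_name += "W"
    else:
        new_name += name[index]
    index += 1

print(new_name)   # the new string
print(name)       # the original is untouched
```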
This is an important point: if you go back to U1-6, where we talked about reassigning a value to an integer variable (high_score), you’ll see that int values are also immutable! It turns out that float and bool values are too. Any time you assign a new value to an int, str, float, or bool variable, the variable refers to a new value in memory. The old value is deleted unless some other variable still refers to it.
String Slicing
A slice is a span of items that are taken from a sequence. When you take a slice from a string, you get a span of characters from within the string. String slices are also called substrings. String slicing works a little like the range() function.
For example, this code prints the characters from index 6 up to (but not including) index 10 on the screen (i.e., “e Py”):
    message = "We Love Python!"
    print(message[6:10])
If you leave out the index before the colon, Python assumes you want to start at index 0. And, if you leave out the index after the colon, Python uses the length of the string as the end index.
    message = "We Love Python!"

    print(message[:7])   # displays "We Love"
    print(message[8:])   # displays "Python!"
    print(message[:])    # displays "We Love Python!"
The last example is equivalent to this:
    print(message[0:len(message)])
and returns a complete copy of the string.
As with the range() function, you can specify a step value when slicing. For example:
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    print(letters[0:26:2])   # displays "ACEGIKMOQSUWY"
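To make the comparison with range() concrete, the sketch below builds the same result by hand, using range() with the same start, stop, and step values. The variable by_hand is invented for this illustration.

```python
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Visit indexes 0, 2, 4, ... 24 and collect the character at each one.
by_hand = ""
for index in range(0, 26, 2):
    by_hand += letters[index]

print(by_hand)
print(by_hand == letters[0:26:2])   # the slice gives the same string
```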
Finally, you can use negative indexes to refer to positions relative to the end of the string. For example:
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    print(letters[-5:])   # displays "VWXYZ"
An invalid index does not cause a slicing expression to raise an exception. For example:
- If the end index specifies a position beyond the end of the string, Python will use the length of the string instead.
- If the start index specifies a position before the beginning of the string, Python will use 0 instead.
- If the start index is greater than the end index, the slicing expression will return the empty string ("").
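The three rules above can be tried out on a short string. This is a small sketch with made-up variable names; repr() is used when printing so that the empty string shows up as a pair of quotes.

```python
word = "pizza"

past_end = word[2:100]        # end index beyond the end of the string
before_start = word[-100:2]   # start index before the beginning of the string
backwards = word[4:1]         # start index greater than the end index

print(repr(past_end))
print(repr(before_start))
print(repr(backwards))
```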
Here is an example program you can use to play around with string slicing:
    word = "pizza"

    print(
    """
      Slicing 'Cheat Sheet'

      0   1   2   3   4
    +---+---+---+---+---+
    | p | i | z | z | a |
    +---+---+---+---+---+
     -5  -4  -3  -2  -1

    """)

    print("Enter the beginning and ending index for your slice of 'pizza'.")
    print("Press the [Enter] key at 'Begin:' to exit.")

    begin = None
    while begin != "":
        begin = input("\nBegin: ")

        if begin:
            begin = int(begin)
            end = int(input("End: "))

            print("word[", begin, ":", end, "]\t\t", end="")
            print(word[begin:end])
The in Operator
You can use the in operator to determine if a string is contained within another string.
    message = "We love Python!"

    if "love" in message:
        print("The string 'love' was found.")
    else:
        print("The string 'love' was not found.")
Because the in operator yields a Boolean result (True or False), you can combine it with the logical operator not as well; i.e., not in.
    students = "Ben Lisa John Zoe Isabella Alex"
    print("Bill" not in students)   # displays True
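Keep in mind that in looks for a run of characters anywhere in the string, not for whole words, and that the comparison is case-sensitive. The sketch below illustrates both points with the same students string; the other variable names are made up for this example.

```python
students = "Ben Lisa John Zoe Isabella Alex"

partial_match = "Isa" in students        # True: "Isa" is the start of "Isabella"
wrong_case = "ben" in students           # False: the string contains "Ben", not "ben"
ignore_case = "ben" in students.lower()  # True once everything is lowercase

print(partial_match, wrong_case, ignore_case)
```

If you need to match whole names only, a string of names like this is not the best tool; lists, covered in a later lesson, are a better fit.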
Here’s a more detailed example that prompts the user to enter a message, and then creates a new string based on the message but with all of the vowels stripped out.
    message = input("Enter a message: ")
    new_message = ""
    VOWELS = "aeiou"

    print()
    for letter in message:
        if letter.lower() not in VOWELS:
            new_message += letter
            print("A new string has been created:", new_message)

    print("\nYour message without vowels is:", new_message)
You can see from the name of the VOWELS variable that it’s intended to be used as a constant value in the program and not changed. As discussed before, this naming practice for constants is considered good style, but there is nothing in Python that will stop you (or another programmer) from changing it in the program. So, once you create a variable with a name in all caps, make sure that you treat it as unchangeable!
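If you want to experiment with the vowel-stripping logic without typing a message each time, one option is to wrap it in a function. This is only a sketch: remove_vowels is a hypothetical name, not something from the program above, and it leaves out the progress messages.

```python
VOWELS = "aeiou"   # constant: treat as unchangeable


def remove_vowels(message):
    """Return a new string containing message without its vowels."""
    new_message = ""
    for letter in message:
        # lower() lets one lowercase constant cover "A" and "a" alike.
        if letter.lower() not in VOWELS:
            new_message += letter
    return new_message


print(remove_vowels("We Love Python!"))
```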
You Try!
- Start a new page in your Learning Journal titled “3-4 Strings”. Carefully read the notes above and in your own words summarize the key ideas from each section.
- If a string has 10 characters, what is the index of the last character?
- What exception does this code cause? Briefly explain why.
    animal = "Tiger"
    animal[0] = 'L'
- Try to figure out what the following code will display, then check your hypothesis using the Python interpreter.
    my_string = "abcdefg"
    print(my_string[2:5])
    print(my_string[3:])
    print(my_string[:3])
    print(my_string[:-3])
    print(my_string[-3:])
    print(my_string[:])
    print(my_string[7:])
    print(my_string[4:1])