Procedures and Control

In Unit 1, you wrote a program to extract the first link from a web page. The next step towards building your search engine is to extract all of the links from a web page. In order to write a program to extract all of the links, you need to know these two key concepts:

  1. Procedures - a way to package code so it can be reused with different inputs.
  2. Control - a way to have the computer execute different instructions depending on the data (instead of just executing instructions one after the other).

Recall this code from the end of Unit 1:

page = '<...contents from some web page>'
start_link = page.find('<a href=')
start_quote = page.find('"', start_link)
end_quote = page.find('"', start_quote + 1)
url = page[start_quote + 1:end_quote]
print(url)

This finds and prints the first link on the page. To keep going, we could update the value of page to be the characters from the end_quote, and repeat the same code again:

page = page[end_quote:]
start_link = page.find('<a href=')
start_quote = page.find('"', start_link)
end_quote = page.find('"', start_quote + 1)
url = page[start_quote + 1:end_quote]
print(url)

page = page[end_quote:]
start_link = page.find('<a href=')
start_quote = page.find('"', start_link)
end_quote = page.find('"', start_quote + 1)
url = page[start_quote + 1:end_quote]
print(url)

This code will print out the next two links on the web page. Clearly, this is tedious work. The reason we have computers is to avoid having to do tedious, mechanical work! In addition to being tedious, repeating the same code over and over again like this won’t work well because some pages only have a few links while other pages will have more links than the number of repetitions.

Procedural Abstraction

Procedural abstraction is a way to write code once that works on any number of different data values. By turning our code into a procedure, we can use that code over and over again with different inputs to get different behaviors.

Procedures

A procedure takes in inputs, does some processing, and produces outputs. For example, the + operator is a procedure where the inputs are two numbers and the output is the sum of those two numbers. The + operator looks a little different from the procedures we will define since it is built into Python with a special operator syntax. In this unit you will learn how to write and use your own procedures.

Here is the Python grammar for writing a procedure:

def <name>(<parameters>):
  <block>
  • The keyword def is short for "define".
  • <name> is the name of the procedure. Just like the name of a variable, it can be any string that starts with a letter or underscore, followed by letters, digits, and underscores.
  • <parameters> are the inputs to the procedure. The parameter list is zero or more names separated by commas: <name>, <name>,.... When you name your parameters, prefer descriptive names that remind you of what they mean. Procedures can have any number of inputs. If there are no inputs, the parameter list is an empty pair of parentheses: ().
  • After the parameter list, there is a : (colon) to end the definition header.
  • The body of the procedure is a <block>, which is the code that implements the procedure. The block is indented inside the definition. Proper indentation tells the interpreter when it has reached the end of the procedure definition.

Let’s consider how to turn the code for finding the first link into a get_next_target procedure that finds the next link target in the page contents. Here is the original code:

start_link = page.find('<a href=')
start_quote = page.find('"', start_link)
end_quote = page.find('"', start_quote + 1)
url = page[start_quote + 1:end_quote]

Next, to make this a procedure, we need to determine what the inputs and outputs are.

Quiz 2.1

What are the inputs for the procedure, get_next_target?

  • A) a number giving position of start of link
  • B) a number giving position of start of next quote
  • C) a string giving contents of the rest of the web page
  • D) a number giving position of start of link and a string giving page contents
Answer
  • C) a string giving contents of the rest of the web page.

Quiz 2.2

What should the outputs be for get_next_target?

  • A) a string giving the value of the next target url
  • B) url, page
  • C) url, end_quote
  • D) url, start_link
Answer
  • C) url, end_quote

return Statement

To make the get_next_target procedure, we first add a procedure header and indent our existing code in a block:

def get_next_target(page):
  start_link = page.find('<a href=')
  start_quote = page.find('"', start_link)
  end_quote = page.find('"', start_quote + 1)
  url = page[start_quote + 1:end_quote]

To finish the procedure, we need to produce the outputs. To do this, we introduce a new Python statement called return. The syntax for return is:

return [<expression>, <expression>, ...]

A return statement can have any number of expressions. The values of these expressions are the outputs of the procedure. A return statement can also have no expressions at all, which means the procedure produces no output. This may seem silly, but in fact it is quite useful. Often, we want to define procedures for their side-effects, not just for their outputs. Side-effects are visible, such as the printing done by a call to print, but are not the outputs of the procedure.
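
As a minimal sketch of these forms of return, consider the two procedures below. Their names (sum_and_product, greet) are hypothetical and chosen only for illustration. The first returns two outputs; the second is called only for its side-effect. In Python, a procedure that returns no expressions gives back the special value None.

```python
# Hypothetical procedures, used only to illustrate the forms of return.

def sum_and_product(a, b):
    # Two expressions after return: the caller receives two outputs.
    return a + b, a * b

def greet(name):
    # Printing is a side-effect, not an output.
    print('Hello, ' + name)
    # A return with no expressions: the procedure produces no output.
    return

# The two outputs can be stored in two variables at once.
total, product = sum_and_product(3, 4)
print(total)    # 7
print(product)  # 12

result = greet('Ada')  # prints: Hello, Ada
print(result)          # None
```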

Quiz 2.3

Complete the get_next_target procedure by filling in the return statement that produces the desired outputs.

def get_next_target(s):
  start_link = s.find('<a href=')
  start_quote = s.find('"', start_link)
  end_quote = s.find('"', start_quote + 1)
  url = s[start_quote + 1:end_quote]
  return ________
Answer
return url, end_quote
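
To see the finished procedure in action, here is a small sketch that calls it twice. The page string and its URLs are made up for this example. The second call passes in the rest of the page starting at end_quote, just as the repeated code at the start of this unit did.

```python
def get_next_target(s):
    start_link = s.find('<a href=')
    start_quote = s.find('"', start_link)
    end_quote = s.find('"', start_quote + 1)
    url = s[start_quote + 1:end_quote]
    return url, end_quote

# Made-up page contents, used only for this example.
page = ('<a href="http://example.com/one">one</a> '
        '<a href="http://example.com/two">two</a>')

# First call: find the first link and where it ends.
url, end_quote = get_next_target(page)
print(url)  # http://example.com/one

# Second call: continue from the end of the previous link.
page = page[end_quote:]
url, end_quote = get_next_target(page)
print(url)  # http://example.com/two
```

The procedure is written once and reused with a different input each time, which is the point of procedural abstraction.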

Using Procedures

To use a procedure, write the name of the procedure, followed by a left parenthesis, a list of the procedure’s inputs (sometimes called operands or arguments), and a right parenthesis:

<procedure>(<input>, <input>, ...)

For example, consider the rest_of_string procedure defined as:

```python
def rest_of_string(s):
    return s[1:]
```

To use this procedure we need to pass in one input, corresponding to the parameter s:

```python
print(rest_of_string('dsquare')) # square
```

We can see what is going on by adding a print statement in the procedure body:

def rest_of_string(s):
  print('Here I am in rest_of_string!')
  return s[1:]

Call the procedure:

print(rest_of_string('dsquare'))

The output looks like:

Here I am in rest_of_string!
square

You can do anything you want with the result of a procedure. For example, you can store it in a variable.

Calling the above rest_of_string procedure as:

s = rest_of_string('dsquare')
print(s)

results in:

Here I am in rest_of_string!
square

Think of procedures as mapping inputs to outputs. This is similar to a mathematical function. Indeed, many people refer to Python procedures like the ones we are defining as functions.

Quiz 2.4

What does the inc procedure defined below do?

def inc(n):
  return n + 1
  • A) Nothing
  • B) Takes a number as input, and outputs that number plus one
  • C) Takes a number as input, and outputs the same number
  • D) Takes two numbers as inputs, and outputs their sum
Answer
  • B) Takes a number as input, and outputs that number plus one

Quiz 2.5

What does the sum procedure defined below do?

def sum(a, b):
  a = a + b
  • A) Nothing
  • B) Takes two numbers as its inputs, and outputs their sum
  • C) Takes two strings as its inputs, and outputs the concatenation of the two strings
  • D) Takes two numbers as its inputs, and changes the value of the first input to be the sum of the two numbers
Answer
  • A) Nothing

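The procedure computes a + b and assigns the result to a, but it has no return statement, so it produces no output. The parameter a also exists only inside the procedure. See what happens on executing print(a) after the definition:

Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    print(a)
NameError: name 'a' is not defined

An error is raised because the variable a is not defined outside the block of the procedure.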
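
A short sketch of what calling such a procedure looks like. It uses the hypothetical name add_no_return so that the example does not replace Python's built-in sum. The call evaluates to None, and the variable passed in as the first input keeps its value.

```python
def add_no_return(a, b):
    # The new value of a exists only inside this procedure.
    a = a + b

x = 2
result = add_no_return(x, 3)
print(result)  # None
print(x)       # 2
```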

Quiz 2.6

What does the sum procedure defined below do?

def sum(a,b):
  a = a + b
  return a
  • A) Takes two numbers as its inputs, and outputs their sum.
  • B) Takes two strings as its inputs, and outputs the concatenation of the two strings.
  • C) Takes two numbers as its inputs, and changes the value of the first input to be the sum of the two numbers.
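Answer
  • A) Takes two numbers as its inputs, and outputs their sum.
  • B) Takes two strings as its inputs, and outputs the concatenation of the two strings.

Choice C is not correct: the assignment a = a + b changes only the parameter a inside the procedure, not the value of the variable that was passed in as the first input.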
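
One way to explore this quiz is to run the procedure on a few inputs. The sketch below renames it to add (a hypothetical name, chosen to avoid replacing Python's built-in sum) and tries it with numbers and with strings. It also prints the variable passed in as the first input after the call.

```python
def add(a, b):
    a = a + b
    return a

print(add(2, 3))          # 5
print(add('dog', 'cat'))  # dogcat

x = 2
y = add(x, 3)
print(x)  # 2
print(y)  # 5
```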

Quiz 2.7

Define a procedure, square, that takes one number as its input, and outputs the square of that number (result of multiplying the number by itself). For example,

print(square(5))    # -> 25
Answer
def square(n):
  return n * n

Also try:

print(square(square(5)))

This is an example of procedure composition. We compose procedures by using the outputs of one procedure as the inputs of the next procedure. In this case, we use the output of square(5) as the next input to square. Connecting procedures using composition is a very powerful idea. Most of the work of programs is done by composing procedures.
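
A small sketch of composition, reusing square from Quiz 2.7 and inc from Quiz 2.4. The inner call is evaluated first, and its output becomes the input of the outer call, so the order of composition matters.

```python
def square(n):
    return n * n

def inc(n):
    return n + 1

print(square(square(5)))  # 625, since square(5) is 25 and square(25) is 625
print(inc(square(3)))     # 10, since square(3) is 9
print(square(inc(3)))     # 16, since inc(3) is 4
```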

Quiz 2.8

Define a procedure, sum3, that takes three inputs, and outputs the sum of the three input numbers.

print(sum3(1, 2, 3)) # -> 6
Answer
def sum3(a, b, c):
  return a + b + c

Quiz 2.9

Define a procedure, abbaize, that takes two strings as its input, and outputs a string that is the first input followed by two repetitions of the second input, followed by the first input.

abbaize('a', 'b') # -> 'abba'
abbaize('dog', 'cat') # -> 'dogcatcatdog'
Answer
def abbaize(a, b):
  return a + b + b + a

Quiz 2.10

Define a procedure, find_second, that takes two strings as its inputs: a search string and a target string. It should output a number giving the position of the second occurrence of the target string within the search string. Example:

danton = "De l'audace, encore de l'audace, toujours de l'audace."
print(find_second(danton, 'audace'))  # -> 25
Answer
def find_second(search, target):
  first = search.find(target)
  second = search.find(target, first + 1)
  return second

You could eliminate the variable second:

def find_second(search, target):
  first = search.find(target)
  return search.find(target, first + 1)

You could even reduce this to one line by eliminating the variable first:

def find_second(search, target):
  return search.find(target, search.find(target) + 1)
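
It is worth checking what these definitions do when the target appears fewer than two times. The sketch below relies on the fact that find returns -1 when the string is not found. The extra target strings ('encore' and 'zebra') are chosen only for illustration.

```python
def find_second(search, target):
    first = search.find(target)
    return search.find(target, first + 1)

danton = "De l'audace, encore de l'audace, toujours de l'audace."

print(find_second(danton, 'audace'))  # 25
print(find_second(danton, 'encore'))  # -1, only one occurrence
print(find_second(danton, 'zebra'))   # -1, no occurrences
```

When the target is missing entirely, first is -1, so the second search starts at position 0 and also returns -1.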