Exercise 3
Problem 1
The crawler we built may be very impractical when supplied with a random seed page. Modify it to limit the number of pages it crawls. The maximum, max_pages, should be given as an input to crawl_web, and crawling should stop once max_pages pages have been crawled.
Note
You may need to implement some error handling for the crawler to work with an arbitrary seed page.
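A minimal sketch of one possible solution in Python 3. The helper names get_page, get_next_target, and get_all_links follow the style of the course crawler but are assumptions here, as is the simple string-based link extraction:

import urllib.request

def get_page(url):
    # Return the page source, or '' on any error, so one bad
    # link cannot crash the crawl (the error handling the note
    # asks for).
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode('utf-8', errors='replace')
    except Exception:
        return ''

def get_next_target(page):
    # Find the next <a href="..."> link in the page source.
    start_link = page.find('<a href="')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    return page[start_quote + 1:end_quote], end_quote

def get_all_links(page):
    # Collect every link on the page.
    links = []
    while True:
        url, end_pos = get_next_target(page)
        if not url:
            break
        links.append(url)
        page = page[end_pos:]
    return links

def crawl_web(seed, max_pages):
    # Stop as soon as max_pages pages have been crawled.
    tocrawl = [seed]
    crawled = []
    while tocrawl and len(crawled) < max_pages:
        page = tocrawl.pop()
        if page not in crawled:
            for link in get_all_links(get_page(page)):
                if link not in tocrawl:
                    tocrawl.append(link)
            crawled.append(page)
    return crawled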
Problem 2
The crawler we built may be very impractical when supplied with a random seed page. Modify it to limit the depth of links it follows from the seed page. The maximum depth, max_depth, should be given as an input to crawl_web, and crawling should stop once max_depth is reached.
Note
Suppose pages are linked as Seed -> A -> B -> C. Then the seed has depth 0, A has depth 1, B has depth 2, and C has depth 3.
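A sketch assuming the same get_page and get_all_links helpers as in Problem 1; pairing each URL with its depth is one way to enforce the limit:

def crawl_web(seed, max_depth):
    # Keep (url, depth) pairs; a FIFO queue makes the crawl
    # breadth-first, so each page is first seen at its
    # shallowest depth.
    tocrawl = [(seed, 0)]
    crawled = []
    while tocrawl:
        page, depth = tocrawl.pop(0)
        if page not in crawled:
            crawled.append(page)
            if depth < max_depth:
                # Links found here sit one level deeper; pages
                # beyond max_depth are never queued.
                for link in get_all_links(get_page(page)):
                    tocrawl.append((link, depth + 1))
    return crawled

With Seed -> A -> B -> C and max_depth = 2, this crawls Seed, A, and B but never queues C.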
Problem 3
You inspected the url printed by your code as an exercise for Unit 1 and also saved it to a CSV file. Later, you wrote a procedure inspect_url to take care of this. Now that you have extracted all the links from a webpage, modify your procedure(s) to save all the URLs in a CSV file (you may define a separate procedure to save data to a CSV file).
Note
Ignore any other possibilities that come along. You are free to make your own assumptions where needed!
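One way the CSV step might look with Python's csv module; the shape of inspect_url and the filename here are assumptions, since only the original exercise defines them:

import csv

def save_urls_to_csv(urls, filename='urls.csv'):
    # Write one URL per row, with a single header row.
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['url'])
        for url in urls:
            writer.writerow([url])

def inspect_url(page_url):
    # Extract every link from the page (using the Problem 1
    # helpers) and hand them to the CSV writer.
    links = get_all_links(get_page(page_url))
    save_urls_to_csv(links)
    return links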
Problem 4
Given your date of birth and the current date, calculate the exact number of days you have been alive.
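A sketch using Python's datetime module: subtracting two date objects yields a timedelta, whose days attribute is the exact day count. The dates below are just example values:

from datetime import date

def days_alive(birth_date, today=None):
    # Default to the current date when no end date is given.
    if today is None:
        today = date.today()
    return (today - birth_date).days

print(days_alive(date(2000, 1, 1), date(2000, 1, 31)))  # -> 30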
Problem 5
Define a procedure greatest that takes as input a list of positive integers and outputs the greatest integer. For example:
greatest([1,2,3]) # -> 3
greatest([9,3,6,4,5,2,99,25,36]) # -> 99
greatest([7,8,7,7,8,7,8,7,8,7,8,7,8]) # -> 8
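One possible definition, scanning the list once instead of calling the built-in max:

def greatest(numbers):
    # Remember the largest value seen so far.
    best = numbers[0]
    for n in numbers:
        if n > best:
            best = n
    return best

This assumes the input list is non-empty, which the problem's examples suggest.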
Problem 6
Given a list of sub-lists, where each sub-list contains the name of a university, the number of students enrolled, and the fee per student, define a procedure enrollment_fee that finds the total number of students across all universities and the total fee collected by all of them.
enrollment_fee([['uni1',50,100],['uni2',100,120]]) # -> 150, 17000
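A sketch matching the example above; it assumes every sub-list has exactly the three fields named in the problem:

def enrollment_fee(universities):
    # Each entry is [name, students_enrolled, fee_per_student].
    total_students = 0
    total_fee = 0
    for _, students, fee in universities:
        total_students += students
        total_fee += students * fee
    return total_students, total_fee

print(enrollment_fee([['uni1', 50, 100], ['uni2', 100, 120]]))  # -> (150, 17000)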
Best!