Exercise 3
Problem 1
The crawler we built may be very impractical when supplied with a random seed page. Modify it to limit the number of pages it crawls. The maximum, max_pages, should be given as an input to crawl_web, and crawling should stop once max_pages pages have been crawled.
Note
You may need to implement some error handling for the crawler to work with an arbitrary seed page.
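A minimal sketch of one possible solution in Python 3. The helper names get_page, get_next_target, and get_all_links follow the style of the course crawler but are assumptions here, as is the simple string-based link extraction:

import urllib.request

def get_page(url):
    # Return the page source, or '' on any error, so one bad
    # link cannot crash the crawl (the error handling the note
    # asks for).
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode('utf-8', errors='replace')
    except Exception:
        return ''

def get_next_target(page):
    # Find the next <a href="..."> link in the page source.
    start_link = page.find('<a href="')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    return page[start_quote + 1:end_quote], end_quote

def get_all_links(page):
    # Collect every link on the page.
    links = []
    while True:
        url, end_pos = get_next_target(page)
        if not url:
            break
        links.append(url)
        page = page[end_pos:]
    return links

def crawl_web(seed, max_pages):
    # Stop as soon as max_pages pages have been crawled.
    tocrawl = [seed]
    crawled = []
    while tocrawl and len(crawled) < max_pages:
        page = tocrawl.pop()
        if page not in crawled:
            for link in get_all_links(get_page(page)):
                if link not in tocrawl:
                    tocrawl.append(link)
            crawled.append(page)
    return crawled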
Problem 2
The crawler we built may be very impractical when supplied with a random seed page. Modify it to limit the depth of links it follows from the seed page. The maximum depth, max_depth, should be given as an input to crawl_web, and crawling should stop once max_depth is reached.
Note
Suppose pages are linked as Seed -> A -> B -> C. Then the seed has depth 0, A has depth 1, B has depth 2, and C has depth 3.
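A sketch assuming the same get_page and get_all_links helpers as in Problem 1; pairing each URL with its depth is one way to enforce the limit:

def crawl_web(seed, max_depth):
    # Keep (url, depth) pairs; a FIFO queue makes the crawl
    # breadth-first, so each page is first seen at its
    # shallowest depth.
    tocrawl = [(seed, 0)]
    crawled = []
    while tocrawl:
        page, depth = tocrawl.pop(0)
        if page not in crawled:
            crawled.append(page)
            if depth < max_depth:
                # Links found here sit one level deeper; pages
                # beyond max_depth are never queued.
                for link in get_all_links(get_page(page)):
                    tocrawl.append((link, depth + 1))
    return crawled

With Seed -> A -> B -> C and max_depth = 2, this crawls Seed, A, and B but never queues C.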
Problem 3
You inspected the url printed by your code as an exercise for Unit 1 and also saved it to a CSV file. Later, you wrote a procedure inspect_url to take care of this. Now that you have extracted all the links from a webpage, modify your procedure(s) to save all the URLs in a CSV file (you may define a separate procedure to save data to a CSV file).
Note
Ignore any other possibilities that come along. You are free to make your own assumptions where needed!
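One way the CSV step might look with Python's csv module; the shape of inspect_url and the filename here are assumptions, since only the original exercise defines them:

import csv

def save_urls_to_csv(urls, filename='urls.csv'):
    # Write one URL per row, with a single header row.
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['url'])
        for url in urls:
            writer.writerow([url])

def inspect_url(page_url):
    # Extract every link from the page (using the Problem 1
    # helpers) and hand them to the CSV writer.
    links = get_all_links(get_page(page_url))
    save_urls_to_csv(links)
    return links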
Problem 4
Given your date of birth and the current date, calculate the exact number of days you have been alive.
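A sketch using Python's datetime module: subtracting two date objects yields a timedelta, whose days attribute is the exact day count. The dates below are just example values:

from datetime import date

def days_alive(birth_date, today=None):
    # Default to the current date when no end date is given.
    if today is None:
        today = date.today()
    return (today - birth_date).days

print(days_alive(date(2000, 1, 1), date(2000, 1, 31)))  # -> 30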
Problem 5
Define a procedure greatest that takes as input a list of positive integers and outputs the greatest integer. For example:
greatest([1,2,3]) # -> 3
greatest([9,3,6,4,5,2,99,25,36]) # -> 99
greatest([7,8,7,7,8,7,8,7,8,7,8,7,8]) # -> 8
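One possible definition, scanning the list once instead of calling the built-in max:

def greatest(numbers):
    # Remember the largest value seen so far.
    best = numbers[0]
    for n in numbers:
        if n > best:
            best = n
    return best

This assumes the input list is non-empty, which the problem's examples suggest.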
Problem 6
Given a list of sub-lists, where each sub-list contains the name of a university, the number of students enrolled, and the fee per student, define a procedure enrollment_fee that finds the total number of students across all universities and the total fee collected by all of them.
enrollment_fee([['uni1',50,100],['uni2',100,120]]) # -> 150, 17000
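A sketch matching the example above; it assumes every sub-list has exactly the three fields named in the problem:

def enrollment_fee(universities):
    # Each entry is [name, students_enrolled, fee_per_student].
    total_students = 0
    total_fee = 0
    for _, students, fee in universities:
        total_students += students
        total_fee += students * fee
    return total_students, total_fee

print(enrollment_fee([['uni1', 50, 100], ['uni2', 100, 120]]))  # -> (150, 17000)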
Best!