Python
This training teaches the basics of programming computers using Python. We cover how to construct a program from a series of simple instructions in Python. If you are already an experienced programmer in another language, you can pick up Python very quickly.
As part of this training we will be building a basic search engine.
Basic Terminology
Before writing an actual program, let's first go over some basic terminology:
- A computer is a machine that can execute a program. With the right program, a computer can do any mechanical computation you can imagine.
- A program describes a very precise sequence of steps. Since the computer is just a machine, the program must give the steps in a way that can be executed mechanically. That is, the program can be followed without any thought.
- A programming language is a language designed for producing computer programs. A good programming language makes it easy for humans to read and write programs that can be executed by a computer.
- Python is a programming language. The programs that we write in the Python language will be the input to the Python interpreter, which is a program that runs on the computer. The Python interpreter reads our programs and executes them by following the rules of the Python language.
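As a minimal sketch of these terms, here is a tiny Python program. The variable names are made up for illustration. Each line is one simple instruction, and the Python interpreter carries them out in order, from top to bottom.

```python
# A program is a precise sequence of steps. The interpreter follows
# them mechanically, one line at a time.
minutes_per_hour = 60
hours_per_day = 24

# Compute a new value from the two values stored above.
minutes_per_day = minutes_per_hour * hours_per_day

# Show the result on the screen.
print(minutes_per_day)  # prints 1440
```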
Programming Language
Why do we need to invent and learn new languages, like Python, to program computers, rather than using natural languages like English or Mandarin?
There are many reasons why a designed language like Python is better for writing programs than a natural language like English. One problem with natural languages is that they are ambiguous, so not everyone will interpret the same phrase the same way. To program computers, it is important that we know exactly what our programs mean, and that the computer will run them with the meaning we intended. Another problem with natural languages is that they are very verbose. Saying something with the level of precision a computer needs to follow it mechanically would require an awful lot of writing. We want our programs to be short so that they are less work to write and easier to read and understand.
Grammar
Compared to a natural language, like English, programming languages like Python adhere to a strict grammatical structure. In English, even if a phrase is written or spoken incorrectly, it can still be understood with the help of context or other cues. On the other hand, in a programming language like Python, the code must match the language grammar exactly. The Python interpreter has no idea what to do with input that is not in the Python language, so it produces an error.
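The strictness described above can be seen in a small sketch. It uses Python's built-in compile function, which checks a piece of text against the Python grammar without running it. The helper name is_valid_python is made up for this example.

```python
def is_valid_python(source):
    """Return True if source matches the Python grammar, False otherwise."""
    try:
        # compile() only parses the text; it does not execute it.
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        # The interpreter cannot make sense of the input, so it reports an error.
        return False


print(is_valid_python("print(1 + 1)"))  # prints True
print(is_valid_python("print(1 + )"))   # prints False: the second operand is missing
```

A human reader could probably guess what the second line was meant to say, but the interpreter does not guess. It rejects anything outside the grammar.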
Web crawler
A web crawler is a program that collects content from the web. It finds web pages by starting from a seed page and following the links on that page to other pages, then following the links on those pages, and so on until it has found many web pages.
Here is the process that a web crawler follows:
- Start from one preselected page. (We call the starting page the "seed" page.)
- Extract all the links on that page. (This is the part we will work on in this unit and Unit 2)
- Follow each of those links to find new pages.
- Extract all the links from all of the new pages found.
- Follow each of those links to find new pages.
- Extract all the links from all of the new pages found.
- ...
This keeps going as long as there are new pages to find, or until it is stopped. In Unit 1, we will write a program to extract the first link from a given web page. In Unit 2, we will figure out how to extract all the links on a web page. In Unit 3, we will figure out how to keep the crawl going over many pages. In Unit 4, we will build a search index.
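The crawling process can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the code built in the units. The "web" here is a small made-up dictionary of pages (PAGES) instead of real sites, links are assumed to look exactly like <a href="...">, and the function names are hypothetical.

```python
# Hypothetical in-memory "web": each URL maps to the text of its page.
PAGES = {
    "http://example.com/seed": '<a href="http://example.com/a">A</a> '
                               '<a href="http://example.com/b">B</a>',
    "http://example.com/a": '<a href="http://example.com/b">B</a>',
    "http://example.com/b": '<a href="http://example.com/seed">back</a>',
}

LINK_START = '<a href="'


def get_next_link(page, start=0):
    """Return (url, end_position) for the first link at or after start.

    Returns (None, -1) when no further link is found.
    """
    tag = page.find(LINK_START, start)
    if tag == -1:
        return None, -1
    url_start = tag + len(LINK_START)
    url_end = page.find('"', url_start)
    if url_end == -1:
        return None, -1
    return page[url_start:url_end], url_end


def get_all_links(page):
    """Collect every link on the page by calling get_next_link repeatedly."""
    links = []
    position = 0
    while True:
        url, position = get_next_link(page, position)
        if url is None:
            break
        links.append(url)
    return links


def crawl(seed, pages):
    """Follow links from the seed page until no new pages are left."""
    to_crawl = [seed]   # pages found but not yet visited
    crawled = []        # pages already visited, in visiting order
    while to_crawl:
        url = to_crawl.pop(0)
        if url in crawled:
            continue
        crawled.append(url)
        # An unknown URL is treated as an empty page with no links.
        for link in get_all_links(pages.get(url, "")):
            if link not in crawled and link not in to_crawl:
                to_crawl.append(link)
    return crawled


first_link, _ = get_next_link(PAGES["http://example.com/seed"])
print(first_link)
print(crawl("http://example.com/seed", PAGES))
```

Keeping a list of pages already crawled is what lets the loop stop: page b links back to the seed page, and without that check the crawler would go around in circles forever.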
This training is taken from the Udacity CS101 course.