Crawling is where it all begins: the acquisition of data about a website. It involves scanning the site and building a complete inventory of everything on it – the page title, the images, the keywords it contains, and any other pages it links to.
An automated bot – a spider – visits each page, just as you or I would, only far more quickly. Even in its earliest days, Google reported that its spiders were reading a few hundred pages a second.
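As a rough sketch of how such a spider works, here is a minimal breadth-first crawler in Python. It records each page's title and queues the links it finds. The seed URL and page limit are invented for the example; a real spider would also respect robots.txt, rate-limit itself, and extract far more than titles and links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkAndTitleParser(HTMLParser):
    """Collects the page title and every outgoing link on a page."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def crawl(seed, max_pages=10):
    """Breadth-first crawl: fetch a page, record its title, queue its links."""
    frontier = [seed]   # pages waiting to be visited
    seen = set()        # URLs already fetched
    pages = {}          # url -> page title
    while frontier and len(pages) < max_pages:
        url = frontier.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
        except (OSError, ValueError):
            continue  # skip pages that fail to load or have unusable URLs
        parser = LinkAndTitleParser()
        parser.feed(html)
        pages[url] = parser.title.strip()
        # Resolve relative links against the current page and queue them
        frontier.extend(urljoin(url, link) for link in parser.links)
    return pages

if __name__ == "__main__":
    # example.com is a placeholder seed, not a real crawl target
    for url, title in crawl("https://example.com").items():
        print(title, "->", url)
```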
Indexing is the process of taking all the data gathered during a crawl and placing it in a big database. Imagine trying to make a list of all the books you own, with each one's author and page count: going through each book is the crawl, and writing the list is the index.
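In code, that big database is typically an inverted index: a map from each word to the documents that contain it. A minimal sketch, with toy documents invented for the example:

```python
from collections import defaultdict

def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

# Toy crawl output: document id -> page text (invented for illustration)
documents = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "a quick recipe for dog treats",
}

index = build_index(documents)
print(sorted(index["quick"]))  # [1, 3] -- every document containing "quick"
print(sorted(index["brown"]))  # [1, 2]
```

Once the index exists, lookups never have to re-read the pages themselves – just as your book list saves you from re-reading the books.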
Any site that is linked to from an already-indexed site, or whose owner has manually requested indexing, will eventually be crawled – some sites more frequently than others, and some to a greater depth.
The last step is the one you actually see: you type in a search query, and the search engine attempts to display the most relevant documents that match it. This is the most complicated step, but also the one that matters most to you – and it is the area in which search engines differentiate themselves.
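At its simplest, answering a query means intersecting the index entries for each query word. A sketch, reusing the toy index from the indexing example (rebuilt here so the snippet runs on its own):

```python
from collections import defaultdict

# Same toy documents as in the indexing sketch above
documents = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "a quick recipe for dog treats",
}
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search(query, index):
    """Return the ids of documents containing every word in the query."""
    result = None
    for word in query.lower().split():
        postings = index.get(word, set())
        result = postings if result is None else result & postings
    return result or set()

print(search("quick dog", index))  # {3}: the only page with both words
```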
The ranking algorithm checks your search query against billions of pages to determine how relevant each one is. These algorithms are valuable enough that companies guard them as closely held secrets.
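Public ranking schemes do exist, though, and a classic starting point is TF-IDF: reward documents where the query words appear often, and weight rarer words more heavily. A sketch of that idea – real engines combine hundreds of signals, so this is illustration only:

```python
import math
from collections import Counter

def tf_idf_scores(query, documents):
    """Score each document against the query with a basic TF-IDF sum."""
    n_docs = len(documents)
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    # Document frequency: how many documents contain each word?
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))
    scores = {}
    for doc_id, words in tokenized.items():
        counts = Counter(words)
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # term appears nowhere in the collection
            tf = counts[term] / len(words)      # term frequency in this document
            idf = math.log(n_docs / df[term])   # rarer terms get a higher weight
            score += tf * idf
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

documents = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "a quick recipe for dog treats",
}
print(tf_idf_scores("quick dog", documents))  # document 3 ranks first
```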