On September 4, 1998, two Stanford University students, Larry Page and Sergey Brin, created Google. Fast-forward 16 years. Today, the term “Google” is associated with finding relevant and reputable information on any topic. An intrinsic part of the modern vocabulary, it is the poster child for search engines, so much so that “Google it” has become the default phrase for finding information online.

As a business owner, you want your product or service to be found online. To be “searchable” and “findable” by Google, you need to understand how Google works. In this post, you will learn the essential factors behind Google’s behaviour, which will help with your online strategy.

Crawl: The discovery process with spiders

In any other context, the terms “crawl” and “spiders” would probably send shivers down your spine. In the context of online search, however, they describe the way Google navigates websites to update its index. In other words, Google sends its “spiders” to “crawl” your website and collect information about it, which is then added to its massive online database, or index. When a user enters a search query, Google refers to this database to retrieve relevant information.

There are an estimated 60 trillion webpages in existence. That’s six followed by thirteen zeroes: 60,000,000,000,000. How does Google crawl all those pages? It uses internet bots called “web spiders”. The spiders start from a list of URLs (called the seeds) and follow the hyperlinks they find on those pages. When a spider detects a link, it follows it and adds the page at the other end to the index. If it finds a dead link that returns a 404 error, it removes that page from the index. Although the real process is more complicated than described here, you now have a basic understanding of how Google’s index is constantly being updated.
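As a rough illustration, the crawl loop described above can be sketched in Python as a breadth-first traversal starting from the seeds. The miniature web, its example.com URLs and the `fetch_links` helper are hypothetical stand-ins for real HTTP requests; Google’s actual crawler is far more sophisticated and is not public.

```python
from collections import deque

# Hypothetical miniature "web": each URL maps to the links found on that page.
# A value of None stands in for a dead link that would return HTTP 404.
WEB = {
    "https://example.com/": ["https://example.com/about", "https://example.com/blog"],
    "https://example.com/about": ["https://example.com/"],
    "https://example.com/blog": ["https://example.com/blog/post-1", "https://example.com/old-page"],
    "https://example.com/blog/post-1": ["https://example.com/"],
    "https://example.com/old-page": None,  # dead link (404)
}


def fetch_links(url):
    """Hypothetical fetch: returns (status_code, outgoing_links) from WEB."""
    links = WEB.get(url)
    if links is None:
        return 404, []
    return 200, links


def crawl(seeds, indexed_urls):
    """Breadth-first crawl from the seed URLs.

    Live pages are added to `indexed_urls`; pages returning 404 are removed.
    """
    queue = deque(seeds)
    seen = set(seeds)
    while queue:
        url = queue.popleft()
        status, links = fetch_links(url)
        if status == 404:
            indexed_urls.discard(url)  # dead link: drop it from the index
            continue
        indexed_urls.add(url)
        for link in links:
            if link not in seen:  # follow each link only once
                seen.add(link)
                queue.append(link)
    return indexed_urls


# The index still holds a stale entry from an earlier crawl.
indexed_urls = {"https://example.com/old-page"}
crawl(["https://example.com/"], indexed_urls)
print(sorted(indexed_urls))
```

The `seen` set keeps the spider from revisiting pages that link back to each other, which is why the link from the “about” page back to the home page does not cause an endless loop.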

Index: The database

Each page crawled by web spiders is added to Google’s index. Google’s index is similar to the index typically found at the back of a book: it contains an inventory of the words gathered on the web and where they appear, e.g., in page title tags, meta descriptions, keywords, and other page elements. Google’s algorithms use this database to return the most appropriate pages for a search query.
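The book-index comparison maps onto a data structure known as an inverted index, which records, for every word, the places it appears. The minimal Python sketch below indexes a page’s title tag and meta description and then answers a query by returning pages that contain every query word. The pages and their text are hypothetical, and a real search index stores far more detail than this.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages, each with a title tag and a meta description.
PAGES = {
    "https://example.com/": {
        "title": "Handmade leather bags",
        "description": "Shop handmade leather bags and wallets.",
    },
    "https://example.com/blog/care": {
        "title": "How to care for leather",
        "description": "Simple tips to keep leather soft.",
    },
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of (url, field) locations where it appears."""
    index = defaultdict(set)
    for url, fields in pages.items():
        for field, text in fields.items():
            for word in tokenize(text):
                index[word].add((url, field))
    return index


def lookup(index, query):
    """Return the URLs that contain every word in the query."""
    results = None
    for word in tokenize(query):
        urls = {url for url, _field in index.get(word, set())}
        results = urls if results is None else results & urls
    return results or set()


index = build_index(PAGES)
print(sorted(index["leather"]))
print(lookup(index, "leather bags"))
```

Looking a word up in the index is a single dictionary access, so a query never has to rescan the pages themselves; that is the point of building the index ahead of time.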

Algorithms: The programs and formulas

When you perform a Google search, you are accessing the index of all the pages relevant to that search query. Depending on the query, the number of relevant indexed pages could be in the thousands, if not millions. How does Google decide which pages to display at the top of the search engine result pages (SERPs)? It uses its algorithms: computer programs that determine which results are most relevant to the user. They calculate the relevancy of a page by looking at over 200 factors, including:

  • PageRank – Measures how important a page is based on the quality of incoming links from other pages.
  • Website Quality – Looks for technical issues and content quality.
  • User context – Finds out if a page contains useful information about the search terms.
    Note: These algorithms are getting quite clever. Check out Google’s Knowledge Graph.
  • Freshness – People love newsworthy content, and so does Google.
  • Usability – Assesses how people digest your content and compares this with other web pages in the index.
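Of the factors listed above, PageRank is the one whose basic formula was published by Page and Brin. The Python sketch below implements the simplified textbook version by power iteration: each page repeatedly passes a share of its score along its outgoing links, with the commonly cited damping factor of 0.85. The four-page link graph is hypothetical, and Google’s present-day ranking combines many signals, so treat this as an illustration of the idea rather than Google’s implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank computed by power iteration.

    `links` maps each page to the pages it links to. A page with no
    outgoing links spreads its score evenly across all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page gets a small baseline score...
        new_rank = {page: (1.0 - damping) / n for page in pages}
        # ...plus a share of the score of each page that links to it.
        for page, outgoing in links.items():
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: every page links back to "home".
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "post"],
    "post": ["home"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

Because every other page links to “home”, it ends up with the highest score, which mirrors the idea in the PageRank bullet: incoming links from other pages raise a page’s importance.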

Fighting Spam: The repellent

Google is constantly on the lookout for web spam to maintain the integrity of its search results. Most spam is removed automatically by algorithms. In some instances, however, a Google employee will personally investigate a suspected case of spam and remove it if required, e.g., if a document’s authenticity is questionable.

Google has taken a strong position against spam and has penalised websites that do not adhere to its webmaster guidelines.

I hope this article provided you with a little more insight into Google’s inner workings.

Conclusion

In this post, you have learned the process Google uses to decide whether to display your pages in its search engine result pages (SERPs):

  1. It crawls your website using spiders.
  2. It indexes your website in its database.
  3. It refers to this database to respond to a search query.
  4. It uses special algorithms to decide which pages most deserve to be displayed in the SERPs.