2019 Robots.txt Primer: Get Your Site Properly Indexed by Controlling Google's Spider
 by John Heard

— By John Heard & Stephen Mahaney

One of the most critical SEO tasks is to control the search engine spiders (like Googlebot) that crawl and index your website. Mastery of these spiders is paramount to preventing duplicate content while ensuring that search engines focus mainly on your most important pages.

Although it may seem a bit technical, spider control is actually easier than most people think. It's simply a matter of deploying an essential tool called the robots.txt file. Robots.txt gives spiders (aka robots) the instructions they need to understand how to crawl your website.

Spider? Bot? Crawler?
The terms spider, crawler, bot and robot all generally refer to the same thing. Technically, a bot is any program that downloads pages off the web, while a spider is a bot that the search engines use to build their index. But you'll often hear one being used to refer to the other, and the distinction isn't especially important.

The robots.txt file ensures a spider's time on your site is spent efficiently, rather than wasted indexing obscure pages such as:

  • On-site Search Result Pages
  • PHP, Perl and other Scripts
  • Shopping Cart Checkout
  • Advertising Landing Pages
  • Password Protected Directories
  • Forum Member Pages
  • "Print" Versions of Pages
In other words, URLs that are either problematic to spiders or that don't belong in the search results.
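
As a rough illustration of the list above, the sketch below embeds a sample robots.txt and checks it with Python's standard urllib.robotparser module. The domain www.example.com and every directory name (/search/, /cgi-bin/, /checkout/, and so on) are assumptions made up for this example; your own site's paths will differ, and the helper names build_parser and is_crawlable are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for www.example.com. Every path here is an
# illustrative assumption; substitute the directories your site really uses.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /cgi-bin/
Disallow: /checkout/
Disallow: /landing/
Disallow: /private/
Disallow: /members/
Disallow: /print/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules allow the given user agent to fetch the URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    sample_urls = (
        "https://www.example.com/",
        "https://www.example.com/products/blue-widget",
        "https://www.example.com/checkout/cart",
        "https://www.example.com/print/blue-widget",
    )
    for url in sample_urls:
        status = "crawl" if is_crawlable(rules, url) else "skip"
        print(f"{status:5} {url}")
```

Each Disallow line is a simple path prefix, so a single rule covers everything beneath that directory. Testing a draft this way before uploading it is one way to confirm that important pages stay crawlable while the problem URLs are skipped.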

Controlling Search Spiders with Robots.txt

Picture your robots.txt file as the tour guide. It provides a map that tells search engines where...

