Robots.txt Primer: Get Your Pages Indexed Faster by Controlling Google's Spider

— By John Heard & Stephen Mahaney

One of the most critical SEO tasks is controlling the search engine spiders (such as Googlebot) that crawl and index your Web site. Mastering these spiders is essential to preventing duplicate content and ensuring that the search engines focus on your most important pages.

Spider? Bot? Crawler?
The terms spider, crawler, bot, and robot all generally refer to the same thing. Technically, a bot is any program that downloads pages from the Web, while a spider is a bot that the search engines use to build their index. You'll often hear one term used for the other, though, and the distinction isn't especially important.

Although it may seem a bit technical, spider control is actually easier than most people think. It's simply a matter of deploying an essential tool called the robots.txt file. Robots.txt gives spiders (a.k.a. robots) the direction they need to find your most important pages. This file ensures a spider's time on your site is spent efficiently rather than wasted indexing obscure pages (think Privacy Policy, About Us, CGI pages, and the like) that are either problematic for spiders or unessential to searchers looking for your products or services.
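For instance, a bare-bones robots.txt (which must sit at the root of your domain, e.g. www.example.com/robots.txt) might look something like the sketch below. The paths shown are placeholders; substitute whichever obscure pages on your own site you'd rather the spiders skip:

    # These rules apply to every spider that honors the standard
    User-agent: *
    Disallow: /cgi-bin/              # keep bots out of script directories
    Disallow: /privacy-policy.html
    Disallow: /about-us.html

    # Optional: point spiders at your XML sitemap
    Sitemap: http://www.example.com/sitemap.xml

A well-behaved spider simply skips the Disallow'ed paths and spends its crawl time on the rest of your site.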

Controlling Search Spiders with Robots.txt

Picture your robots.txt file as the tour guide to your site for the search engines. It provides a map that tells search engines where to find the content you want indexed...

