'Fetch as Googlebot' Should NOT be Ignored


New online tools appear so often that you can hardly go a full day without hearing of at least one more. One of them, however, is worth keeping in your back pocket in case things go wrong. Fetch as Googlebot lets you look at your site literally from Google's point of view. The tool crawls your site and returns all the information it finds, helping you spot crawl errors.

Information returned by the tool includes:

  • The HTTP response returned by your server
  • The date and time of your crawl request
  • HTML code
  • The first 100KB of visible (indexable) text on a page. If there is no content, it may indicate that your page is generated entirely from JavaScript or rich media files, not text-based content. You should review this text to make sure that it doesn't include unexpected content, which could indicate that your site has been hacked. (Note: Googlebot may crawl more than the first 100KB of text.)
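To make the last item concrete, here is a rough sketch of how you might pull the title and the first 100KB of visible text out of a page's HTML yourself, so you can scan it for unexpected content. It is not how Google's tool works internally; the names `VisibleText`, `summarize`, and `fetch_html` are made up for this illustration, and sending a Googlebot User-Agent string from your own machine is only an approximation of a real Googlebot visit.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

MAX_TEXT_BYTES = 100 * 1024  # the 100KB figure mentioned in the list above

# Publicly documented Googlebot User-Agent string (used here only to imitate it).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


class VisibleText(HTMLParser):
    """Hypothetical helper: collects the <title> and text outside <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore JavaScript and CSS bodies
        if self._in_title:
            self.title += data
        elif data.strip():
            self.chunks.append(data.strip())


def summarize(html):
    """Return the page title and the visible text, capped at 100KB."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    capped = text.encode("utf-8")[:MAX_TEXT_BYTES].decode("utf-8", "ignore")
    return {"title": parser.title.strip(), "text": capped}


def fetch_html(url, user_agent=GOOGLEBOT_UA):
    """Assumed helper: fetch a URL with a chosen User-Agent; returns (status, html)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.status, response.read().decode("utf-8", "replace")
```

Calling `summarize` on the HTML returned by `fetch_html` for one of your own pages gives you a title and a block of text to read through. An empty `text` value suggests the page is built mostly from JavaScript or rich media, and an unfamiliar title suggests something has been injected.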

You can then take the time to make sure that what it sees is what you intended and, in some cases, the same as what you're seeing. A hacked site is one great example of when the two can differ. Here's a snippet from one of Matt Cutts's recent blog posts.

"...recently a well-known musician's website was hacked. The management firm for the musician wrote in to say that the site was clean now. Here's the reply I sent back:

Unfortunately when our engineers checked this morning, the site was still hacked. I know the page looks clean to you, but when we send Googlebot to fetch www.[domain].com this morning, we see

<title>Generic synthroid bad you :: Canadian Pharmacy</title>

on the page. What the hackers are doing is sneaky but unfortunately pretty common. When you surf directly to the website, you see normal content. But when a search engine (or a visitor from a search engine) visits the website, they see hacked drug-related content. The reason that the hackers do it this way is so that the hacked content is harder to find/remove and so that hacked content stays up longer.

The fix in this case is to go deeper to clean the hack out of your system. See http://support.google.com/webmasters/bin/answer.py?hl=en&answer=163634 for some tips on how to do this, but every website is different.

One important tool Google provides to help in ass...
