What's the proper way to stop Google from indexing old or useless content?
 by Kristi Hagen

  • In my effort to get rid of old / useless content, or at least stop Google from indexing it, I changed the robots.txt file to disallow some areas as shown below:
    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
    Noindex: /author/
    Noindex: /tag/
    Disallow: /author/
    Disallow: /tag/
    Disallow: /news/
    Disallow: /category/
    Disallow: /testimonials/
    Disallow: /testimonials-widget-category/
    Disallow: /testimonials-widget-post_tag/
    

    We are now getting messages from GSC regarding 'Coverage Issues' which say: 'Indexed, though blocked by robots.txt.'

    Is this something to worry about and is there any way around it?

Answer:

This is a great question and a common mistake that SEOs make. Google even came out and put a BIG warning against doing this on their Block search indexing with 'noindex' support page:

Important! For the noindex directive to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex directive, and the page can still appear in search results, for example if other pages link to it.

So, the issue with using robots.txt to keep sections (such as /tag/) out of the index is that anything that was already indexed gets "stuck" there. Google won't crawl those sections and add more pages, but since they can't crawl the blocked URLs, they never see a reason to drop the old ones either.

Typically the best solution on WordPress is to use the meta robots noindex tag on these sections (like /tag/). If you're using the Yoast SEO plugin, you can set it to tell search engines not to index those sections of your site, and it will generate a meta noindex tag on just those pages.
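
For reference, the tag that Yoast (or any other method) outputs in the <head> of those archive pages looks roughly like this:

    <!-- Tells compliant crawlers to drop this page from the index but still follow its links -->
    <meta name="robots" content="noindex, follow">

The 'follow' part keeps Google crawling the links on those tag and author archives, which is usually what you want for internal linking; it only removes the pages themselves from the index.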

Be sure to remove the blocks you added in robots.txt, since Google has to be able to crawl those pages in order to see the noindex tag.
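
As a sketch of what that could look like, your robots.txt would shrink back to just the usual WordPress admin rules, with the archive sections left crawlable so Google can find the noindex tags:

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php

Once those pages have been re-crawled and dropped from the index, you could consider adding Disallow rules back to save crawl budget, but not before the noindex has done its job.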
