How is Google finding and indexing pages that I don't want indexed?
 by Staff

  • I've got a project in the WebCEO tools that I'm also tracking in Alexa. The Alexa tool is showing a huge number of pages with issues, but when I click through I see it's picking up thousands of pages that shouldn't be indexed, many of them with duplicate titles.

    When I checked, I got:

    Index of /wp-content/plugins/elementor

    Here's an example:

    http://domain.com/wp-content/plugins/elementor/

    I'm wondering if these pages, which belong to the Elementor plugin for WordPress, are somehow being indexed when they shouldn't be?

Answer:

This happens a lot with spiders, including Google's.

When they see a link, for example, to

/wp-content/plugins/elementor/style.css,

they'll start crawling back up the directory path to discover

/wp-content/plugins/elementor/

which they will attempt to open and, if successful, will find an index of all the files in that directory.
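
To make that behavior concrete, here's a minimal sketch in Python of how a crawler can derive every parent directory from a single asset URL. The domain and file name mirror the placeholder example above; they are not real URLs.

# Sketch: the parent-directory URLs a crawler can derive from one asset link.
# domain.com and style.css are placeholders, as in the example above.
from urllib.parse import urljoin

asset = "http://domain.com/wp-content/plugins/elementor/style.css"

url = urljoin(asset, ".")        # the directory that contains the asset
while True:
    print(url)                   # each of these may serve a directory listing
    parent = urljoin(url, "..")  # step up one level
    if parent == url:            # stop once we reach the site root
        break
    url = parent

Every URL it prints, including /wp-content/plugins/elementor/, is a candidate the spider can request, and any that return a listing can end up indexed.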

Many servers allow this by default. To keep these directory listings from being crawled and indexed on an Apache server, add the following line to the .htaccess file... (if you're not on Apache, ask your web host)

Options -Indexes

...this will return a 403 (Forbidden) error whenever a directory URL with no index file is requested, instead of listing the directory's contents.
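
If you want to confirm the change took effect, a rough sketch like this requests each directory and reports whether it returns a listing or is blocked. The URL is a placeholder; substitute the directories your audit flagged.

# Rough sketch: report whether a directory URL exposes a listing or is blocked.
# domain.com is a placeholder; swap in the directories flagged in your audit.
from urllib.request import Request, urlopen
from urllib.error import HTTPError

directories = ["http://domain.com/wp-content/plugins/elementor/"]

for url in directories:
    try:
        with urlopen(Request(url, headers={"User-Agent": "dir-check"})) as resp:
            body = resp.read(2048).decode("utf-8", errors="ignore")
            state = "EXPOSED listing" if "Index of" in body else "no obvious listing"
            print(f"{url} -> {resp.status} ({state})")
    except HTTPError as err:
        # 403 is the expected result once Options -Indexes is in place.
        print(f"{url} -> {err.code} (blocked)")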

Ideally you'd do this before the pages get indexed. Once they are in the index, getting them delisted means requesting their removal in Search Console and then blocking them with robots.txt so they aren't recrawled.
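
For the robots.txt side of that, rules along these lines will keep compliant crawlers out of the plugin directories (a generic example; adjust the path to whatever your audit flagged):

User-agent: *
Disallow: /wp-content/plugins/

Keep in mind that robots.txt only blocks crawling; the Search Console removal request is what actually drops the already-indexed URLs from the results.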

To see everything that Google has indexed, go to Search Console and review the "Indexed, not submitted in sitemap" report.
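
For a quick spot-check from the search results themselves, a query such as site:domain.com inurl:wp-content (with your own domain) will surface indexed URLs under that path.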
