Request Indexing on GSC, Error with Robots.txt


You’ve just finished a major project and you are ready to start getting all of the content indexed. You log in to Google Search Console (GSC) and request indexing of a URL.

Google URL Inspection Tool

But instead of getting green checkmarks and the results you want, you are seeing the following:

Google Index Request Rejected

Indexing request rejected! During live testing, indexing issues were detected with the URL
Reason: Blocked by robots.txt

Was the Crawl allowed?
No: blocked by robots.txt
Page fetch Failed: Blocked by robots.txt

A robots.txt file is a very simple text file that sits in your website’s root directory. It offers advisory instructions to any search engine crawler that chooses to honor it, such as Google’s bot, about which files it should and should not crawl.
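For illustration, here is a minimal sketch of what a typical robots.txt might look like; the /private/ path and the sitemap URL are placeholders for this example only:

User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml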

The first thing you will want to check is your robots.txt file. Unfortunately, Google Search Console no longer has a robots.txt testing tool, and some speculate that it no longer holds much value for SEO anyway. Either way, you will want to switch to the older version of Search Console so that you can quickly test your robots.txt file.

You can learn more about robots.txt files here:  https://support.google.com/webmasters/answer/6062596?hl=en

If your robots.txt file was indeed blocking Googlebot from accessing your webpage or the images on it, then at least you have found the reason you were getting the Indexing Request Rejected error message.

To clean things up and avoid other potential crawling errors in the future, this would be a good time to review other issues your website may be having.

You can view all pages flagged with the ‘Indexed, though blocked by robots.txt’ message under Google Search Console > Coverage. If you do not see that warning label, then no other pages on your site are affected by this problem.

Are there certain pages on your website that should not be indexed?

There are several reasons why you’d want certain pages to not get indexed. Here are a few:

Robots.txt or ‘noindex’ directives that ‘say’ a page should not be indexed. Note that a page carrying a ‘noindex’ directive still needs to be crawlable so that the search engine bots can see the directive and ‘know’ that the page should not be indexed.

In your robots.txt file, make sure that (a correctly structured example follows this list):

Each ‘disallow’ line immediately follows its ‘user-agent’ line.
There is no more than one block per ‘user-agent’.
There are no invisible Unicode characters; running your robots.txt file through a plain-text editor that converts encodings will strip out any hidden special characters.
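As a sketch of that structure, a single well-formed group looks like the snippet below; the /checkout/ and /cart/ paths are only placeholders for whatever you actually want to keep crawlers away from:

User-agent: *
Disallow: /checkout/
Disallow: /cart/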

Pages that are linked to from other websites. Pages can still get indexed if other sites link to them, even if they are disallowed in robots.txt; in that case, however, only the URL and anchor text appear in search engine results.

One way to resolve the robots.txt blocking issue is by password protecting the file(s) on your server.
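As one possible sketch, if your site runs on Apache you can add password protection with an .htaccess file in the directory you want to protect; the AuthName label and the .htpasswd path below are placeholders you would replace with your own:

AuthType Basic
AuthName "Restricted Area"
AuthUserFile /full/path/to/.htpasswd
Require valid-user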

Alternatively, remove those pages’ entries from robots.txt, or use the following meta tag to block them:

<meta name="robots" content="noindex">
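For context, that tag belongs in the <head> of the page you want kept out of the index; a minimal sketch of such a page might look like this:

<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noindex">
  <title>Example page you do not want indexed</title>
</head>
<body>
  ...
</body>
</html>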

Old URLs

If you have created new content or a new site and used a ‘noindex’ directive in robots.txt to make sure it does not get indexed, or you have recently signed up for GSC, there are two options for fixing the blocked by robots.txt issue:

  • Give Google time to eventually drop the old URLs from its index
  • 301 redirect the old URLs to the current ones

In the first case, Google ultimately drops URLs from its index if all they do is return 404s (meaning the pages no longer exist). It is not advisable to use plugins to redirect your 404s; the plugins can cause issues that lead to GSC sending you the ‘blocked by robots.txt’ warning, or, worse, get you penalized for what is known as black hat SEO.
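If you go with the second option and set up the 301 redirects yourself rather than through a plugin, one possible sketch, assuming an Apache server with .htaccess support, looks like this; /old-page and the target URL are placeholders:

Redirect 301 /old-page https://www.example.com/new-page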

Virtual robots.txt files

There is a possibility of getting these notification errors even if you do not have a physical robots.txt file. This is because CMS-based sites (WordPress, Joomla, etc.) can generate virtual robots.txt files; WordPress, for example, does this. Plugins may also generate their own robots.txt rules, and these could be the ones causing some of the problems on your website.

These virtual robots.txt files need to be overridden by a physical robots.txt file of your own. Make sure that your robots.txt includes a directive allowing all search engine bots to crawl your site; that is the only way they can tell which URLs should and should not be indexed.

We recommend giving Google full access to your HTML, JavaScript, CSS, and image files. This helps ensure Google can load and render your pages properly.

Here is the directive that allows all bots to crawl your site:

User-agent: *

Disallow:

The empty ‘Disallow:’ line in the directive above means ‘disallow nothing’, so all bots are allowed to crawl the entire site.
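If you do want to keep a few areas out of the crawl while leaving everything else open, a commonly used pattern for a WordPress site (assuming the default /wp-admin/ path; the sitemap URL is a placeholder) is sketched below:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://www.example.com/sitemap.xml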

If you are still having issues with your on-page search engine optimization and need help, you can contact us or use our software to automatically detect your search engine optimization issues.

 
