02 Jul 2018
How to Remove a Web Page From the Google Index
One of the primary factors in SEO success is using the “crawl budget” of our website effectively. The methods in this article therefore serve two kinds of readers:
- Those who want to remove a specific page from Google
- Those who care about SEO performance and want to use the crawl budget effectively by keeping low-value or unnecessary pages out of the index.
How do we remove a web page from the index?
The first step is to identify which page or page groups to remove from the Google index. This step matters, because if we apply the changes described below to a page or page group that is valuable for SEO, we may lose traffic we wanted to keep.
Google Analytics is an important reference for this step. When we look at the last year of traffic data for our website, any page group that has brought in no traffic can be removed from the index and blocked from crawling without a problem.
In general, these unnecessary pages are “tag” and “author” pages on blog sites, and user pages such as “cart”, “sign up” and “log in”, along with “filter” pages, on e-commerce sites.
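The traffic check described above can be sketched in a few lines of Python. This is only an illustration: the rows are made-up (path, sessions) pairs standing in for a one-year Google Analytics export, and grouping pages by their first path segment is an assumption that will not fit every site.

```python
from collections import defaultdict

def traffic_by_section(rows):
    """Sum sessions per site section, where a section is the first path segment."""
    totals = defaultdict(int)
    for path, sessions in rows:
        # Drop the query string, then split the path into its segments.
        segments = [s for s in path.split("?")[0].split("/") if s]
        section = "/" + segments[0] + "/" if segments else "/"
        totals[section] += sessions
    return dict(totals)

# Hypothetical export rows: (page path, sessions over the last year).
rows = [
    ("/blog/italian-recipes", 1200),
    ("/tag/italian-foods", 0),
    ("/tag/italian-nights", 0),
    ("/author/jane", 0),
]

totals = traffic_by_section(rows)
no_traffic = sorted(section for section, n in totals.items() if n == 0)
print(no_traffic)  # ['/author/', '/tag/']
```

Sections that end up in no_traffic are candidates for removal, but they are still worth a manual look before anything is blocked.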
As we mentioned in the video, the first thing we need to do is find a common pattern for those pages. The pattern might be a common parameter in the URL or a common subfolder. For example:
https://www.example.com/en/men-shorts?filter=66
https://www.example.com/en/women-shorts?filter=blue
The common pattern in the URLs above is the “?filter=” parameter.
https://www.example.com/blog/tag/italian-foods
https://www.example.com/blog/tag/italian-nights
The common pattern in the URLs above is the “/tag/” subfolder.
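If we have a list of URLs from a crawl or an export, the two kinds of pattern above can be checked with a short Python sketch. The function name matches_pattern and the sample list are hypothetical; only the example URLs come from this article.

```python
from urllib.parse import urlsplit, parse_qs

def matches_pattern(url, param=None, folder=None):
    """Return True if the URL has the query parameter or the path subfolder."""
    parts = urlsplit(url)
    # Query parameter check, e.g. param="filter" for "?filter=blue".
    if param is not None and param in parse_qs(parts.query, keep_blank_values=True):
        return True
    # Subfolder check, e.g. folder="tag" for "/blog/tag/italian-foods".
    if folder is not None and folder in parts.path.split("/"):
        return True
    return False

urls = [
    "https://www.example.com/en/men-shorts?filter=66",
    "https://www.example.com/en/women-shorts?filter=blue",
    "https://www.example.com/blog/tag/italian-foods",
    "https://www.example.com/blog/italian-foods",
]

filter_pages = [u for u in urls if matches_pattern(u, param="filter")]
tag_pages = [u for u in urls if matches_pattern(u, folder="tag")]
print(len(filter_pages), len(tag_pages))  # 2 1
```

Parsing the URL instead of searching the raw string avoids false matches, for example a product whose name happens to contain the word “tag”.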
The next step is to check whether the pages are actually in the index. To check a specific page, or a page group whose pattern we have identified, we run search queries like these on Google:
site:https://www.example.com/blog/tag/italian-foods
or
site:https://www.example.com/blog inurl:/tag/
Another example:
site:https://www.example.com/en/women-shorts?filter=blue
or
site:https://www.example.com/en inurl:?filter=
As you can see in the live example above, almost 9,000 filter pages are indexed on Google, yet these pages bring in scarcely any traffic.
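When there are several page groups to check, the search queries above can be assembled with a small helper. This is just a convenience sketch; site_query is a hypothetical name, and the output is meant to be pasted into Google by hand.

```python
def site_query(prefix, inurl=None):
    """Build a Google `site:` query, optionally narrowed with `inurl:`."""
    query = f"site:{prefix}"
    if inurl:
        query += f" inurl:{inurl}"
    return query

print(site_query("https://www.example.com/blog", inurl="/tag/"))
print(site_query("https://www.example.com/en", inurl="?filter="))
```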
Now that we are done with the identification step, we can move on to how to remove a page group or a single page from Google.
How do we block a page from being indexed?
First, we need to check in the robots.txt file that the pattern of the page group is open to crawling. There should be no “Disallow: ...” line in robots.txt that matches the pattern.
To remove our page from the Google index completely, bots must be able to crawl the page and see our removal request. If crawling is blocked, they never see the request.
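The robots.txt check can be sketched with Python's standard urllib.robotparser module. The rules below are a made-up robots.txt, not the one from any real site. Note that this parser treats rule paths as plain prefixes and does not understand (*) wildcards, so it is only a first check for simple rules.

```python
from urllib.robotparser import RobotFileParser

def is_crawlable(robots_lines, url, user_agent="Googlebot"):
    """Return True if the given robots.txt lines let user_agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt content, one line per list item.
robots_lines = [
    "User-agent: *",
    "Disallow: /cart",
]

tag_page = "https://www.example.com/blog/tag/italian-foods"
cart_page = "https://www.example.com/cart"
print(is_crawlable(robots_lines, tag_page))   # True
print(is_crawlable(robots_lines, cart_page))  # False
```

If a page we plan to tag with “noindex” comes back as not crawlable, the matching “Disallow” line has to be lifted first.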
Then, we need to add the tag below to the <head> section of the source code of each page we want to keep out of the index.
<meta name="robots" content="noindex, follow">
With this tag, search engine spiders (bots) know that the page should not be indexed, and they remove it from the index if it is already there.
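To confirm that the tag actually made it into a page, we can scan the HTML for a robots meta tag. The sketch below uses Python's built-in html.parser; the class and function names are hypothetical, and the sample HTML is a stand-in for the real page source.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # "noindex, follow" becomes {"noindex", "follow"}.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

def has_noindex(html):
    """Return True if the HTML carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

sample = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(has_noindex(sample))  # True
```

Running this against the downloaded source of a few pages from each group is a quick way to catch templates where the tag was missed.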
After adding the tag to the source code, you can follow the steps shown in the video so that bots arrive at the page sooner and see the tag.
First, we log in to the Google Search Console account of our website. Then:
1) Choose “Fetch as Google” from the left menu.
2) Enter the part of the URL after the domain for the page we want the bots to visit. For example, if the page address is “https://www.example.com/en/women-shorts?filter=blue”, we enter “en/women-shorts?filter=blue”.
3) Click “FETCH AND RENDER” and wait for a while.
With these steps, we manually call the bots to our page so that they see the “noindex” tag in the source code sooner. In the newer Search Console, “Fetch as Google” has been replaced by the URL Inspection tool and its “Request Indexing” option.
How long removal takes depends on the size of the website and the number of pages in the group we want to remove. We cannot give a specific time frame, but after a while we will see that all the pages tagged with “noindex” have been removed from the Google index.
What should I do to prevent reindexing after my page is removed?
To use the crawl budget we mentioned earlier efficiently, we need to make sure that bots no longer crawl the pages we removed from the index.
Blocking crawling of pages that are no longer in the index prevents a potential reindex and lets bots focus on the important pages instead.
To do that, we block the pattern of the removed page groups with a single line in the robots.txt file. For example, if we don’t want filter pages such as https://www.example.com/en/women-shorts?filter=blue to be crawled again, we can add the line below to the robots.txt file, under the relevant “User-agent” group:
Disallow: *filter=*
The asterisks (*) are wildcards rather than full regular expressions: each one matches any sequence of characters, so the rule covers everything before and after the “filter=” parameter.
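To see which URLs a wildcard rule would catch before deploying it, the matching can be approximated in Python. This is a simplified sketch of wildcard matching, assuming only the (*) wildcard and the ($) end anchor; it is not Google's own parser, and the function names are hypothetical.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(pattern, path):
    """Return True if the pattern matches the path from its beginning."""
    return robots_pattern_to_regex(pattern).match(path) is not None

rule = "*filter=*"
print(is_blocked(rule, "/en/women-shorts?filter=blue"))  # True
print(is_blocked(rule, "/en/women-shorts"))              # False
print(is_blocked(rule, "/en/shoes?prefilter=1"))         # True
```

The last line shows a side effect worth knowing: because the rule starts with a wildcard, it also matches any other parameter whose name ends in “filter”.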
Unfortunately, on some e-commerce platforms it is not possible to add the “noindex” tag to the source code. In that case there is a way, although not a sure-fire one, to remove the pages from the index through the robots.txt file.
Personally, the tests I performed with this method were successful, and you can use it as a last resort. Keep in mind that Google never officially supported this directive and announced that it would stop honoring it as of September 1, 2019. To use this method, instead of adding the “Disallow” line to the robots.txt file, you add the line below:
Noindex: *filter=*
As in the “Disallow” rule, the asterisks (*) are wildcards that cover everything before and after the “filter=” parameter.
We have come to the end of our article. We hope it will be a useful guide for removing your pages from the index. If you leave your questions in the comment section, we will do our best to help you. We would also be very glad if you shared the article. :)