Removing a website from Google

Google keeps pages in its database for a long time after they have been removed from a site, at least several months, and a cached copy of each page is usually still available. This is a problem when a blog or photo gallery gets spammed and genuine searches return unpleasant results, particularly when those results also appear in the university's search results.

Google explains how to remove pages from its database. I used the following solution to remove all the pages of an obsolete photo gallery.

Returning a 404 (Page Not Found) error is not enough to remove an entire folder. In fact, you don’t have to remove the pages at all.

However, you do need to make sure search engines know that these pages shouldn’t be indexed, by creating an appropriate robots.txt file at the root of your domain. Mine contains:
User-agent: *
Disallow: /~carl/gallery/
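To check that rules like these really cover the gallery before asking Google to act on them, Python's standard urllib.robotparser module can evaluate them offline. This is a minimal sketch: the host example.edu is hypothetical, and the module uses simple prefix matching, which may not reproduce every detail of how Googlebot interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The rules from the robots.txt above. The host used below is hypothetical.
ROBOTS_TXT = """\
User-agent: *
Disallow: /~carl/gallery/
"""


def is_blocked(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules forbid `agent` from crawling `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("https://example.edu/~carl/gallery/photo1.html",
                "https://example.edu/~carl/index.html"):
        print(url, "blocked" if is_blocked(url) else "allowed")
```

Running it should report the gallery page as blocked and the home page as allowed, since only paths starting with /~carl/gallery/ match the Disallow rule.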

Unfortunately, you need a Google account to log in to Webmaster Tools.

Then add your site and confirm that you control it, either by uploading a file with a special name to the site's root directory or by adding a particular meta tag to your home page.
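If you choose the meta tag, it can be worth confirming that the published home page really contains it before clicking Verify. The sketch below uses Python's standard html.parser; it assumes the tag is named google-site-verification (the name Google uses today), and the token and page are hypothetical, so copy the exact tag that Webmaster Tools shows you.

```python
from html.parser import HTMLParser


class MetaTagFinder(HTMLParser):
    """Collects the content attribute of every <meta> tag with a given name."""

    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name.lower()
        self.contents = []  # content values of matching <meta> tags

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == self.name:
            self.contents.append(attributes.get("content") or "")


def has_verification_tag(html: str, token: str,
                         name: str = "google-site-verification") -> bool:
    """Return True if `html` has a <meta name=...> tag whose content is `token`."""
    finder = MetaTagFinder(name)
    finder.feed(html)
    finder.close()
    return token in finder.contents


if __name__ == "__main__":
    # Hypothetical home page and token, for illustration only.
    page = ('<html><head><meta name="google-site-verification" '
            'content="abc123" /></head><body></body></html>')
    print(has_verification_tag(page, "abc123"))
```

Self-closing tags such as <meta ... /> are handled too, because HTMLParser passes them to handle_starttag by default.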

Ask Google to remove the directory or file by going to Site Configuration, Crawler Access, Remove URL.

Within a few days the pages should no longer appear in search results and the robots.txt file will prevent them from ever reappearing.

This entry was posted on Wednesday, September 9th, 2009 at 09:57:58 and is filed under Misc.
