Will Google Recrawl an Indexed URL Which is No Longer Internally Linked?

kirmeliux

We accidentally introduced Google to our incomplete site. The end result: thousands of pages indexed which return nothing but a "Sorry, no results" page. I know there are many ways to go about this, but the sheer number of pages makes it frustrating.

Ideally, in the interim, I'd love to 404 the offending pages and allow Google to recrawl them, realize they're dead, and begin removing them from the index. Unfortunately, we've removed the initial internal links that lead to this premature indexation from our site.

So my question is, will Google revisit these pages based on their own records (as in, this page is indexed, let's go check it out again!), or will they only revisit them by following along a current site structure?

We are signed up with WMT if that helps.

CleverPhD

What we run into often is that on larger sites there 1) still are internal links to those pages from old blog posts etc. You have to really scrub your site to find those and manually update. I am only mentioning this as unless you used a tool to crawl the site and looked at it with a fine toothed comb, you might be surprised to find the links you missed 2) there are still external links to those pages. That said, even if 1 and 2 are not met, Google will still recrawl (although not as often). Google assumes that any initial 404 or even 301 may be a temporary error and so checks back. I have seen urls that we removed over a year ago, Google will still ping them. They really hang onto stuff. I have not gone as far as the 301 to a directory that I deindex, but generally just watch to see them show up and then fall out of Webmaster Tools and then I move on.

kirmeliux

Right, but having lots of 404's that are still indexed probably isn't good for your site in general. If you wanted them de-indexed, 301'ing them to a new folder and filing a single removal request for that entire directory would probably work.

Thanks for the help. I've heard from a few people that they will recrawl these pages again even if nothing is linking to them. That's reassuring. Thanks all.

Kingof5

No reason other than finding all those 404 pages and doing individual URL removals for each isn't a very productive task. 404s generally have no impact on search rankings.

kirmeliux

Interesting. Any reason why you haven't simply filed a removal request? I feel if there's too many to manually do, you could 301 them to a specific directory and then manually remove that directory all at once?

kirmeliux

Hi Martijn,

Thanks for the response. I must apologize as I left out an important detail. While are pages are "No results" and basically useless to the user, they're not actually 404'd pages. They're live, valid pages that basically offer nothing.

As I stated earlier, 404'ing them would be ideal for us if we could be sure Google would recrawl them. I am hesitant due to uncertainty of Googlebot re-crawling unlinked internal links. Our deeper pages like these have not been updated/recrawled yet, so I'm a bit unsure as to how likely they will.

I guess I should just go ahead and 404 all of them now and see what happens, since it can't hurt. Just curious about Googlebot in general since it always helps to know more!

Kingof5

Don't count on Google dropping those 404ing pages from the index any time soon. We have pages that have 404d for over a year and they're still in the index.

Martijn_Scheijbeler

They'll eventually drop these pages as they already know where to find them and as they give the proper 404 header they know that's a sign to drop them. In most cases pages that 404 are already not linked from any other pages so that will also be a sign to search engines that the specific pages aren't important anymore.

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Will Google Recrawl an Indexed URL Which is No Longer Internally Linked?

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

"Equity sculpting" with internal nofollow links

Google crawling but not indexing for no apparent reason

Value of internal links like this

Wrong canonical URL was specified. How to refresh the index now?

Correct linking to the /index of a site and subfolders: what's the best practice? link to: domain.com/ or domain.com/index.html ?

How to remove all sandbox test site link indexed by google?

Adding parameters in URLs and linking to a page

Existing Pages in Google Index and Changing URLs