Thousands of 404 Pages Indexed - Recommendations?
-
Background: I have a newly acquired client who has had a lot of issues over the past few months.
What happened is he had a major issue with broken dynamic URL's where they would start infinite loops due to redirects and relative links. His previous SEO didn't pay attention to the sitemaps created by a backend generator, and it caused hundreds of thousands of pages to be indexed. Useless pages.
These useless pages were all bringing up a 404 page that didn't have a 404 server response (it had a 200 response) which created a ton of duplicate content and bad links (relative linking).
Now here I am, cleaning up this mess. I've fixed the 404 page so it creates a 404 server response. Google webmaster tools is now returning thousands of "not found" errors, great start. I fixed all site errors that cause infinite redirects. Cleaned up the sitemap and submitted it.
When I search site:www.(domainname).com I am still getting an insane amount of pages that no longer exist.
My question: How does Google handle all of these 404's? My client wants all the bad pages removed now but I don't have as much control over that. It's a slow process getting Google to remove these pages that are returning a 404. He is continuously dropping in rankings still.
Is there a way of speeding up the process? It's not reasonable to enter tens of thousands of pages into the URL Removal Tool.
I want to clean house and have Google just index the pages in the sitemap.
-
yeah all of the 301's are done - but I am trying to get around submitting tens of thousands of URL's to the URL removal tool.
-
Make sure you pay special attention to implementing the correct rel canonical was first introduced we wanted to be a little careful. We didn’t want to open it up for potential abuse so you could only use rel canonical within one domain. The only exception to that was you could do between IP addresses and domains.
But over time we didn’t see people abusing it a lot and if you think about it, if some evil malicious hacker has hacked your website and he’s going to do something to you he’s probably going to put some malware on the page or do a 301 redirect. He’s probably not patient enough to add a rel canonical and then wait for it to be re-crawled and re-indexed and all that sort of stuff.
So we sort of saw that there didn’t seem to be a lot of abuse. Most webmasters use rel canonical in really smart ways. We didn’t see a lot of people accidentally shooting themselves in the foot, which is something we do have to worry about and so a little while after rel canonical was introduced we added the ability to do cross domain rel canonical.
It basically works essentially like a 301 redirect. If you can do a 301 redirect that is still preferred because every search engine knows how to handle those and new search engines will know how to process 301s and permanent redirects.
But we do take a rel canonical and if it’s on one domain and points to another domain we will typically honor that. We always reserve the right to sort of hold back if we think that the webmaster is doing something wrong or making a mistake but in general we will almost always abide by that.
Hope that helps.
I had I have a client who unfortunately had a dispute with her prior IT person and the person made a mess of the site. It is not the quickest thing and I do agree 301 redirects are by far the quickest way to go about it. If you're getting 404 errors and the site is passing link juice. You're going to want to redirect those scattered about the website to the most relevant page.
http://jamesmartell.com/matt-cutts/how-does-google-handle-not-found-pages-that-do-not-return-a-404/
http://www.seroundtable.com/404-links-google-15427.html
http://support.google.com/customsearch/bin/topic.py?hl=en&topic=11493&parent=1723950&ctx=topic
https://developers.google.com/custom-search/docs/indexing
https://developers.google.com/custom-search/docs/api
I hope I was of help to you,
Thomas
-
Have you redirected (301) to appropriate landing pages ? After redirection, use URL removal tool. Its work great for me, its shows the result in 24 hours to me. Its removes all the URLs from Google index that I have submitted into it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Rel canonical tag from shopify page to wordpress site page
We have pages on our shopify site example - https://shop.example.com/collections/cast-aluminum-plaques/products/cast-aluminum-address-plaque That we want to put a rel canonical tag on to direct to our wordpress site page - https://www.example.com/aluminum-plaques/ We have links form the wordpress page to the shop page, and over time ahve found that google has ranked the shop pages over the wp pages, which we do not want. So we want to put rel canonical tags on the shop pages to say the wp page is the authority. I hope that makes sense, and I would appreciate your feeback and best solution. Thanks! Is that possible?
Intermediate & Advanced SEO | | shabbirmoosa0 -
Magento 1.9 SEO. I have product pages with identical On Page SEO score in the 90's. Some pull up Google page 1 some won't pull up at all. I am searching for the exact title on that page.
I have a website built on Magento 1.9. There are approximately 290,000 part numbers on the site. I am sampling Google SERP results. About 20% of the keywords show up on page 1 position 5 thru 10. 80% don't show up at all. When I do a MOZ page score I get high 80's to 90's. A page score of 89 on one part # may show up on page one, An identical page score on a different part # can't be found on Google. I am searching for the exact part # in the page title. Any thoughts on what may be going on? This seems to me like a Magento SEO issue.
Intermediate & Advanced SEO | | CTOPDS0 -
Link Removal Request Sent to Google, Bad Pages Gone from Index But Still Appear in Webmaster Tools
| On June 14th the number of indexed pages for our website on Google Webmaster tools increased from 676 to 851 pages. Our ranking and traffic have taken a big hit since then. The increase in indexed pages is linked to a design upgrade of our website. The upgrade was made June 6th. No new URLS were added. A few forms were changed, the sidebar and header were redesigned. Also, Google Tag Manager was added to the site. My SEO provider, a reputable firm endorsed by MOZ, believes the extra 175 pages indexed by Google, pages that do not offer much content, may be causing the ranking decline. My developer submitted a page removal request to Google via Webmaster tools around June 20th. Now when a Google search is done for site:www.nyc-officespace-leader.com 851 results display. Would these extra pages cause a drop in ranking? My developer issued a link removal request for these pages around June 20th and the number in the Google search results appeared to drop to 451 for a few days, now it is back up to 851. In Google Webmaster Tools it is still listed as 851 pages. My ranking drop more and more everyday. At the end of displayed Google Search Results for site:www.nyc-officespace-leader.comvery strange URSL are displaying like:www.nyc-officespace-leader.com/wp-content/plugins/... If we can get rid of these issues should ranking return to what it was before?I suspect this is an issue with sitemaps and Robot text. Are there any firms or coders who specialize in this? My developer has really dropped the ball. Thanks everyone!! Alan |
Intermediate & Advanced SEO | | Kingalan10 -
An affiliate website uses datafeeds and around 65.000 products are deleted in the new feeds. What are the best practises to do with the product pages? 404 ALL pages, 301 Redirect to the upper catagory?
Note: All product pages are on INDEX FOLLOW. Right now this is happening with the deleted productpages: 1. When a product is removed from the new datafeed the pages stay online and are showing simliar products for 3 months. The productpages are removed from the categorie pages but not from the sitemap! 2. Pages receiving more than 3 hits after the first 3 months keep on existing and also in the sitemaps. These pages are not shown in the categories. 3. Pages from deleted datafeeds that receive 2 hits or less, are getting a 301 redirect to the upper categorie for again 3 months 4. Afther the last 3 months all 301 redirects are getting a customized 404 page with similar products. Any suggestions of Comments about this structure? 🙂 Issues to think about:
Intermediate & Advanced SEO | | Zanox
- The amount of 404 pages Google is warning about in GWT
- Right now all productpages are indexed
- Use as much value as possible in the right way from all pages
- Usability for the visitor Extra info about the near future: Beceause of the duplicate content issue with datafeeds we are going to put all product pages on NOINDEX, FOLLOW and focus only on category and subcategory pages.0 -
How do I best deal with pages returning 404 errors as they contain links from other sites?
I have over 750 URL's returning 404 errors. The majority of these pages have back links from sites, however the credibility of these pages from what I can see is somewhat dubious, mainly forums and sites with low DA & PA. It has been suggested placing 301 redirects from these pages, a nice easy solution, however I am concerned that we could do more harm than good to our sites credibility and link building strategy going into 2013. I don't want to redirect these pages if its going to cause a panda/penguin problem. Could I request manual removal or something of this nature? Thoughts appreciated.
Intermediate & Advanced SEO | | Towelsrus0 -
404 with a Javascript Redirect to the index page...
I have a client that is wanting me to issue a 404 on her links that are no longer valid to a custom 404, pause for 10 seconds, then rediirect to the root page (or whatever other redirect logic she wants)...to me it seems trying to game googlebot this way is a "bad idea" Can anyone confirm/deny or offer up a better suggestion?
Intermediate & Advanced SEO | | JusinDuff0 -
How long till pages drop out of the index
In your experience how long does it normally take for 301-redirected pages to drop out of Google's index?
Intermediate & Advanced SEO | | bjalc20110