Old URLs that have 301s to 404s not being de-indexed.
-
We have a scenario on a domain that recently moved to enforcing SSL. If a page is requested over non-ssl (http) requests, the server automatically redirects to the SSL (https) URL using a good old fashioned 301. This is great except for any page that no longer exists, in which case you get a 301 going to a 404.
Here's what I mean.
Case 1 - Good page:
http://domain.com/goodpage -> 301 -> https://domain.com/goodpage -> 200
Case 2 - Bad page that no longer exists:
http://domain.com/badpage -> 301 -> https://domain.com/badpage -> 404
Google is correctly re-indexing all the "good" pages and just displaying search results going directly to the https version.
Google is stubbornly hanging on to all the "bad" pages and serving up the original URL (http://domain.com/badpage) unless we submit a removal request. But there are hundreds of these pages and this is starting to suck. Note: the load balancer does the SSL enforcement, not the CMS. So we can't detect a 404 and serve it up first. The CMS does the 404'ing.
Any ideas on the best way to approach this problem? Or any idea why Google is holding on to all the old "bad" pages that no longer exist, given that we've clearly indicated with 301s that no one is home at the old address?
-
I don't think 404 vs 410 is the answer here.The basis for this thought is the following:
========
"if we see a page and we get a 404, we are gonna protect that page for 24 hours in the crawling system, so we sort of wait and we say maybe that was a transient 404, maybe it really wasn’t intended to be a page not found.”
“If we see a 410, then the site crawling system says, OK we assume the webmasters knows what they’re doing because they went off the beaten path to deliberately say this page is gone,” he said. “So they immediately convert that 410 to an error, rather than protecting it for 24 hours."
========
I'm thinking the deeper issue is why the 301s are not being respected. If a link points to http://domain.com/badpage and we use a 301 to point to https://domain.com/badpage - shouldn't the crawler (Google or otherwise) respect the 301? Why still index and serve up a page that responds with the 301? To me, this is baffling. If we serve up a 404 or a 410 - either way we are saying "this page is gone" but we're still seeing the original http://domain.com/badpage in the index?
Does that make sense? Or is there more clarification required?
-
sym_admin is right--you'll want to find the source of those pages, as Google apparently is seeing them from somewhere and still requesting them. If there are links to those pages somewhere, you will need to remove them. Also, if you're able, I would change those URLs so that they serve up a "410 Gone" error, and not a 404.
-
Read these three, then do what you got to do...
https://www.searchcommander.com/how-to-bulk-remove-urls-google/
https://productforums.google.com/forum/#!topic/webmasters/uYFJnsyiH8w
https://mza.bundledseo.com/community/q/404-redirects-to-the-homepage-is-this-good-bad-ugly
For proper removal, please ensure that there are no INTERNAL links anywhere on your website to 404 addresses, from sitemap, buttons, text, or images (the whole 9 yards).
Good luck!
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Moving html site to wordpress and 301 redirect from index.htm to index.php or just www.example.com
I found page duplicate content when using Moz crawl tool, see below. http://www.example.com
Intermediate & Advanced SEO | | gozmoz
Page Authority 40
Linking Root Domains 31
External Link Count 138
Internal Link Count 18
Status Code 200
1 duplicate http://www.example.com/index.htm
Page Authority 19
Linking Root Domains 1
External Link Count 0
Internal Link Count 15
Status Code 200
1 duplicate I have recently transfered my old html site to wordpress.
To keep the urls the same I am using a plugin which appends .htm at the end of each page. My old site home page was index.htm. I have created index.htm in wordpress as well but now there is a conflict of duplicate content. I am using latest post as my home page which is index.php Question 1.
Should I also use redirect 301 im htaccess file to transfer index.htm page authority (19) to www.example.com If yes, do I use
Redirect 301 /index.htm http://www.example.com/index.php
or
Redirect 301 /index.htm http://www.example.com Question 2
Should I change my "Home" menu link to http://www.example.com instead of http://www.example.com/index.htm that would fix the duplicate content, as indx.htm does not exist anymore. Is there a better option? Thanks0 -
Our website is not being indexed
We have an issue with a site that we can't get to the bottom of. This site: (URL removed) is not being properly indexed. When we do a search for (URL removed) in google.com.au. The site appears as the 4th listing with the following title and description: (Title removed) A description for this result is not available because of this site's robots.txt – learn more. We have checked the site's robots.txt and can see its been now implemented correctly: (URL removed) About a week ago, we also went into Webmaster Tools and submitted a request for Google to recrawl our site. We are unsure what the issue is that is causing the site to not be properly indexed and how to resolve it. Any assistance on this topic would be most appreciated!
Intermediate & Advanced SEO | | Gavo0 -
Woo Commerce Woo Compare Urls Indexing?
Hi I am using Wordpress/Woo commerce for my site Thetotspot.co.uk http://www.thetotspot.co.uk/?action=yith-woocompare-add-product&id=1412&_wpnonce=a5560b1b07 But I am getting a lot of temporary redirects registering in Moz for things like the above - woo compare / add to cart links Anyone come across this - how did you get solve? I am using Yoast SEO currently, have no indexed archives and pages of archive etc.
Intermediate & Advanced SEO | | Kelly33300 -
Weird 404 URL Problem - domain name being placed at end of urls
Hey there. For some reason when doing crawl tests I'm finding pages with the domain name being tacked on the end and causing 404 errors.
Intermediate & Advanced SEO | | Jay328
For example: http://domainname.com/page-name/http://domainname.com This is happening to all pages, posts and even category type 1. Site is in Wordpress
2. Using Yoast SEO plugin Any suggestions? Thanks!0 -
How to find 20 hidden 404s
Hello, We have like twenty 404s left to find. How do you find these when: 1. They don't show up in Google Webmaster Tools 2. They don't have any other internal or external pages linking to them. 3. They don't show up in site:domain.com (We have 9000 pages and only 600 show up - I fixed those out of the 600). 4. They are probably causing high bounce rates. 5. They're not in the sitemap Thanks!
Intermediate & Advanced SEO | | BobGW0 -
Url structure of a blog
We are trying to work out what the best structure for our blog is as we want each page to rank as highly as possible, we were looking at a flat structure similar to http://www.hongkiat.com/blog/ where every posts is after the blog/ but not in category's although the viewers can look in different category's from the top buttons on the page- photoshop - icons etc or we where going to go for the structured way- blog/photoshop/blog-post.html the only problem is that we will end up 4 deep at least with this and at least 80 characters in the url. any help would be appreciated. Thanks Shaun
Intermediate & Advanced SEO | | BobAnderson0 -
How do you de-index and prevent indexation of a whole domain?
I have parts of an online portal displaying in SERPs which it definitely shouldn't be. It's due to thoughtless developers but I need to have the whole portal's domain de-indexed and prevented from future indexing. I'm not too tech savvy but how is this achieved? No index? Robots? thanks
Intermediate & Advanced SEO | | Martin_S0 -
Should I index tag pages?
Should I exclude the tag pages? Or should I go ahead and keep them indexed? Is there a general opinion on this topic?
Intermediate & Advanced SEO | | NikkiGaul0