Old URLs that have 301s to 404s not being de-indexed.
-
We have a scenario on a domain that recently moved to enforcing SSL. If a page is requested over non-ssl (http) requests, the server automatically redirects to the SSL (https) URL using a good old fashioned 301. This is great except for any page that no longer exists, in which case you get a 301 going to a 404.
Here's what I mean.
Case 1 - Good page:
http://domain.com/goodpage -> 301 -> https://domain.com/goodpage -> 200
Case 2 - Bad page that no longer exists:
http://domain.com/badpage -> 301 -> https://domain.com/badpage -> 404
Google is correctly re-indexing all the "good" pages and just displaying search results going directly to the https version.
Google is stubbornly hanging on to all the "bad" pages and serving up the original URL (http://domain.com/badpage) unless we submit a removal request. But there are hundreds of these pages and this is starting to suck. Note: the load balancer does the SSL enforcement, not the CMS. So we can't detect a 404 and serve it up first. The CMS does the 404'ing.
Any ideas on the best way to approach this problem? Or any idea why Google is holding on to all the old "bad" pages that no longer exist, given that we've clearly indicated with 301s that no one is home at the old address?
-
I don't think 404 vs 410 is the answer here.The basis for this thought is the following:
========
"if we see a page and we get a 404, we are gonna protect that page for 24 hours in the crawling system, so we sort of wait and we say maybe that was a transient 404, maybe it really wasn’t intended to be a page not found.”
“If we see a 410, then the site crawling system says, OK we assume the webmasters knows what they’re doing because they went off the beaten path to deliberately say this page is gone,” he said. “So they immediately convert that 410 to an error, rather than protecting it for 24 hours."
========
I'm thinking the deeper issue is why the 301s are not being respected. If a link points to http://domain.com/badpage and we use a 301 to point to https://domain.com/badpage - shouldn't the crawler (Google or otherwise) respect the 301? Why still index and serve up a page that responds with the 301? To me, this is baffling. If we serve up a 404 or a 410 - either way we are saying "this page is gone" but we're still seeing the original http://domain.com/badpage in the index?
Does that make sense? Or is there more clarification required?
-
sym_admin is right--you'll want to find the source of those pages, as Google apparently is seeing them from somewhere and still requesting them. If there are links to those pages somewhere, you will need to remove them. Also, if you're able, I would change those URLs so that they serve up a "410 Gone" error, and not a 404.
-
Read these three, then do what you got to do...
https://www.searchcommander.com/how-to-bulk-remove-urls-google/
https://productforums.google.com/forum/#!topic/webmasters/uYFJnsyiH8w
https://mza.bundledseo.com/community/q/404-redirects-to-the-homepage-is-this-good-bad-ugly
For proper removal, please ensure that there are no INTERNAL links anywhere on your website to 404 addresses, from sitemap, buttons, text, or images (the whole 9 yards).
Good luck!
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Index Bloat: Canonicalize, Redirect or Delete URLs?
I was doing some simple on-page recommendations for a client and realized that they have a bit of a website bloat problem. They are an ecommerce shoe store and for one product, there could be 10+ URLs. For example, this is what ONE product looks like: example.com/products/shoename-color1 example.com/products/shoename-color2 example.com/collections/style/products/shoename-color1 example.com/collections/style/products/shoename-color2 example.com/collections/adifferentstyle/products/shoename-color1 example.com/collections/adifferentstyle/products/shoename-color2 example.com/collections/shop-latest-styles/products/shoename-color1 example.com/collections/shop-latest-styles/products/shoename-color2 example.com/collections/all/products/shoename-color1 example.com/collections/all/products/shoename-color2 ...and so on... all for the same shoe. They have about 20-30 shoes altogether, and some come in 4-5 colors. This has caused some major bloat on their site and I assume some confusion for the search engine. That said, I'm trying to figure out what the best way to tackle this is from an SEO perspective. Here's where I've gotten to so far: Is it better to canonicalize all URLs, referencing back to one "main" one, delete all bloat pages re-link everything to the main one(s), or 301 redirect the bloat URLs back to the "main" one(s)? Or is there another option that I haven't considered? Thanks!
Intermediate & Advanced SEO | | AJTSEO0 -
URL indexed but not submitted in sitemap, however the URL is in the sitemap
Dear Community, I have the following problem and would be super helpful if you guys would be able to help. Cheers Symptoms : On the search console, Google says that some of our old URLs are indexed but not submitted in sitemap However, those URLs are in the sitemap Also the sitemap as been successfully submitted. No error message Potential explanation : We have an automatic cache clearing process within the company once a day. In the sitemap, we use this as last modification date. Let's imagine url www.example.com/hello was modified last time in 2017. But because the cache is cleared daily, in the sitemap we will have last modified : yesterday, even if the content of the page did not changed since 2017. We have a Z after sitemap time, can it be that the bot does not understands the time format ? We have in the sitemap only http URL. And our HTTPS URLs are not in the sitemap What do you think?
Intermediate & Advanced SEO | | ZozoMe0 -
URL Change Best Practice
I'm changing the url of some old pages to see if I can't get a little more organic out of them. After changing the url, and maybe title/desc tags as well, I plan to have Google fetch them. How does Google know that the old url is 301'd to the new url and the new url is not just a page of duplicate content? Thanks... Darcy
Intermediate & Advanced SEO | | 945010 -
HTTPS Certificate Expired. Website with https urls now still in index issue.
Hi Guys This week the Security certificate of our website expired and basically we now have to wail till next Tuesday for it to be re-instated. So now obviously our website is now index with the https urls, and we had to drop the https from our site, so that people will not be faced with a security risk screen, which most browsers give you, to ask if you are sure that you want to visit the site, because it's seeing it as an untrusted one. So now we are basically sitting with the site urls, only being www... My question what should we do, in order to prevent google from penalizing us, since obviously if googlebot comes to crawl these urls, there will be nothing. I did however re-submitted it to Google to crawl it, but I guess it's going to take time, before Google picks up that now only want the www urls in the index. Can somebody please give me some advice on this. Thanks Dave
Intermediate & Advanced SEO | | daveza0 -
Page URL keywords
Hello everybody, I've read that it's important to put your keywords at the front of your page title, meta tag etc, but my question is about the page url. Say my target keywords are exotic, soap, natural, and organic. Will placing the keywords further behind the URL address affect the SEO ranking? If that's the case what's the first n number of words Google considers? For example, www.splendidshop.com/gift-set-organic-soap vs www.splendidshop.com/organic-soap-gift-set Will the first be any less effective than the second one simply because the keywords are placed behind?
Intermediate & Advanced SEO | | ReferralCandy0 -
Remove content that is indexed?
Hi guys, I want to delete a entire folder with content indexed, how i can explain to google that content no longer exists?
Intermediate & Advanced SEO | | Valarlf0 -
How canonical url harm our website???
Even though my website has no similar/copied content, i used rel=canonical for all my website pages. Is Google or yahoo make any harm to my SERP's?? EX: http://www.seomoz.org is my site, in that i used canonical as rel="<a class="attribute-value">canonical</a>" href="http://www.seomoz.org" to my home page like that similar to all pages, i created rel=canonical. Is search engine harm my website???
Intermediate & Advanced SEO | | MadhukarSV0 -
Google Maps results doesn't show my site url but rather the maps url, why is this?
For several of my clients landing pages that show up in the Maps results the website url has been overwritten by the maps url (maps.google.com). Even though on my places page I have the correct website set up. Does anyone have any idea why they would be doing this and how I can correct it? Thanks kinldy in advance, Aaron. maps-url.png
Intermediate & Advanced SEO | | afranklin0