404 page not found after site migration
-
Hi,
A question from our developer.
We have an issue in Google Webmaster Tools.
A few months ago we killed off one of our e-commerce sites and set up another to replace it. The new site uses different software on a different domain. I set up a mass 301 redirect that redirected all URLs to the new domain, so domain-one.com/product would redirect to domain-two.com/product. As it turns out, the new site doesn't use the same URLs for products as the old one did, so I deleted the mass 301 redirect.
We're getting a lot of URLs showing up as 404 not found in Webmaster Tools. These URLs used to exist on the old site and were linked from the old sitemap. Even URLs that have shown up as 404s recently claim to be linked from the old sitemap, though the old sitemap no longer exists and has itself been returning a 404 error for some time now. Normally I would set up a 301 redirect for each one and mark them as fixed, but almost a quarter of a million URLs are returning 404 errors, and the number is rising.
I’m sure there are some genuine problems that need sorting out in that list, but I just can’t see them under the mass of errors for pages that have been redirected from the old site. Because of this, I’m reluctant to set up a robots file that disallows all of the 404 URLs.
The old site is no longer in the index. Searching Google for site:domain-one.com returns no results.
Ideally, I'd like anything that was linked from the old sitemap to be removed from Webmaster Tools, and for Google to stop attempting to crawl those pages.
Thanks in advance.
-
I agree that the 301 redirect would be your best option, as you can pass along not only users but also bots to the right page. You may need to get a developer in to write some regular expressions that parse the incoming request and automatically find the correct new URL (see the sketch below). I have worked on sites with a large number of pages, and using some sort of automation is the only way to go.
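For illustration, here's a minimal sketch of that kind of automated mapping in Python. The URL pattern, slug table, and target paths are all hypothetical; in production this logic would more likely live in your web server's rewrite rules rather than application code:

```python
import re

# Hypothetical table mapping old product slugs to new ones. In practice
# this would be generated from the two sites' product databases.
SLUG_MAP = {
    "blue-widget-large": "widgets/blue-large",
}

# Hypothetical pattern for the old site's product URLs.
OLD_PRODUCT_URL = re.compile(r"^/products/(?P<slug>[^/]+)$")

def redirect_target(old_path):
    """Return the new-domain URL to 301 to, or None (serve a 404/410)."""
    match = OLD_PRODUCT_URL.match(old_path)
    if match:
        new_slug = SLUG_MAP.get(match.group("slug"))
        if new_slug:
            return "http://domain-two.com/" + new_slug
    return None

print(redirect_target("/products/blue-widget-large"))
# -> http://domain-two.com/widgets/blue-large
```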
That said, if you simply want to kill the old URLs, you can serve 404s or 410s. As you mention, though, you then end up with a bunch of 404 errors in GWT. I have been there too; it's damned if you do, damned if you don't. We had some tracking URLs from an old site, and a year later (after serving 410s on those old tracking URLs for over a year) they still show up in GWT as errors.
We are trying a new solution for removing these URLs from the index without generating 404 errors: we serve a 200 and put up a minimal HTML page with the meta robots noindex tag.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it. "
So we allow Google to find the page, get a 200 (so no 404 errors), and then use the meta noindex tag to tell Google to remove it from the index and stop crawling the page.
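Here's a minimal sketch of the idea, using Python's standard-library HTTP server purely for illustration (the retired paths are hypothetical; a real site would return this response from its own application stack):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical set of retired URLs we want dropped from the index.
RETIRED_PATHS = {"/old-product-1", "/old-product-2"}

NOINDEX_PAGE = (b"<html><head>"
                b'<meta name="robots" content="noindex">'
                b"</head><body>This page has been retired.</body></html>")

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in RETIRED_PATHS:
            # Serve a 200 (so no 404 error in GWT) with a noindex meta
            # tag telling Google to drop the page from its index.
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(NOINDEX_PAGE)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8000), Handler).serve_forever()
```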
Remember, this is the "nuclear" option; only do this to remove pages from the Google index. Someone mentioned using GWT to remove URLs, but if I remember correctly, you can only remove so many pages at a time that way.
If you list the URLs in robots.txt, Google will not spider them, but if you later remove them from the robots.txt file, it will start trying to spider them again. I have seen Google come back a year later on URLs once I took them out of robots.txt. That is what happened to us, so we tried just serving 410s/404s, but Google still kept crawling. We recently moved to the 200-plus-noindex-meta option and it seems to be working.
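You can see that behavior from the crawler's side with Python's standard-library robots.txt parser (the rules and paths below are hypothetical):

```python
from urllib import robotparser

# How a crawler interprets a robots.txt disallow list.
rules = """User-agent: *
Disallow: /old-product-1
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("*", "http://domain-one.com/old-product-1"))  # False
print(parser.can_fetch("*", "http://domain-one.com/current-page"))   # True
# Remove the Disallow line and the same URL becomes fetchable again,
# which is why Google resumes crawling once URLs leave robots.txt.
```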
Good luck!
-
You can, but the 404s should stop being crawled on their own. There's also a Webmaster Tools feature you can use to make that happen faster:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=64033
-
Yeah, it's a 404: http://www.tester.co.uk/17th-edition-equipment/multifunction-testers/fluke-1651b-multifunction-installation-tester
With over 200,000 404s, it's a lot to go through and 301. For some reason, when the site got migrated, they just pointed each old URL at the new domain by swapping the root domain name, without creating matching URLs. Doh.
I was thinking about blocking them all in robots.txt?
-
A 404 should cause Google to de-index the content. Go to one of the bad URLs and view the headers to make sure that your web server is returning an actual 404 status code and not just a 404 "page" served with a 200.
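A quick check along these lines (Python standard library; the URL is hypothetical) will report the real status code, so a "soft 404" (an error page served with a 200) stands out:

```python
import urllib.request
from urllib.error import HTTPError

def check_status(url):
    """Return the real HTTP status code for a URL. An error page served
    with a 200 ("soft 404") will show up here as 200, not 404."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # Note: urlopen follows redirects, so a 301 chain
        # reports the status of the final destination.
        with urllib.request.urlopen(request) as response:
            return response.status
    except HTTPError as error:
        return error.code

# Hypothetical dead URL from the old site.
print(check_status("http://domain-one.com/products/blue-widget-large"))
```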
As hard and time-consuming as it might be, I would still pursue the 301 option. It's the cleanest way to resolve the issue. Just start nibbling at it and you can make a dent; doing nothing just lets the problem grow.