Old pages STILL indexed...
-
Our new website has been live for around 3 months and the URL structure has completely changed. We weren't able to dynamically create 301 redirects for over 5,000 of our products because of how different the URL's were so we've been redirecting them as and when.
3 months on and we're still getting hundreds of 404 errors daily in our Webmaster Tools account. I've checked the server logs and it looks like Bing Bot still seems to want to crawl our old /product/ URL's. Also, if I perform a "site:example.co.uk/product" on Google or Bing - lots of results are still returned, indicating the both still haven't dropped them from their index.
Should I ignore the 404 errors and continue to wait for them to drop off or should I just block /product/ in my robots.txt? After 3 months I'd have thought they'd have naturally dropped off by now!
I'm half-debating this:
User-agent: *
Disallow: /some-directory-for-all/*User-agent: Bingbot
User-agent: MSNBot
Disallow: /product/Sitemap: http://www.example.co.uk/sitemap.xml
-
Yea. If you cannot do it dynamically, it gets to be a real PIA, and also, depending on how you setup the 301s, you may get an overstuffed .htaccess file that could cause problems.
If these pages were so young and did not have any link equity or rank to start with, they are probably not worth 301ing.
One tool you may want to consider is URLprofiler http://urlprofiler.com/ You could take all the old URLs and have URL profiler pull in GA data (from when they were live on your site) and then also pull in OSE data from Moz. You can then filter them and see what pages got traffic and links. Take those select "top pages" and make sure they 301 to the correct page on the new URL structure and then go from there. URL profiler has a free 15 day trial that you could use for this project and get done at no charge. But after using the product, you will see it is pretty handy and may buy anyway.
Ideally, if you could have dynamically 301ed the old pages to the new, that would have been the simplest method, but with your situation, I think you are ok. Google is just trying to help to make sure you did not "mess up" and 404 those old pages on accident. It wants to give you the benefit of the doubt. It is crazy sometimes how they keep things in the index.
I am monitoring a site that scraped one of my sites. They shut the entire site down after we threatened legal action. The site has been down for weeks and showing 404s, but I can still do a site: search and see them in the index. Meh.
-
Forgot to add this - just some free advice. You have your CSS inlined in your HTML. Ideally, you want to have that in an external CSS file. That way, once the user loads that external file, they do not have to download it multiple times so the experience is faster on subsequent pages.
If you were testing your page with Google site speed and they mentioned render blocking CSS issues and that is why you inlined your CSS, the solution is not to inline all your CSS, but to just inline what is above the fold and put the rest in an external file.
Hope that makes sense.
-
I suppose that's the problem. We've spent hours redirecting hundreds of 404 pages to new/relevant locations - but these pages don't receive organic traffic. It's mostly just BingBot, MSNBot and GoogleBot crawling them because they're still indexed.
I think I'm going to leave them as 404 rather than trying to keep on top of 301 redirecting them and I'll leave it in Google's hands to eventually drop them off!
Thanks!
Liam
-
General rule of thumb, if a page 404s and it is supposed to 404 dont worry about it. The Search Console 404 report does not mean that you are being penalized although it can be diagnostic. If you block the 404 pages in robots.txt yea, it will take the 404 errors out of the Search Console report, but then Google never "deals" with those 404s. It can take 3 months (maybe longer) to get things out of Search Console, I have noticed it taking longer here lately, but what you need to do first is ask the following questions
-
Do I still link internally to any of these /product/ URLs? If you do, Google may assume that you are 404ing those pages by mistake and leave them in the report longer as if you are still linking internally to them they must be a viable page.
-
Do any of these old URLs have value? Do they have links to them from external sites? Did they used to rank for a KW? You should probably 301 them to a semantically relevant page then vs 404ing and getting some use out of them.
If you have either of the above, Google may continue to remind you of the 404 as it thinks the page might be valuable and want to "help" you out.
You mention 5,000 URLs that were indexed and then you 404 them. You cannot assume that Search Console works in real time or that Google checks all 5,000 of these URLs at the same time. Google has a given crawl budget for your site on how often it will crawl a given page. Some pages they crawl more often (home page) some pages they crawl less often. They then have to process those crawls once they get the data back. What you will see in a situation like this is that if you 404 several thousand pages, you will first see several hundred show up in your Search Console report, then the next day some more, then some more, etc. Over time, the total will build and then may peak and then gradually start to fall off. Google has to find the 404s, process them and then show them in the report. You may see 500 of your 404 pages today, but then 3 months later, there may be 500 other 404 pages that show up in the report and those original 500 are now gone. This is why you might be seeing 404 errors after 3 months in addition to the examples I gave above.
It would be great if the process were faster and the data was cleaner. The report has a checkbox for "this is fixed" and that is great if you fixed something, but they need a checkbox for "this is supposed to 404" to help clear things out. If I have learned anything about Search Console, it is helpful, but the data in many cases is not real time.
Good luck!
-
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Preserving link equity from old pages
Hi Moz Community, We have a lot of old pages built with Dreamweaver a long time ago (2003-2010) which sit outside our current content management system. As you'd expect they are causing a lot of trouble with SEO (Non-responsive, duplicate titles and various other issues). However, some of these older pages have very good backlinks. We were wondering what is the best way to get rid of the old pages without losing link equity? In an ideal world we would want to bring over all these old pages to our CMS, but this isn't possible due to the amount of pages (~20,000 pages) and cost involved. One option is obviously to bulk 301 redirect all these old pages to our homepage, but from what we understand that may not lead to the link equity being passed down optimally by Google (or none being passed at all). Another option we can think of would be to bring over the old articles with the highest value links onto the current CMS and 301 redirect the rest to the homepage. Any advice/thoughts will be greatly appreciated. Thumbs up! Thanks,
Intermediate & Advanced SEO | | 3gcouk0 -
Changing domain names but still ranking as old one
Hi there, I have a client who changed domain names back in November 2015 but is still coming up in search engines with their old domain name not their new one. For example, I search for my clients name, let's call them Example B. So I search for "Example B" and within the search results they come up top and the title tag is correct as it says something along the lines of "Welcome to Example B". However the URL underneath is actually their old name which is Example A. When you click on the link, it redirects over to the new name so thats fine, but it's just annoying that Example A is still appearing when it should be Example B now. I don't think they have a new Webmaster Tools account setup for their new domain (I need to check still), but they do still have their old one setup. Is there something I can do within Webmaster Tools to tell it that Example A is now gone and to start indexing and referring to them as Example B? What else should I do to make sure their new name is coming up not their old one anymore?
Intermediate & Advanced SEO | | Virginia-Girtz1 -
301: Delete old page, or keep?
Hey everybody! So, for those who have followed some of my posts I have myself in a bit of a quagmire that I am not going to get into. Some solutions have come to light and others are still pending and I will update my past questions with solutions! On the safer side of things I have a new situation. As I am going through our pages we have three different pages for "Admissions" Admissions Admission Guidelines Admission Information The "admissions' page has no link or feed to the other admissions page, and actually has no content on it at all. The "Admission Guidelines" page feeds to the "admission information" page, which although is extremely redundant is a different project for a different day. I am planning on putting a 301 on the "admissions" page and sending it to the "admission guidelines" page. When I do so, should I delete the old page? Does it matter? Is there a pro or con for either? Thanks guys!
Intermediate & Advanced SEO | | HashtagHustler0 -
The Wrong Page Is Still The Only One Ranking
For some reason, the search term "Tampa Personal Injury Attorney," shows this page of our's http://www.kempruge.com/personal-injury/ on the second page, but omits this page http://www.kempruge.com/tampa-personal-injury-attorney/ (the correct one). The correct one only shows up in omitted results. I have posted this question before. I made the changes suggested to me, and it actually worked for a couple weeks. But, it reverted back. I tried for the last two months to fix this on my own, but I just can't figure it out. Does anyone have any idea what to do here? Incredibly appreciative of any assistance, Ruben
Intermediate & Advanced SEO | | KempRugeLawGroup0 -
Interlinking from unique content page to limited content page
I have a page (page 1) with a lot of unique content which may rank for "Example for sale". On this page I Interlink to a page (page 2) with very limited unique content, but a page I believe is better for the user with anchor "See all Example for sale". In other words, the 1st page is more like a guide with items for sale mixed, whereas the 2nd page is purely a "for sale" page with almost no unique content, but very engaging for users. Questions: Is it risky that I interlink with "Example for sale" to a page with limited unique content, as I risk not being able to rank for either of these 2 pages Would it make sense to "no index, follow" page 2 as there is limited unique content, and is actually a page that exist across the web on other websites in different formats (it is real estate MLS listings), but I can still keep the "Example for sale" link leading to page 2 without risking losing ranking of page 1 for "Example for sale"keyword phrase I am basically trying to work out best solution to rank for "Keyword for sale" and dilemma is page 2 is best for users, but is not a very unique page and page 2 is very unique and OK for users but mixed up writing, pictures and more with properties for sale.
Intermediate & Advanced SEO | | khi50 -
How can Google index a page that it can't crawl completely?
I recently posted a question regarding a product page that appeared to have no content. [http://www.seomoz.org/q/why-is-ose-showing-now-data-for-this-url] What puzzles me is that this page got indexed anyway. Was it indexed based on Google knowing that there was once content on the page? Was it indexed based on the trust level of our root domain? What are your thoughts? I'm asking not only because I don't know the answer, but because I know the argument is going to be made that if Google indexed the page then it must have been crawlable...therefore we didn't really have a crawlability problem. Why Google index a page it can't crawl?
Intermediate & Advanced SEO | | danatanseo0 -
Our website scores A but on google we are still on 7th page
Hi all, I have run on page keyword optimizations with exact terminology used to find our company service or our competition on google. We have ranked A, with almost all points complete. I did the same for our main competitor and they ranked F. Then i did page positioning on Google and they get on page 1 fifth line and we get page 7. We have plenty of unique content and extensive website.
Intermediate & Advanced SEO | | EMGCSR
Could there be any other reason than reason for this other than backlinks? Many thanks for your help.0 -
Getting Google in index but display "parent" pages..
Greetings esteemed SEO experts - I'm hunting for advice: We operate an accommodation listings website. We monetize by listing position in search results, i.e. you pay more to get higher placing in the page. Because of this, while we want individual detailed listing pages to be indexed to get the value of the content, we don't really want them appearing in Google search results. We ideally want the "content value" to be attributed to the parent page - and google to display this as the link in the search results instead of the individual listing. Any ideas on how to achieve this?
Intermediate & Advanced SEO | | AABAB0