New CMS system - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website recently switched to a new CMS.
Over the last 10 years or so, we've used three different CMS platforms on our current domain, and as you'd expect, that has left behind a lot of legacy URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation controls like rel="canonical".
Using SEOmoz's tools and Google Webmaster Tools, I've been able to locate and redirect all of the pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to the Google Webmaster Tools 'Not Found' report, there are literally over 100,000 additional URLs out there that it's still trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently we allow everything, and only use page-level robots tags to disallow indexing where necessary.
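For context, the page-level tags I'm referring to are just the standard meta robots directive placed in the head of each page we want kept out of the index - a generic example rather than our exact markup:

    <meta name="robots" content="noindex, follow">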
Thanks!
-
Great stuff... thanks again for your advice, much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (webmaster tools, for Google and Bing, is probably the best place to check) and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
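For anyone following along, the 301s I'm adding are plain server-side redirects; on an Apache-type setup they'd look roughly like the lines below (made-up example paths, not our real URLs):

    Redirect 301 /old-cms/about-us.html http://www.example.com/about/
    RedirectMatch 301 ^/old-news/(.*)$ http://www.example.com/news/$1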
Any idea what kind of an impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using Robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if Robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal for SEO), but if you create 100,000 of them all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent problems caused by Google crawling thousands of 'not founds' all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
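As a rough sketch (the folder name here is just a placeholder), the block itself is a one-line Disallow in robots.txt, and then you request removal of that same directory in Webmaster Tools:

    User-agent: *
    Disallow: /old-cms-folder/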
-
Absolutely. 'Not founds' and no-content pages are a concern, and dealing with them will help your ranking.
-
Thanks a lot! I should have been a little more specific... my exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or ranking?
Thanks!
-
It's a loaded question without knowing exactly what you are doing... but let me offer this advice: stop the bleeding with robots.txt. That's the easiest way to quickly resolve that many "not found" errors.
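To give a sense of what "stopping the bleeding" can look like (these patterns are placeholders - match them to your old CMS's URL structure), a few wildcard rules can cover huge batches of dead URLs at once. Google and Bing both honor the * and $ wildcards, even though they aren't part of the original robots.txt standard:

    User-agent: *
    Disallow: /old-cms/
    Disallow: /*.asp$
    Disallow: /*?id=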
Then you can slowly pick away at the issue and figure out whether some of the "not founds" actually have content and are just being sent to the wrong place.
On a recent project we had over 200,000 additional URLs reported as "not found". We stopped the bleeding, and then slowly, over the course of a month at a couple of hours a week, we found another 5,000 pages of real content that we redirected correctly and removed from robots.txt.
Good luck.