20 x '400' errors on site, but URLs work fine in browser...
-
Hi, I have a new client set-up in SEOmoz and the crawl completed this morning... I am picking up 20 x '400' errors, but the pages listed in the crawl report load fine... any ideas?
example -
-
Most major robots obey crawl delays. You could check your errors in Google Webmaster Tools to see if your site is serving a lot of error pages when Google crawls.
I suspect Google is pretty smart about slowing down its crawl rate when it encounters too many errors, so it's probably safe to not include a crawl delay for Google.
-
Sorry, one last question.
Do I need to add a similar delay for Google Bots, or is this issue specifically a Roger Bot problem?
Thanks
-
Fantastic, thanks Cyrus and Tampa, you've saved me many more hours of head-scratching!
-
Hi Justin,
Sometimes when rogerbot crawls a site, the servers and/or the content management system can get overwhelmed if roger is going too fast, and this causes your site to deliver error pages as roger crawls.
If the problem persists, you might consider installing a crawl delay for roger in your robots.txt file. It would look something like this:
User-agent: rogerbot
Crawl-delay: 5
This would cause the SEOmoz crawlers to wait 5 seconds before fetching each page. Then, if the problem still persists, feel free to contact the help team at [email protected]
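As a quick sanity check once the directive is live, a minimal Python sketch along these lines reads the crawl delay back out of robots.txt (the domain below is only a placeholder, not your site):
# Read the Crawl-delay declared for rogerbot out of robots.txt.
# "https://www.example.com" is a placeholder; swap in the real domain.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# crawl_delay() returns the declared delay for the given user agent,
# or None if robots.txt does not declare one for it.
print(rp.crawl_delay("rogerbot"))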
Hope this helps! Best of luck with your SEO!
-
Thanks Tampa SEO, good advice.
Interestingly, the URL listed in SEOmoz is as follows:
www.morethansport.co.uk/brand/adidas?sortDirection=ascending&sortField=Price&category=sport and leisure
But when I look at the link in the referring page it is as follows:
/brand/adidas?sortDirection=ascending&sortField=Price&category=sport%20and%20leisure
notice the "%" symbol instead of the spaces.
The actual URL is the one listed in SEOmoz but even if I copy and paste the % version, the browser removed the '%' and the page loads fine.
I still can't get the site to throw-up a 400.
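My best guess at what's happening: the browser percent-encodes the space before the request ever leaves it, so it never sends the raw-space form, while a crawler that requests the URL exactly as listed, literal space and all, could be refused with a 400 by a strict server. A rough Python sketch of that encoding step, using the URL from above (the 200/400 outcomes are what you'd expect, not guaranteed):
from urllib import parse, request
from urllib.error import HTTPError

raw = ("http://www.morethansport.co.uk/brand/adidas"
       "?sortDirection=ascending&sortField=Price&category=sport and leisure")

# Percent-encode the unsafe characters (the spaces become %20),
# which is roughly what a browser does before sending the request.
encoded = parse.quote(raw, safe=":/?&=")
print(encoded)

# Fetch the encoded form and report the status code.
try:
    print(request.urlopen(encoded).getcode())  # expecting 200
except HTTPError as err:
    print(err.code)  # a stricter setup might answer 400 here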
-
Just ran the example link that you provided through two independent HTTP response code checkers, and both are giving me a 200 response, i.e. the site is OK.
This question has been asked before on here; you're definitely not the first person to run into the issue.
One way to diagnose what's going on is to dig a little deeper into the crawl report that SEOmoz generated. Download the CSV file and look at the referring link, i.e. the page on which Roger found the link. Then go to that page and check whether your CMS is doing anything odd with the way it outputs the links that you create. I recall someone back in December having the same issue; he eventually resolved it after noticing that his CMS put all sorts of stray slashes (i.e. /.../...) into the link.
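If it helps, here's a small sketch of that filtering step. The file name and the column headers ("URL", "HTTP Status Code", "Referrer") are guesses, so check them against the actual CSV export first:
import csv

# Print every 400 row together with the page the link was found on.
# "crawl_report.csv" and the header names are assumptions; adjust them to
# match the columns in the CSV that SEOmoz actually exports.
with open("crawl_report.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if row.get("HTTP Status Code") == "400":
            print(row.get("URL"), "<- found on ->", row.get("Referrer"))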
Good luck!
Related Questions
-
What's the point of my blog?
My website, www.toplinecomms.com has a reasonably good blog that gets quite good interaction and sharing. I introduced the blog at the start of 2013 because the general sentiment from all the SEO books and articles I had read was that a good blog could be invaluable to a search marketing campaign. The posts on the blog are keyword optimised and they get great shares and social engagement. However, I have noticed that the blog is stealing my services' pages' thunder! There are some keywords that I am keen for our services pages to rank for, but the blog is beating them to it! So my question is: How should I be using my blog to get my services pages to rank higher?
On-Page Optimization | HeatherBakerTopLine
-
SEO and multilanguage site
Hi all! I have used a WordPress plugin called WPML which translates a webpage into another language, so that I have a webpage in two different languages (Spanish (main market) and English). I'm just doing the SEO for the Spanish market and I'm going to start with the SEO for the English one. Should I do it just the same as if I had a single-language page, just with English keywords, etc.? I guess it would only differ in the way I do the link building strategy, as the markets are different. Thanks
On-Page Optimization | juanmiguelcr
-
The correct way to go from PHP site to HTML site?
I have a website fully coded in PHP and I am doing a re-design over to an HTML site. I searched through the Q&A and there were some conflicting answers. Some said you will need to 301 all the pages. Others said to use the .htaccess to parse all the files as html. What is the correct way I should go about this? Thanks in advance!
On-Page Optimization | reliabox
-
Mixing hyphens and underscores in a url
Hello. I am working on a site that was built with underscores in the urls, but only in the page names, not in the subdirectories. All the subdirectories have one-word names. So a typical url is "example.com/sub1/sub2/page_name." We would like to change the name of one of the subdirectories to a name that would be very useful for SEO, but this new name is a hyphenated word, let's call it "new-sub." If we changed "sub2" to "new-sub" then our url would have a mix of underscores and hyphens: example.com/sub1/new-sub/page_name. But if I used "new_sub" instead, google would read the words as connected with an underscore, instead of reading the subdirectory as a hyphenated word, which would be less useful for SEO. It seems like it might be a problem to have a hyphen in a subdirectory and underscores in the page names. But I want the SEO value of the hyphenated word. Any recommendations? Thank you!
On-Page Optimization | nyc-seo
-
Should I remove the Jetpack Plugin From A Site
I don't know if anyone has any experience with the Jetpack plugin, but personally I prefer Yoast. My point is, someone's site I am looking at has both the Yoast SEO plugin and Jetpack for WordPress. Should I just remove Jetpack, as it seems to be a very heavy-loading plugin?
On-Page Optimization | propertyhunter
-
Large Site - Advice on Subdomaining
I have a large news site - over 1 million pages (have already deleted 1.5 million). Google buries many of our pages, so I'm ready to try subdomaining http://bit.ly/dczF5y There are two types of content - news from our contributors, and press releases. We have had contracts with the big press release companies going back to 2004/5. They push releases to us by FTP or we pull from their server. These are then processed and published. It has taken me almost 18 months, but I have found and deleted or fixed all the duplicates I can find. There are now two duplicate checking systems in place. One runs at the time the release comes in and handles most of them. The other one runs every night after midnight and finds a few, which are then handled manually. This helps fine-tune the real-time checker. Businesses often link to their release on the site because they like us. Sometimes google likes this, sometimes not. The news we process is reviewed by 1, 2 or 3 editors before publishing. Some of the stories are 100% unique to us. Some are from contributors who also contribute to other news sites. Our search traffic is down by 80%. This has almost destroyed us, but I don't give up easily. As I said, I've done a lot of projects to try to fix this. Not one of them has done any good, so there is something google doesn't like and I haven't yet worked it out. A lot of people have looked and given me their ideas, and I've tried them - zero effect. Here is an interesting and possibly important piece of information: Most of our pages are "buried" by google. If I search, even for a headline, even if it is unique to us, quite often the page containing it will not appear in the SERP. The front page may show up, an index page may show up, another strong page may show up, if that headline is in the top 10 stories for the day, but the page itself may not show up at all - UNTIL I go to the end of the results and redo the search with the "duplicates" included. Then it will usually show up, on the front page, often in position #2 or #3. According to google, there are no manual actions against us. There are also no notices in WMT that say there is a problem that we haven't fixed. You may tell me just delete all of the PRs - but those are there for business readers, as they always have been. Google supposedly wants us to build websites for readers, which we have always done. What they really mean is - build it the way we want you to do it, because we know best. What really peeves me is that there are other sites that consistently rank above us, that have all the same content as us, and seem to be 100% aggregators, with ads, with nothing really redeeming them as being different, so this is (I think) inconsistent, confusing and it doesn't help me work out what to do next. Another thing we have is about 7,000+ US military stories, all the way back to 2005. We were one of the few news sites supporting the troops when it wasn't fashionable to do so. They were emailing the stories to us directly, most with photos. We published every one of them, and we still do. I'm not going to throw them under the bus, no matter what happens. There were some duplicates, some due to screwups because we had multiple editors who didn't see that a story was already published. Also at one time, a system code race condition - entirely my fault, I am the programmer as well as the editor-in-chief. I believe I have fixed them all with redirects.
I haven't sent in a reconsideration for 14 months, since they said "No manual spam actions found" - I don't see any point, unless you know something I don't. So, having exhausted all of the things I can think of, I'm down to my last two ideas. 1. Split all of the PRs off into subdomains (I'm ready to pull the trigger later this week). 2. Do what the other sites do, that I believe create little value, which is show only a headline and snippet and some related info and link back to the original page on the PR provider website (I really don't want to do this). 3. Give up on the PRs and delete them all and lose another 50% of the income, which means releasing our remaining staff and upsetting all of the companies and people who linked to us (or find them all and rewrite them as stories - tens of thousands of them) and also throw all our alliances under the bus (I really don't want to do this). There is no guarantee this is the problem, but google won't tell me, the google forums are crap, and nobody else has given me an idea that has helped. My thought is that splitting them off into subdomains will have a number of effects. 1. Take most of the syndicated content onto subdomains, so it's not on the main domain. 2. Shake up the Domain Authority. 3. Create a million 301 redirects. 4. Make it obvious to the crawlers what is our news and what is PRs. 5. Make it easier for Google News to understand. Here is what I plan to do: 1. Redirect all PRs to their own subdomain: pn.domain.com for PRNewswire releases, bw.domain.com for Businesswire releases, etc. 2. Fix all references so they use the new subdomain. Here are my questions - and I hope you may see something I haven't considered. 1. Do you have any experience of doing this? 2. What was the result? 3. Any tips? 4. Should I put PR index pages on the subdomains too? I was originally planning to keep them on the main domain, with the individual page links pointing to the actual release on the subdomain. Obviously, I want them only in one place, but there are two types of these index pages. a) all of the releases for a particular PR company - these certainly could be on the subdomain and not on the main domain. b) Various category index pages - agriculture, supermarkets, mining, etc. These would have to stay on the main domain because they are a mixture of different PR providers. 5. Is this a bad idea? I'm almost out of ideas. Should I add a condensed list of everything I've done already? If you are still reading, thanks for hanging in.
On-Page Optimization | loopyal
-
Long URL but makes no sense
Hi, just joined. The crawl states that I am getting a lot of errors; it looks like the spider is getting confused and looping back on itself? Is there a way to see where the crawl was formulated (i.e. where from)? It is generating URLs like: http://www.wickman.net.au/wineauction/wine_auction_alert.aspx/auction/auction/auction/auction/auction/auction/Default.aspx from http://www.wickman.net.au/wineauction/wine_auction_alert.aspx
On-Page Optimization | blinkybill
-
Importance of URL Structure
We are trying to restructure our on-page SEO and want to make sure we have our URLs correct. The problem is we did the URLs incorrectly in the first place and the ones we currently have are several years old. We have some URLs such as: http://www.firebrandtraining.co.uk/courses/management/prince2.asp and http://www.firebrandtraining.co.uk/courses/cisco/ccna_2007.asp which are not ideal, but user experience aside, does it make sense for us to change the URLs and use 301 redirects to the new ones, or is the damage done to our natural rankings simply not worth making the change? I have read different articles saying different things; some say that URL structure has little weight (if any weight at all) on rankings, while other people seem to say it is quite important. In addition, we have heard that changing the URLs with a 301 redirect will cause a large drop in rankings which will take months to recover from, and contrarily that 301s are now considered "ok" by Google and we shouldn't see too much change at all in our rankings. Any advice would be much appreciated.
On-Page Optimization | RobertChapman