Odd crawl test issues
-
Hi all, first post, be gentle...
Just signed up for Moz in the hope that it, and the learning material, will help me improve my web traffic. I've already hit a bit of trouble with one of the sites we have added to the tool: I cannot get the crawl test to do any actual crawling. I've tried adding the domain three times now, but the initial crawl of a few pages (the automatic one that runs when you add a domain to Pro) will not work for me.
Instead of getting a list of problems with the site, I have a list of 18 pages where it says 'Error Code 902: Network Errors Prevented Crawler from Contacting Server'. Being a little puzzled by this, I checked the site myself... no problems. I asked several people in different locations (and countries) to have a go, and no problems for them either. I ran the same site through the Raven Tools site auditor and got results; it crawled a few thousand pages. I ran the site through Screaming Frog with the Googlebot user agent, and again no issues. I also just tried Fetch as Googlebot in Webmaster Tools and all was fine there.
I'm very puzzled, then, as to why Moz is having issues with the site when everything else is happy with it. I know the homepage takes 7 seconds to load - caching is off at the moment while we tweak the design - but all the other pages (according to Screaming Frog) take an average of 0.72 seconds to load.
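For what it's worth, the kind of spot check I've been doing by hand could be scripted roughly like this. It is only a sketch: it assumes Python with the requests library, the domain is a placeholder rather than the real site, and the rogerbot string is my guess at how Moz's crawler identifies itself.

# Fetch the same URL with a few different User-Agent strings and compare
# status codes and response times, to spot anything filtering by user agent.
import requests

URL = "https://www.example.com/"  # placeholder, not the real site
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "rogerbot": "rogerbot",  # assumed token for Moz's crawler
}

for name, user_agent in USER_AGENTS.items():
    try:
        response = requests.get(URL, headers={"User-Agent": user_agent}, timeout=30)
        print(f"{name}: HTTP {response.status_code} in {response.elapsed.total_seconds():.2f}s")
    except requests.RequestException as exc:
        print(f"{name}: request failed ({exc})")

If the rogerbot-style request fails or times out while the others come back fine, that would at least point at something on the server filtering by user agent.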
The site is a Magento one, so we have a lengthy robots.txt, but that is not causing problems for any of the other services. The robots.txt is below.
# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# Crawlers Setup
User-agent: *

# Directories
Disallow: /ajax/
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
#Disallow: /js/
#Disallow: /lib/
Disallow: /magento/
#Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/
Disallow: /catalog/product
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
#Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
#Disallow: /*.js$
#Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=

# Pagination
Disallow: /*?dir=
Disallow: /*&dir=
Disallow: /*?mode=
Disallow: /*&mode=
Disallow: /*?order=
Disallow: /*&order=
Disallow: /*?p=
Disallow: /*&p=

If anyone has any suggestions then please, I would welcome them, be it about the tool or my robots.txt. As a side note, I'm aware that we are blocking the individual product pages. There are too many products on the site at the moment (250k plus) with manufacturer default descriptions, so we have blocked them and are working on getting the category pages and guides listed. In time we will rewrite the most popular products and unblock them as we go.
Many thanks
Carl
-
Thanks for the hints re the robots, will tidy that up.
-
Network errors can occur somewhere between us and your site, and not necessarily with your server itself. The best bet would be to check with your ISP for any connectivity issues to your server. Since this is the first time these issues have been reported, the next crawl may well be more successful.
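If you want a quick way to rule out the basics from your end, something like this will confirm DNS resolution and that port 80 is accepting connections. It is just a sketch using Python's standard socket module; the hostname is a placeholder for your site.

# Resolve the hostname and open a plain TCP connection to port 80, the same
# low-level path a "network error" would point at.
import socket

HOST = "www.example.com"  # placeholder for the affected site
PORT = 80

try:
    ip_address = socket.gethostbyname(HOST)  # DNS resolution
    with socket.create_connection((ip_address, PORT), timeout=10):
        print(f"Resolved {HOST} to {ip_address} and connected on port {PORT}")
except OSError as exc:
    print(f"Connectivity problem: {exc}")

If that works reliably from your network, the failure is more likely somewhere along the route to our crawler rather than on the server itself.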
One thing, though: you will want to keep your user-agent directives in a single block, without blank lines between them.
So this:

# Crawlers Setup
User-agent: *

# Directories
Disallow: /ajax/
Disallow: /404/
Disallow: /app/

would need to look like:
# Crawlers Setup
User-agent: *
# Directories
Disallow: /ajax/
Disallow: /404/
Disallow: /app/
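If you want to confirm the tidied file still allows the pages you care about, a quick sanity check with Python's built-in robots.txt parser is one option. This is a sketch only: the URL is a placeholder for your site, and rogerbot is assumed to be the user-agent token our crawler is matched against.

# Parse the live robots.txt and check whether the homepage is crawlable
# for a generic bot and for rogerbot.
from urllib import robotparser

parser = robotparser.RobotFileParser("https://www.example.com/robots.txt")  # placeholder URL
parser.read()

for agent in ("*", "rogerbot"):
    allowed = parser.can_fetch(agent, "https://www.example.com/")
    print(f"{agent}: homepage {'allowed' if allowed else 'blocked'}")

-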
Many thanks for the reply. The server we use is a dedicated server which we set up ourselves, including the OS and control panel. It just seems very odd that every other tool works fine with the site but Moz won't. I cannot see how it would need anything special compared with, say, Raven's site crawler.
I will check out those other threads, though, to see if I missed anything. Thanks for the links.
Just checked port 80 using http://www.yougetsignal.com/tools/open-ports/ (not sure if links are allowed) and no problems there.
-
Related Questions
-
Limit MOZ crawl rate on Shopify or when you don't have access to robots.txt
Hello. I'm wondering if there is a way to control the crawl rate of Moz on our site. It is hosted on Shopify, which does not allow any control over the robots.txt file to add a rule like "User-agent: rogerbot / Crawl-delay: 5". Because of this, we get a lot of 430 error codes (mainly on our products), and this certainly prevents Moz from getting the full picture of our shop. Can we rely on Moz's data when critical pages are not being crawled due to 430 errors? Is there any alternative to fix this? Thanks
Moz Bar | | AllAboutShapewear2 -
How can the crawler not access my robots.txt file but still report 0 crawler issues?
So I'm getting this error: "Our crawler was not able to access the robots.txt file on your site. This often occurs because of a server error from the robots.txt. Although this may have been caused by a temporary outage, we recommend making sure your robots.txt file is accessible and that your network and server are working correctly. Typically errors like this should be investigated and fixed by the site webmaster." https://www.evernote.com/l/ADOmJ5AG3A1OPZZ2wr_ETiU2dDrejywnZ8k However, Moz is saying I have 0 crawler issues. Have I hit an edge case? What can I do to rectify this situation? I'm looking at my robots.txt file here: http://www.dateideas.net/robots.txt but I don't see anything that would specifically get in the way. I'm trying to build a helpful resource on this domain and am getting zero organic traffic, and I have a sinking suspicion this might be the main culprit. I appreciate your help! Thanks! 🙂
Moz Bar | | will_l0 -
Cannot Crawl ... 612 : Page banned by error response for robots.txt.
I tried to crawl www.cartronix.com and I get this error: "612 : Page banned by error response for robots.txt." I have a robots.txt file and it does not appear to be blocking anything: www.cartronix.com/robots.txt. Also, Search Console is showing "allowed" in the robots.txt test... I've crawled many of our other sites that are similarly set up without issue. What could the problem be?
Moz Bar | | 1sixty80 -
Duplicate Page and Title Issues
On the last crawl, we received errors for duplicate page titles and some duplicate content pages. Here is the issue: We went through our page titles that were marked as duplicate and changed them to make sure their titles were different. However, we just received a new crawl this week and it is saying there are even more duplicate page title errors detected than before. We're wondering if this is a problem with just us or if it has been happening to other Moz users. As for the duplicate content pages, what is the best way to approach this and see what content is being looked at as a "duplicate" set?
Moz Bar | | Essential-Pest0 -
Correcting a 4xx on my crawl report
How can I correct a 4xx error on my crawl report? The page no longer exists. What can I do?
Moz Bar | | henne0 -
Crawl Test
Hello, is the Crawl Test having some issues at the moment? It seems very slow. I submitted a website to the crawl test 3-4 days ago and it is still in progress. This usually takes 24 hours max. Thanks.
Moz Bar | | lueka0 -
Prioritising campaign issues
Hi guys, I'm just going through the data from our campaign and I see we have the following: Too Many On-Page Links 10,000; Duplicate Page Title 8,700; Duplicate Page Content 8,000; Missing Meta Description Tag 1,800. In terms of remedying these, what do I need to prioritise? For instance, does Google penalise you more for duplicate URLs or more for too many on-page links? I look forward to hearing from you.
Moz Bar | | Hardley1110 -
Rel Can notice issue on my SEOMoz reporting
Need some help understanding this report... I have 17 rel canonical notices on my campaign, and it lists all the links. But what is this report actually telling me? Is it telling me that rel canonical tags are present on these pages? They are all blog posts... our blog was redirected when the site was recently rebuilt. I just need to understand what the report is really telling me to do or not do. Or is it OK to ignore this "notice"?
Moz Bar | | cschwartzel0