ScreamingFrog won't crawl my site.

FrederikTrovatten22

Hey guys,

My site is Netspiren.dk, and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menus, not the product pages.

Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspx

Is it because the products are being loaded with JavaScript?
What's your recommendation?

All best,
Fred.

whiteonlySEO

Hi,

Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?

Our dev site is: https://msc-nop.com

Our regular site is: https://medicalscrubscollection.com

Thanks in advance!

TheeDigital

I'm not sure if this has been fixed already, and thank you to Dan for chiming in, but I was able to crawl around 700 URLs.

screamingfrog

Cheers @Andy & @Patrick

Hi Fred,

I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site.

Patrick actually mentions the issue in one of his points above. Essentially it looks like the site uses JavaScript on category pages for products, example - http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx

If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.

However, I'll leave you to verify that -

http://webcache.googleusercontent.com/search?q=cache:HBwmVULX5zYJ:www.netspiren.dk/pl/Helse-Hom%25C3%25B8opati-Allergica-Ron-serien_58721.aspx+&cd=1&hl=en&ct=clnk&gl=uk
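The "disable JS" check can also be run outside the browser. Below is a minimal Python sketch, assuming product URLs can be recognised by the /pi/ segment seen in this thread. The names LinkCollector, links_containing and fetch_raw_html are hypothetical helpers, not part of the SEO Spider. It lists the links that exist in the raw HTML, which is roughly what a crawler that doesn't execute JS gets to see.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in raw, unrendered HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_containing(html, marker="/pi/"):
    """Return hrefs found in the raw HTML that contain `marker`.

    The default marker "/pi/" is an assumption based on the product
    URLs quoted in this thread.
    """
    collector = LinkCollector()
    collector.feed(html)
    return [href for href in collector.hrefs if marker in href]


def fetch_raw_html(url):
    """Download a page without executing any JavaScript.

    Note: URLs with non-ASCII characters must be percent-encoded first.
    """
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

If links_containing(fetch_raw_html(category_url)) comes back empty for a category page that shows products in the browser, the product links are most likely being injected by JS.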

Hope that helps!

Cheers

Dan

Andy.Drinkwater

I have sent Dan from Screaming Frog a tweet for you, Fred. I'm sure he will be along presently.

-Andy

PatrickDelehanty

Hi there

It's crawling for me. Here is a list of reasons why ScreamingFrog won't crawl your site:

The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on the top right-hand side of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
The page has a page-level ‘nofollow’ attribute. This could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
The website is using framesets. The SEO Spider does not crawl the frame src attribute.
The Content-Type header did not indicate the page is HTML. This is shown in the Content column and should be either text/html or application/xhtml+xml.
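A few of the checks above (robots.txt, nofollow directives, Content-Type) can be approximated with a short Python sketch using only the standard library. The function names are hypothetical, the "*" user agent is an assumption, and the nofollow check is simplified (it ignores user-agent prefixes in X-Robots-Tag), so treat it as a rough diagnostic rather than a copy of how the SEO Spider decides.

```python
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt, url, user_agent="*"):
    """True if the given robots.txt text disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def has_nofollow(x_robots_tag="", meta_robots=""):
    """True if either directive string contains 'nofollow'.

    Simplified: expects plain comma-separated directives such as
    "noindex, nofollow".
    """
    combined = f"{x_robots_tag},{meta_robots}".lower()
    return "nofollow" in [part.strip() for part in combined.split(",")]


def is_html(content_type):
    """True if the Content-Type header names an HTML media type."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")
```

For example, blocked_by_robots("User-agent: *\nDisallow: /pi/", "http://www.example.com/pi/item.aspx") returns True, which would explain product pages going missing from a crawl.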

Run through your settings and check whether you have inadvertently turned something on. One thing you can try is to go to Configuration > Spider and then to the last option, Ignore robots.txt. Tick the checkbox and try running the crawl again.

It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.

Hope this helps! Good luck!
