ScreamingFrog won't crawl my site.

FrederikTrovatten22

Hey guys,

My site is Netspiren.dk, and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menus, not the product pages.

Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspx

Is it because the products are being loaded with JavaScript?
What's your recommendation?

All best,
Fred.

whiteonlySEO

Hi,

Thank you for this question and the responses; we encountered the same issue: Screaming Frog was crawling only a handful of products out of hundreds because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?

Our dev site is: https://msc-nop.com

Our regular site is: https://medicalscrubscollection.com

Thanks in advance!

TheeDigital

I'm not sure if this has been fixed already, and thank you to Dan for chiming in, but I was able to crawl around 700 URLs.

screamingfrog

Cheers @Andy & @Patrick

Hi Fred,

I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site.

Patrick actually mentions the issue in one of his points above. Essentially, it looks like the site uses JavaScript to load the products on category pages, for example: http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx

If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.

However, I'll leave you to verify that -

http://webcache.googleusercontent.com/search?q=cache:HBwmVULX5zYJ:www.netspiren.dk/pl/Helse-Hom%25C3%25B8opati-Allergica-Ron-serien_58721.aspx+&cd=1&hl=en&ct=clnk&gl=uk
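
One way to see roughly what a crawler that does not execute JavaScript sees is to fetch the raw HTML and count the product links in it. The sketch below is only an illustration, not how the SEO Spider works internally. The helper names (`LinkCollector`, `product_links`, `fetch_html`) and the User-Agent string are made up for this example, and the `/pi/` marker is taken from the product URLs in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in static HTML (no JS is run)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def product_links(html, base_url, marker="/pi/"):
    """Return the links in the raw HTML whose URL contains the marker."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [link for link in collector.links if marker in link]


def fetch_html(url, user_agent="Mozilla/5.0 (compatible; crawl-check)"):
    """Download a page without executing any scripts.

    The user_agent value is a hypothetical placeholder.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `product_links(fetch_html(url), url)` on a category URL and getting an empty list, while the browser shows products on the same page, would suggest that the products are being injected by JavaScript.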

Hope that helps!

Cheers

Dan

Andy.Drinkwater

I have sent Dan from Screaming Frog a tweet for you, Fred. I'm sure he will be along presently.

-Andy

PatrickDelehanty

Hi there

It's crawling for me. Here is a list of reasons why Screaming Frog might not crawl your site:

The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on the top right-hand side of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
The page has a page-level ‘nofollow’ attribute. This could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
The website is using framesets. The SEO Spider does not crawl the frame src attribute.
The Content-Type header did not indicate that the page is HTML. This is shown in the Content column and should be either text/html or application/xhtml+xml.
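
A few of the checks above (robots.txt, page-level nofollow, Content-Type) can also be scripted outside the tool. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, the default user-agent token is an assumption about how the SEO Spider identifies itself, and the checks only approximate what the tool does.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Content types named in the list above as crawlable HTML.
CRAWLABLE_TYPES = ("text/html", "application/xhtml+xml")


def blocked_by_robots(robots_txt, url, user_agent="Screaming Frog SEO Spider"):
    """True if the given robots.txt text disallows the URL for this agent.

    The default user_agent is an assumption; adjust it to match your crawler.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


class MetaRobots(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def page_level_nofollow(html, headers):
    """True if a meta robots tag or an X-Robots-Tag header says nofollow."""
    meta = MetaRobots()
    meta.feed(html)
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    return "nofollow" in meta.directives or "nofollow" in header_directives


def is_html_content_type(content_type):
    """True if the Content-Type header value indicates an HTML page."""
    return content_type.split(";")[0].strip().lower() in CRAWLABLE_TYPES
```

Each function takes plain text or a dict of headers rather than fetching anything itself, so it can be run against a saved response as well as a live one.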

Run through your settings and check whether you have inadvertently turned something on. One thing you can try is to go to Configuration > Spider, find the last option, “Ignore robots.txt”, tick the checkbox, and run the crawl again.

It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.

Hope this helps! Good luck!
