ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menu's - not product-pages.
Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspxIs it because the products are being loaded in Javascript?
What's your recommendation?All best,
Fred. -
Hi,
Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already, and thank you for Dan for chiming in, but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site
Patrick actually mentions the issue in one of his points above. Essentially it looks like the site uses JavaScript on category pages for products, example - http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that -
Hope that helps!
Cheers
Dan
-
I have sent Dan from Screaming Frog a tweet for you Fred. I'm sure he will be along presently
-Andy
-
Hi there
It's crawling for me. Here are a list of reasons why ScreamingFrog won't crawl your site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on top right hand site of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page level ‘nofollow’ attribute. The could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
Run through your settings and check and see if you may have turned something on inadvertently that you didn't mean to. One thing you can try, is goto Configuration > Spider and then goto the last option Ignore robots.txt. Click the checkbox and try running it again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why doesn't my website crawl by Google?
Hi mozzers and members, I am having issues, why my website: http://profilecosmeticsurgery.com/ crawl by Google? let me share more clearly when this starts happening. A month or around 45 days back our website is being indexed and crawled quite well without any issues with having .html extension pages with static built website.
Intermediate & Advanced SEO | | SEOOOOOoooooooo
We finally thought to change to .php version and make whole website and its pages to be treated dynamically.
Once we changed all changes, thereafter this issues started. It has been more than 45 days, our website isn't being crawled since then. I didn't know what are the things preventing this to? Please help. Thanks in Advance Capture1.PNG0 -
Community Discussion - What's the ROI of "pruning" content from your ecommerce site?
Happy Friday, everyone! 🙂 This week's Community Discussion comes from Monday's blog post by Everett Sizemore. Everett suggests that pruning underperforming product pages and other content from your ecommerce site can provide the greatest ROI a larger site can get in 2016. Do you agree or disagree? While the "pruning" tactic here is suggested for ecommerce and for larger sites, do you think you could implement a similar protocol on your own site with positive results? What would you change? What would you test?
Intermediate & Advanced SEO | | MattRoney2 -
Establishing if links are 'nofollow'
Wonder if any of you guys can tell me if there is any other way to tell google links are nofollow other than in the html (ie can you tell google to nofollow every link in a subdomain or something). I'm trying to establish if a couple of links on a very high ranking site are passing me pagerank or not without asking them directly and looking silly! Within the source code for the page they are NOT tagged as nofollow at present. Hope that all makes sense 😉
Intermediate & Advanced SEO | | mat20150 -
Google can't access/crawl my site!
Hi I'm dealing with this problem for a few days. In fact i didn't realize it was this serious until today when i saw most of my site "de-indexed" and losing most of the rankings. [URL Errors: 1st photo] 8/21/14 there were only 42 errors but in 8/22/14 this number went to 272 and it just keeps going up. The site i'm talking about is gazetaexpress.com (media news, custom cms) with lot's of pages. After i did some research i came to the conclusion that the problem is to the firewall, who might have blocked google bots from accessing the site. But the server administrator is saying that this isn't true and no google bots have been blocked. Also when i go to WMT, and try to Fetch as Google the site, this is what i get: [Fetch as Google: 2nd photo] From more than 60 tries, 2-3 times it showed Complete (and this only to homepage, never to articles). What can be the problem? Can i get Google to crawl properly my site and is there a chance that i will lose my previous rankings? Thanks a lot
Intermediate & Advanced SEO | | granitgash
Granit FvhvDVR.png dKx3m1O.png0 -
Why does old "Free" site ranks better than new "Optimized" site?
My client has a "free" site he set-up years ago - www.montclairbariatricsurgery.com (We'll call this the old site) that consistently outranks his current "optimized" (new) website - http://www.njbariatricsurgery.com/ The client doesn't want to get rid of his old site, which is now a competitor, because it ranks so much better. But he's invested so much in the new site with no results. A bit of background: We recently discovered the content on the new site was a direct copy of content on the old site. We had all copy on new site rewritten. This was back in April. The domain of the new site was changed on July 8th from www.Bariatrx.com to what you see now - www.njbariatricsurgery.com. Any insight you can provide would be greatly appreciated!!!
Intermediate & Advanced SEO | | WhatUpHud0 -
Site was moved, but still exists on the old server and is being outranked for it's own name
Recently, a client went through a split with a business partner, they both had websites on the same domain, but within their own sub directories. There is a main landing page, which links to both sites, the landing page sits on the root. Ie. example.com is a landing page with links to example.com/partner1, and example.com/partner2 Parter 2 will be my client for this example. After the split, partner 2 downloaded his website, and put it up on his own server, but no longer has any kind of access to the old servers ftp, and partner 1 is refusing to cooperate in any way to have the site removed from the old server. They did add a 301 redirect for the home page on the old server for partner 2, so, example.com/partner2/index.html is 301'ing to the new site on the new server, HOWEVER, every other page is still live on that old server, and is outranking the new site in every instance. The home page is also being outranked, even with the 301 redirect in place. What are some steps I can take to rectify this? The clients main concern is that this old website, containing the old partners name, is outranking him for his own name, and the name of his practice. So far, here's what i've been thinking: Since the site has poor on-page optimization, i'll start be cleaning all of that up. I'll then optimize the home page to better depict the clients name and practice through proper usage of heading tags, titles, alt, etc, as well as the meta title and description. The only other thing I can think of would be to start building some backlinks? Any help/suggestions would be greatly appreciated! Thanks.
Intermediate & Advanced SEO | | RCDesign740 -
When I type link:mydomainname.com in Google I don't see any result, why?
My other website is 4 years old and Page Rank 3. We are into business of design and development for 5 years and still we don't have any result from Google Searches. When I type link:mydomainname.com I don't get any result. What's the reason?
Intermediate & Advanced SEO | | vikaspooja1 -
It appears that Googlebot Mobile will look for mobile redirects from the desktop site, but still use the SEO from the desktop site.
Is the above statement correct? I've read that its better to have different SEO titles & descriptions for mobile sites as users search differently on mobile devices. I've also read it's good to link build, keep text content on mobile sites etc to get the mobile site to rank. If I choose to not have titles & descriptions on my mobile site will Google just rank our desktop version & then redirect a user on a mobile device to our mobile site or should I be adding in titles & descriptions into the mobile site? Thanks so much for any help!
Intermediate & Advanced SEO | | DCochrane0