ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menu's - not product-pages.
Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspxIs it because the products are being loaded in Javascript?
What's your recommendation?All best,
Fred. -
Hi,
Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already, and thank you for Dan for chiming in, but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site
Patrick actually mentions the issue in one of his points above. Essentially it looks like the site uses JavaScript on category pages for products, example - http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that -
Hope that helps!
Cheers
Dan
-
I have sent Dan from Screaming Frog a tweet for you Fred. I'm sure he will be along presently
-Andy
-
Hi there
It's crawling for me. Here are a list of reasons why ScreamingFrog won't crawl your site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on top right hand site of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page level ‘nofollow’ attribute. The could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
Run through your settings and check and see if you may have turned something on inadvertently that you didn't mean to. One thing you can try, is goto Configuration > Spider and then goto the last option Ignore robots.txt. Click the checkbox and try running it again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why isn't there a browser tab title AND meta title?
Personal opinion; as a user, it makes sense for me to want a full 50+ character meta title which displays in a search engine that helps me determine if I want to click that link AND a concise browser tab title that tells me which page and brand I have open. As a search engine, I would (possibly wrongly) suppose that having one more piece user-facing of information would be helpful in understanding a page and that page's relation to the rest of the website. Theoretical example Meta title: A great title for the website I've been dreaming of! | OurBrand Browser tab title: Home | OurBrand
Intermediate & Advanced SEO | | sb10300 -
Something happened within the last 2 weeks on our WordPress-hosted site that created "duplicates" by counting www.company.com/example and company.com/example (without the 'www.') as separate pages. Any idea what could have happened, and how to fix it?
Our website is running through WordPress. We've been running Moz for over a month now. Only recently, within the past 2 weeks, have we been alerted to over 100 duplicate pages. It appears something happened that created a duplicate of every single page on our site; "www.company.com/example" and "company.com/example." Again, according to our MOZ, this is a recent issue. I'm almost certain that prior to a couple of weeks ago, there existed both forms of the URL that directed to the same page without be counting as a duplicate. Thanks for you help!
Intermediate & Advanced SEO | | wzimmer0 -
Website with only a portion being 'mobile friendly' -- what to tell Google?
I have a website for desktop that does a lot of things, and have converted part of it do show pages in a mobile friendly format based on the users device. Not responsive design, but actual diff code with different formatting by mobile vs desktop--but each still share the same page url name. Google allows this approach. The mobile-friendly part of the site is not as extensive as desktop, so there are pages that apply to the desktop but not for mobile. So the functionality is limited some for mobile devices, and therefore some pages should only be indexed for desktop users. How should that page be handled for Google crawlers? If it is given a 404 not found for their mobile bot will Google properly still crawl it for the desktop, or will Google see that the url was flagged as 'not found' and not crawl it for the desktop? I asked a similar question yest, but it was not stated clearly. Thanks,Ted
Intermediate & Advanced SEO | | friendoffood0 -
Is there a downside of an image coming from the site's dotted quad and can it be seen as a duplicate?
Ok the question doesn't fully explain the issue. I just want some opinions on this. Here is the backstory. I have a client with a domain that has been around for a while and was doing well but with no backlinks. (Fairly low competition). For some reason they created mirrors of their site on different urls. Then their web designer built them a test site that was a copy of their site on the web designer's url and didn't bother to noindex it. Client's site dived, the web designer's site started ranking for their keywords. So we helped clean that up, and they hired a brand new web designer and redesigned the site. For some reason the dotted quad version of the site started showing up as a referer in GA. So one image on the site comes from that and not the site's url. So I ran a copyscape and site search and discovered the dotted quad version like 69.64.153.116 (not the actual address) was also being indexed by the search engine. To us this seems like a cut and dry duplicate content issue, but I'm having trouble finding much written on the subject. I raised the issue with the dev, and he reluctantly 301 the site to the official url. The second part of this is the web designer still has that one image on the site coming from the numerical version of the site and not the written url. Any thoughts if that has any negative SEO impact? My thought it isn't ideal, but it just looks like an external referral for pulling that one image. I'd love any thoughts or experience on a situation like this.
Intermediate & Advanced SEO | | BCutrer0 -
Don't affiliate programs have an unfair impact on a company's ability to compete with bigger businesses?
So many coupon sites and other websites these days will only link to your website if you have a relationship with Commission Junction or one of the other large affiliate networks. It seems to me that links on these sites are really unfair as they allow businesses with deep pockets to acquire links unequitably. To me it seems like these are "paid links", as the average website cannot afford the cost of running an affiliate program. Even worse, the only reason why these businesses are earning a link is because they have an affiliate program; that to me should violate some sort of Google rule about types and values of links. The existence of an affiliate program as the only reason for earning a link is preposterous. It's just as bad as paid link directories that have no editorial standards. I realize the affiliate links are wrapped in CJ's code, so that mush diminish the value of the link, but there is still tons of good value in having the brand linked to from these high authority sites.
Intermediate & Advanced SEO | | williamelward0 -
How does Google determine 'top refeferences'?
Does anyone have any insight into how Google determines 'top references' from medical websites?
Intermediate & Advanced SEO | | nicole.healthline
For example, if you search 'skin disorders,' you'll see 'Sources include <cite>nih.gov</cite>, <cite>medicinenet.com</cite> and <cite>dmoz.org</cite>'--how is that determined?0 -
My warning report says I have too many on page links - 517! I can't find 50% of them but my q is about no follow
if we put 'no follow' on some of these links does that mean the search engines won't index the no follow pages even if those pages are linked to from elsewhere? no link juice will flow from the page with the (no follow) links on? Just trying to understand why my rankings have dropped so dramatically in the last 6 weeks or so since we redesigned the site, and it might be that now we have too many links on the homepage. This is the page http://www.suffolktouristguide.com/ All suggestions appreciated!
Intermediate & Advanced SEO | | SarahinSuffolk0 -
My Google title isn't showing what is entered
Help! On Yahoo and Bing, if you search "Chant Real Estate" the full title that I've entered appears in the search listings: Chant PA Real Estate | Real Estate PA | Pennsylvania: Find PA Homes for Sale But in Google it only shows "Chant PA Real Estate". This title was what the original developer used for their site and it's been almost a year now that it's been under our control. Any suggestions? The URL is www.chantre.com.
Intermediate & Advanced SEO | | gXe0