ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menu's - not product-pages.
Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspxIs it because the products are being loaded in Javascript?
What's your recommendation?All best,
Fred. -
Hi,
Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already, and thank you for Dan for chiming in, but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site
Patrick actually mentions the issue in one of his points above. Essentially it looks like the site uses JavaScript on category pages for products, example - http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that -
Hope that helps!
Cheers
Dan
-
I have sent Dan from Screaming Frog a tweet for you Fred. I'm sure he will be along presently
-Andy
-
Hi there
It's crawling for me. Here are a list of reasons why ScreamingFrog won't crawl your site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on top right hand site of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page level ‘nofollow’ attribute. The could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
Run through your settings and check and see if you may have turned something on inadvertently that you didn't mean to. One thing you can try, is goto Configuration > Spider and then goto the last option Ignore robots.txt. Click the checkbox and try running it again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Mobile site crawl returns poorer results on 100% responsive site
Has anyone experienced an issue where Google Mobile site crawl returns poorer results than their Desktop site crawl on a 100% responsive website that passes all Google Mobile tests?
Intermediate & Advanced SEO | | MFCommunications0 -
301ing one site's links to another
Hi, I have one site with a well-established link profile, but no actual reason to exist (site A). I have another site that could use a better link profile (site B). In your experience, would 301 forwarding all of site A's pages to site B do anything positive for the link profile/organic search of the site B? Site A is about boating at a specific lake. Site B is about travel destinations across the U.S. Thanks! Best... Michael
Intermediate & Advanced SEO | | 945010 -
Baffled by this site's inability to rank
Hi guys, I've been working on a site for quite a while and it has a really good link profile, excellent content, no errors or penalties (as far as I can tell) but for some reason it consistently ranks below a lot of thin poor quality websites with spammy EMDs and a few obviously paid links from old-skool business directories etc. It has a significantly higher DA and linking root domains that almost all of them. Also it just bounces around from #40 to #28 to#35 to #40 to #28 on a weekly basis for many of our primary keywords. There just seems to be no logic to this and it goes against everything I know and everything we're taught. (I should probably point out that I've been doing this quite a while and have a number of other sites ranking extremely well in quite a few different verticals), Has anyone ever experienced anything like this and what did you do? Before I throw in the towel it would be good to hear from others and try and understand why this happens and if there is anything else I can try to help my client and fix it. Many thanks in advance.
Intermediate & Advanced SEO | | Blaze-Communication0 -
Building a product clients will integrate into their sites: What is the best way to utilize my clients' unique domain names?
I'm designing a hosted product my clients will integrate into their websites, their end users would access it via my clients' customer-facing websites. It is a product my clients pay for which provides a service to their end users, who would have to login to my product via a link provided by my clients. Most clients would choose to incorporate this link prominently on their home page and site nav.
Intermediate & Advanced SEO | | emzeegee
All clients will be in the same vertical market, so their sites will be keyword rich and related to my site.
Many may even be .org and ,edus The way I see it, there are three main ways I could set this up within the product.
I want to know which is most beneficial, or if I'm missing anything. 1: They set up a subdomain at their domain that serves content from my domain product.theirdomain.com would render content from mydomain.com's database.
product.theirdomain.com could have footer and/or other no-follow links to mydomain.com with target keywords The risk I see here is having hundreds of sites with the same target keyword linking back to my domain.
This may be the worst option, as I'm not sure about if the nofollow will help, because I know Google considers this kind of link to be a link scheme: https://support.google.com/webmasters/answer/66356?hl=en 2: They link to a subdomain on mydomain.com from their nav/site
Their nav would include an actual link to product.mydomain.com/theircompanyname
Each client would have a different "theircompanyname" link.
They would decide and/or create their link method (graphic, presence of alt tag, text, what text, etc).
I would have no control aside from requiring them to link to that url on my server. 3: They link to a subdirectory on mydomain.com from their nav/site
Their nav would include an actual link to mydomain.com/product/theircompanyname
Each client would have a different "theircompanyname" link.
They would decide and/or create their link method (graphic, presence of alt tag, text, what text, etc).
I would have no control aside from requiring them to link to that url on my server. In all scenarios, my marketing content would be set up around mydomain.com both as static content and a blog directory, all with SEO attractive url slugs. I'm leaning towards option 3, but would like input!0 -
Our parent company has included their sitemap links in our robots.txt file - will that have an impact on the way our site is crawled?
Our parent company has included their sitemap links in our robots.txt file. All of their sitemap links are on a different domain and I'm wondering if this will have any impact on our searchability or potential rankings.
Intermediate & Advanced SEO | | tsmith1310 -
Shouldn't Lower Bounce Rate Correlate into Greater Click Thru Rate for a Web Site?
Greetings: I run a real estate web site in New York City with about 650 pages out of which 330 are property listing pages. About 250 of those listing pages contain less than 150 words of content. In late August I set about 250 of the listing pages that generated the least traffic (generally corresponding to those with the least content) to "no-index, follow". Now Google has removed those pages from their index. The overall bounce rate for the site has been reduced from about 69% to about 64% since the removal of these low quality listing pages. However the click thru rate has not improved and is stuck at about 2.2 pages per visitor. Shouldn't the click thru rate improve if the bounce rate goes own? Am I missing something? Also, is a lower bounce rate something that Google will take into account when calculating rank? Thanks, Alan
Intermediate & Advanced SEO | | Kingalan10 -
What can you do when Google can't decide which of two pages is the better search result
On one of our primary keywords Google is swapping out (about every other week) returning our home page, which is more transactional, with a deeper more information based page. So if you look at the Analysis in Moz you get an almost double helix like graph of those pages repeatedly swapping places. So there seems to be a bit of cannibalizing happening that I don't know how to correct. I think part of the problem is the deeper page would ideally be "longer" tail searches that contain the one word keyword that is having this bouncing problem as a part of the longer phrase. What can be done to try prevent this from happening? Can internal links help? I tried adding a link on that term to the deeper page to our homepage, and in a knee jerk reaction was asked to pull that link before I think there was really any evidence to suggest that that one new link made a positive or negative effect. There are some crazy theories floating around at the moment, but I am curious what others think both about if adding a link from a informational to a transactional page could in fact have a negative effect, and what else could be done/tried to help clarify the difference between the two pages for the search engines.
Intermediate & Advanced SEO | | plumvoice0 -
Should I 'nofollow' links between my own sites?
We have five sites which are largely unrelated but for cross-promotional purpose our company wishes to cross link between all our sites, possibly in the footer. I have warned about potential consequences of cross-linking in this way and certainly don't want our sites to be viewed as some sort of 'link ring' if they all link to one another. Just wondering if linking between sites you own really is that much of an issue and whether we should 'nofollow' the links in order to prevent being slapped with any sort of penalty for cross-linking.
Intermediate & Advanced SEO | | simon_realbuzz0