ScreamingFrog won't crawl my site.
-
Hey guys,
My site is Netspiren.dk, and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menus, not the product pages.
Examples
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspx
Is it because the products are being loaded in JavaScript?
What's your recommendation?
All best,
Fred.
-
Hi,
Thank you for this question and the responses because we encountered the same issue; Screaming Frog was only crawling a handful of products out of hundreds, because of JS. We made significant changes to the redirect rules on our dev site, and we want to make sure that the changes will not cause any crawling errors before we deploy to the live site. Is there any way to disable JS just for the purpose of a Screaming Frog crawl?
Our dev site is: https://msc-nop.com
Our regular site is: https://medicalscrubscollection.com
Thanks in advance!
-
I'm not sure if this has been fixed already, and thank you to Dan for chiming in, but I was able to crawl around 700 URLs.
-
Cheers @Andy & @Patrick
Hi Fred,
I haven't performed an extensive check, but the SEO Spider crawls around 35 URLs with /pi/ in the string, which is presumably not all the products on the site.
Patrick actually mentions the issue in one of his points above. Essentially, it looks like the site uses JavaScript to load the products on category pages, for example: http://www.netspiren.dk/pl/Helse-Homøopati-Allergica-Ron-serien_58721.aspx
If you disable JS in your browser, you'll see a blank page where the products were. Our tool doesn't execute JS, although Google is much smarter and often can.
However, I'll leave you to verify that.
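One way to check this yourself is to look at the raw HTML the way a crawler that doesn't run JavaScript would. Below is a minimal sketch, not how the SEO Spider works internally: it collects `<a href>` links from static markup and filters for product URLs. The `/pi/` marker comes from the URLs in this thread; `SAMPLE_HTML`, `loadProducts`, and the function names are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in raw HTML (no JavaScript is run)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def static_links(html, base_url):
    """Return the absolute links that are present in the static markup."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def product_links(html, base_url, marker="/pi/"):
    """Keep only links that look like product pages (assumed marker: /pi/)."""
    return [link for link in static_links(html, base_url) if marker in link]


# Hypothetical category page: the menu link is in the HTML, but the
# product list is an empty container that a script fills in later.
SAMPLE_HTML = """
<html><body>
  <a href="/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx">Menu</a>
  <div id="products"></div>
  <script>loadProducts("#products");</script>
</body></html>
"""

if __name__ == "__main__":
    base = "http://www.netspiren.dk/"
    print(static_links(SAMPLE_HTML, base))
    print(product_links(SAMPLE_HTML, base))
```

With the sample markup, the menu link is found but the product list comes back empty, which is the symptom described above. To try it on a real page, you could fetch the HTML with `urllib.request.urlopen` and pass the decoded text to `product_links`.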
Hope that helps!
Cheers
Dan
-
I have sent Dan from Screaming Frog a tweet for you Fred. I'm sure he will be along presently
-Andy
-
Hi there
It's crawling for me. Here is a list of reasons why ScreamingFrog won't crawl your site:
- The site is blocked by robots.txt. A count of pages blocked by robots.txt is shown in the crawl overview pane on the top right-hand side of the user interface. You can configure the SEO Spider to ignore robots.txt by going to the “Basic” tab under Configuration->Spider.
- The site behaves differently depending on User Agent. Try changing the User Agent under Configuration->User Agent.
- The site requires JavaScript. Try looking at the site in your browser with JavaScript disabled.
- The site requires Cookies. Can you view the site with cookies disabled in your browser? Licenced users can enable cookies by going to Configuration->Spider and ticking “Allow Cookies” in the “Advanced” tab.
- The ‘nofollow’ attribute is present on links not being crawled. There is an option in Configuration->Spider under the “Basic” tab to follow ‘nofollow’ links.
- The page has a page-level ‘nofollow’ attribute. This could be set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the “Directives” tab in the “Nofollow” filter.
- The website is using framesets. The SEO Spider does not crawl the frame src attribute.
- The Content-Type header did not indicate the page is html. This is shown in the Content column and should be either text/html or application/xhtml+xml.
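Several of the checks in the list above can also be scripted if you want a quick second opinion outside the tool. The following is a rough sketch under stated assumptions, not a replica of the SEO Spider's logic: `diagnose`, `MetaRobotsParser`, and the sample inputs in the usage note are hypothetical, and the X-Robots-Tag handling is simplified (it ignores user-agent-specific prefixes). It covers robots.txt, Content-Type, page-level nofollow, and framesets; it does not cover the JavaScript, cookie, or User Agent points.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Content types named in the list above as crawlable.
CRAWLABLE_TYPES = ("text/html", "application/xhtml+xml")


class MetaRobotsParser(HTMLParser):
    """Collects meta robots directives and notes whether a frameset is used."""

    def __init__(self):
        super().__init__()
        self.directives = []
        self.has_frameset = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]
        elif tag == "frameset":
            self.has_frameset = True


def diagnose(url, user_agent, robots_txt, headers, html):
    """Return a list of likely reasons a non-JS crawler would skip this page."""
    problems = []

    # 1. robots.txt rules for this user agent.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        problems.append("blocked by robots.txt")

    # 2. Content-Type must indicate HTML (header names are case-insensitive).
    headers = {name.lower(): value for name, value in headers.items()}
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type not in CRAWLABLE_TYPES:
        problems.append("Content-Type is not HTML")

    # 3. Page-level nofollow via the HTTP header (simplified parsing).
    header_directives = headers.get("x-robots-tag", "").lower().split(",")
    if "nofollow" in [d.strip() for d in header_directives]:
        problems.append("nofollow in X-Robots-Tag header")

    # 4. Page-level nofollow via meta robots, plus frameset detection.
    page = MetaRobotsParser()
    page.feed(html)
    if "nofollow" in page.directives:
        problems.append("nofollow in meta robots tag")
    if page.has_frameset:
        problems.append("page uses framesets")

    return problems
```

As a usage example, calling `diagnose("http://www.example.com/pi/product.aspx", "ExampleSpider", "User-agent: *\nDisallow: /pi/\n", {"Content-Type": "text/html"}, "<html></html>")` reports only the robots.txt block, since the other checks pass for that input.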
Run through your settings and check whether you may have turned something on inadvertently. One thing you can try is to go to Configuration > Spider and then go to the last option, Ignore robots.txt. Tick the checkbox and try running it again.
It could just be a slow connection on your end. Give it a few minutes and see if any of the above suggestions work.
Hope this helps! Good luck!