Screaming Frog Advice
-
Hi,
I am trying to crawl my site and it keeps crashing.
My sysadmin keeps upgrading the virtual box it runs on, and it now has 8GB of memory, but it still crashes. It gets to around 200k pages crawled and then dies.
Any tips on how I can crawl my whole site? And can you use Screaming Frog to crawl just part of a site?
Thanks in advance for any tips.
Andy
-
Thanks, I tried all the tips on the Screaming Frog site. I have now limited the crawl to 2 pages a second, so let's hope that works.
-
Hi Andy. There are quite a few settings you can adjust to reduce the load on your server while the crawl is running. They are described here: http://www.screamingfrog.co.uk/seo-spider/user-guide/configuration/
For example, by not checking images, CSS, SWF, and JavaScript you can lessen the load substantially, or if you'd like to crawl just a portion of the site, you can set it not to check links outside of the start folder.
For even more control over the crawl, you can use regular expressions to exclude certain pages or sections that match a given pattern. The page above is fairly thorough, so it should help you dial back the crawler to be friendlier to your server. Cheers!
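Screaming Frog's exclude patterns are regular expressions matched against the full URL. The patterns and URLs below are purely illustrative (the site and sections are made up), but a short Python sketch shows how such excludes behave:

```python
import re

# Hypothetical exclude patterns, in the full-URL regex style the
# Exclude feature uses. Site and section names are made up.
EXCLUDE_PATTERNS = [
    r"https://www\.example\.com/blog/.*",   # skip an entire section
    r".*\?.*",                              # skip URLs with query strings
    r".*\.(jpg|png|gif|css|js)$",           # skip static assets
]

def is_excluded(url: str) -> bool:
    """True if the URL matches any exclude pattern in full."""
    return any(re.fullmatch(p, url) for p in EXCLUDE_PATTERNS)

for url in [
    "https://www.example.com/products/widget",
    "https://www.example.com/blog/2013/05/post",
    "https://www.example.com/search?q=widgets",
]:
    print(url, "->", "excluded" if is_excluded(url) else "crawled")
```

Trimming whole sections like this is often enough to bring a large crawl down under the memory ceiling.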
-
Hey there mate,
Sorry to hear that you are having issues. You can actually ask Screaming Frog to use more RAM. If you haven't done that yet, please give it a go.
You can find more here: http://www.screamingfrog.co.uk/seo-spider/user-guide/general/
If you want to crawl only part of your site, it can certainly do that: you can exclude individual pages or whole sections.
Find more here: http://www.screamingfrog.co.uk/seo-spider/user-guide/configuration/
Hope this helps!
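On the RAM point: at the time of writing, the user guide describes raising Screaming Frog's memory allocation by editing the Java heap setting in the spider's launch configuration file (named ScreamingFrogSEOSpider.l4j.ini in the Windows install; the exact file and location vary by OS and version, so treat this as a sketch and follow the guide linked above for your platform):

```ini
-Xmx6g
```

`-Xmx6g` raises the Java heap to 6GB, which leaves some headroom for the operating system on an 8GB machine.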
Related Questions
-
Suggested Screaming Frog configuration to mirror default Googlebot crawl?
Hi all, does anyone have a suggested Screaming Frog (SF) configuration to mirror the default Googlebot crawl? I want to test my site and see if it will return 429 "Too Many Requests" to Google. I have set the user agent to Googlebot (Smartphone). Is the default SF Menu > Configuration > Speed > Max Threads 5 and Max URLs 2.0 comparable to Googlebot? Context: I had tried NetPeak SEO Spider, which did a nice job and had a cool feature that would pause a crawl if it got too many 429s. Long story short, a B2B site threw 429 errors when there should have been no load at all, on a holiday weekend at 1:00 AM.
Intermediate & Advanced SEO | gravymatt-se
-
[Advice] Dealing with an immense URL structure full of canonicals, with budget & time constraints
Good day to you Mozers, I have a website that sells a certain product online which, once bought, is delivered to a point of sale (PoS) where the client's car gets serviced. This website has a shop, products, and informational pages that are duplicated by the number of physical PoS. The organizational decision was that every PoS should have its own little site that could be managed and modified. For example: every PoS can have a different price on its product, and some have more services available than others, but the content on these service pages doesn't change. I get over a million URLs that are, supposedly, all treated with canonical tags pointing to their respective main page. I say "supposedly" because verifying the logic they used behind the canonicals is proving to be a headache, but I know and have seen a lot of these pages using the tag. E.g.: https://mysite.com/shop/ <-- https://mysite.com/pointofsale-b/shop https://mysite.com/shop/productA <-- https://mysite.com/pointofsale-b/shop/productA The problem is that I have over a million URLs being crawled, when really less than a tenth of them have any organic traffic potential. Question is: for products, I know I should tell them to put the URL as close to the root as possible and dynamically change the price according to the PoS the end user chooses, or even redirect all shops to the main one and only use that one. I need a short-term solution to test and show whether it is worth investing in development to correct all these useless duplicate pages. Should I use robots.txt to block off the parts of the site I do not want Google to waste its time on? I am worried about indexation, accessibility, and crawl budget being wasted. Thank you in advance,
Intermediate & Advanced SEO | Charles-O
-
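As a short-term test of the crawl-budget question above, robots.txt can keep crawlers out of the per-PoS duplicates. The paths below are hypothetical (inferred from the URL pattern in the question), and one caveat applies: Google cannot see the canonical tag on a URL it is blocked from crawling, and blocking does not remove pages that are already indexed.

```
# robots.txt -- illustrative sketch; the PoS path pattern is assumed.
User-agent: *
Disallow: /pointofsale-a/
Disallow: /pointofsale-b/
```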
Penguin 2.0 update, ranking dropped. Advice needed!
Hello. After the Penguin 2.0 update, the website I've been working on dropped in the rankings; some keywords that ranked #1 are now on the second and third page, as you can see in this screenshot: http://screencast.com/t/MramoXgTr 95% of my competitors were not affected by this update at all; most of them don't even optimize their websites for SEO, rather they use paid directories. The first thing I did was analyze my backlink profile using OSE, and to my surprise I found a lot of low-quality domains pointing to my pages with a keyword in the anchor text. A lot of them were blog comments and low-quality article directories. Since I don't have control over these links and I can't remove them, I used the Disavow tool to do the job. For the past 3 months, I've been doing a lot of high-quality link building, such as press releases once every 2 months, Squidoo lenses and HubPages (3 posts a week for each keyword), and a YouTube video; in fact, my YouTube video still ranks #3 for a highly competitive search. I was also involved in social media, posting tweets and Facebook posts every week. I really hope that someone can help me here with good advice on getting my rankings back. Here's my website; let me know what you think about it. Thank you.
Intermediate & Advanced SEO | KentR
-
Silo Architecture - need an expert's advice
I understand the concept of silo architecture. What I don't understand is how to build the site navigation. I see experts talking about silos, but their sites have pervasive top-level navigation. In theory, your top-level nav breaks your silos. If I have 20 pages of supporting content all linked to my silo page, and the top nav is on the supporting content pages, then those pages all link to the pages in the top nav: silo broken, and link juice diluted. It would seem to me that the only way to build a true silo is to strip out all of the navigation on a supporting page and only have it link to: 1. The silo landing page 2. Other supporting pages in the silo. Is this what Bruce Clay does? I've seen Rand's lectures on silos as well. Is this what he is doing? I recently saw a video by the Network Empire team, and they also have a pervasive nav. Can someone please explain this?
Intermediate & Advanced SEO | CsmBill
-
Please Review and Advise!
My site is built on WordPress. Please can an expert give me some guidelines on how I can improve my website's SEO?
Intermediate & Advanced SEO | shakiel
-
Advice on forum links
Hi guys, I'm looking for some good advice on forum links and their potential negative impact. I am analysing the links of a URL, and around 60% of the links are coming from a forum (on a different domain). The forum is very relevant (about the same product he is selling) and also has a decent user base. This 60% of links accounts for roughly 6,500 links, all with varying keyword anchor texts, and some with excessive usage of a particular keyword anchor text. They are also all dofollow, in a mixture of signature links and in-post links. The site they link to has been hit by Penguin and also has an EMD. My question is: even though these links are relevant and on a good site with good traffic, do you think they have likely been picked up by the Penguin algorithm? My initial thought was yes, simply because they are all dofollow and mostly keyword-based. But I'd love to hear thoughts on this, as well as possible recovery options, i.e. should he remove the forum links, reduce them drastically, or make them all nofollow so traffic can still pass through? Thanks!
Intermediate & Advanced SEO | ROIcomau
-
Need some urgent Panda advice. Open discussion about recovering from the Panda algorithm.
I have a site that has been affected by Panda, and I think I have finally found the problem. When I created this site in 2006, I bought content without checking it. Recently, when I went through the site, I found out that this content had many duplicates around the web. Not 100% exact, but close. The first thing I did was ask my best writer to rewrite these topics, as they are a must on my site. She is a very experienced writer, and she will make the categories and subpages outstanding. The second thing I did was put a NOINDEX, FOLLOW robots meta tag in place on the pages I determined to be bad. They haven't been de-indexed yet. Another thing I recently did was separate out the other languages and move them to other domains (with 301s redirecting the old locations to the new ones). This means the site now has an /en/ directory in the URL which is no longer needed. With this in mind, I was thinking of relocating the NEW content and 301ing the old (to preserve the juice for a while). For example: http://www.mysite.com/en/this-is-a-pandalized-page/ 301 to http://www.mysite.com/this-is-the-rewritten-page/ The benefits of doing this are: decreasing the number of directories in the URL, getting rid of pages that are possibly causing trouble, and getting fresh pages added to the site. Now, the advice I am looking for is basically this: do you agree with the above, or don't you? If you don't, please be so kind as to include a reason with your answer. If you do, and have any additional information, or would like to discuss, please go ahead 🙂 Thanks, Giorgio PS: Is it proven that Panda is now a continuously running update? Or is it still executed periodically?
Intermediate & Advanced SEO | VisualSense
-
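The two mechanisms combined in the question above can be sketched with its own example URLs. The redirect line assumes an Apache server with mod_alias enabled (an assumption, since the question doesn't name the server):

```apache
# In .htaccess (Apache, mod_alias assumed); URLs from the question's example.
Redirect 301 /en/this-is-a-pandalized-page/ /this-is-the-rewritten-page/
```

while `<meta name="robots" content="noindex, follow">` goes in the head of any page that should stay crawlable, and keep passing link equity, without being indexed.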
Changing server today, any SEO implications or advice you may have?
Morning, we're moving our site to a new server today and, never having done so before in an SEO capacity, I'm wondering: are there any SEO implications, pitfalls, or things to watch out for? Advice and comments appreciated. Thanks.
Intermediate & Advanced SEO | Martin_S