How to Safely Scrape Google Results?
-
I've built a couple of small tools that I use personally, maybe 2 or 3 times per day.
Both tools scrape the top 10 results from Google and provide more details about each domain (like the SEOMoz Keyword Difficulty Tool).
Google seems to have banned my IP address for automated searches. Can anyone tell me a safe way of scraping the Google results? Is there a suitable API for this?
How does SEOMoz do this on such a huge scale?
-
I doubt that the APIs have improved considerably since this blog post: http://www.seomoz.org/blog/the-nasty-problem-with-scraping-results-from-the-engines. Scraping Google is still a big issue, and still necessary for our daily SEO work.
Scraping safely can only work if you succeed in convincing Google that you're a "natural" user and not a scraping robot. How can you do that?
- Search with alternating IPs from different locations, using proxies in the countries you'd like to scrape from
- Don't send too many requests at once from the same source
Keep in mind that when requesting a URL, the browser sends the server various pieces of information, such as your operating system, browser version, and referrer. Every one of these can and should be changed, so that each new search appears to come from a different identity.
- Change browsers, browser versions, operating system information, etc.
- Take care when changing browser localization values (en-GB and en-US probably don't return the same results)
- Have a good network of proxy servers ready, so you can send the different requests through them with your different identities
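The points above can be sketched roughly as follows. This is only a minimal illustration in Python using the standard library, not a tested scraper. The names `Identity`, `make_identities`, `build_request`, `build_opener_for`, `polite_delay` and `fetch_all` are made up for this example, the user-agent strings are samples, the 20 to 60 second delay is an arbitrary assumption, and the proxy addresses and search URL are placeholders you would have to supply yourself.

```python
import itertools
import random
import time
import urllib.parse
import urllib.request
from dataclasses import dataclass

# Sample user-agent strings (illustrative only; keep your own list current).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


@dataclass(frozen=True)
class Identity:
    """One 'virtual user': a proxy plus the headers sent through it."""
    proxy: str
    user_agent: str
    accept_language: str


def make_identities(proxies, user_agents, accept_language):
    """Pair each proxy with a user agent, cycling through the agents.

    The locale is deliberately the same for every identity, because
    en-GB and en-US probably don't return the same results.
    """
    agents = itertools.cycle(user_agents)
    return [Identity(proxy, next(agents), accept_language) for proxy in proxies]


def build_request(base_url, query, identity):
    """Build (but do not send) a search request carrying this identity's headers."""
    url = base_url + "?" + urllib.parse.urlencode({"q": query})
    headers = {
        "User-Agent": identity.user_agent,
        "Accept-Language": identity.accept_language,
    }
    return urllib.request.Request(url, headers=headers)


def build_opener_for(identity):
    """Create an opener that routes traffic through the identity's proxy."""
    handler = urllib.request.ProxyHandler(
        {"http": identity.proxy, "https": identity.proxy}
    )
    return urllib.request.build_opener(handler)


def polite_delay(min_seconds=20.0, max_seconds=60.0):
    """Random pause length; the default bounds are an assumption, not a known safe rate."""
    return random.uniform(min_seconds, max_seconds)


def fetch_all(base_url, queries, identities):
    """Send one request per query, rotating identities and pausing in between."""
    pool = itertools.cycle(identities)
    pages = {}
    for index, query in enumerate(queries):
        if index:
            time.sleep(polite_delay())  # never fire requests back to back
        identity = next(pool)
        request = build_request(base_url, query, identity)
        with build_opener_for(identity).open(request, timeout=30) as response:
            pages[query] = response.read()
    return pages
```

With a list of working proxies, something like `fetch_all(search_url, ["keyword one", "keyword two"], make_identities(my_proxies, USER_AGENTS, "en-GB"))` would send each query through a different proxy and browser signature, with a random pause between requests. Here `search_url` and `my_proxies` are values you would provide.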