How to Safely Scrape Google Results?
-
I've built a couple of small tools that I use personally, maybe 2 or 3 times per day.
Both tools scrape the top 10 results from Google and provide more details about each domain (like the SEOMoz Keyword Difficulty Tool).
Google seems to have banned my IP address for automated searches. Can anyone tell me a safe way of scraping the Google results? Is there a suitable API for this?
How does SEOmoz do this on such a huge scale?
-
I doubt that the APIs have improved considerably since this blog post: http://www.seomoz.org/blog/the-nasty-problem-with-scraping-results-from-the-engines. Scraping Google is still a big issue, and it is necessary for our daily SEO work.
Scraping safely can only work if you succeed in convincing Google that you're a "natural" user and not a scraping robot. How can you do that?
- Search with alternating IPs from different locations, using proxies in the countries you'd like to scrape from
- Don't send too many requests at once from the same source
Consider that, when requesting a URL, the browser sends various pieces of information to the server, such as your operating system, browser version, referer, etc. Every one of these elements can and should be changed, so that each new search appears to come from a different identity.
- Change browsers, browser versions, operating system information, etc.
- Take care when changing browser localization values (en-GB and en-US probably don't return the same results)
- Have a good network of proxy servers ready, so that each request can be sent with a different identity
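The points above can be sketched in Python using only the standard library. This is a minimal illustration, not a tested recipe: the `Identity` class, the user-agent strings, the proxy addresses (taken from reserved documentation IP ranges) and the 20 to 90 second delay window are all assumptions made for the example. Replace them with values you actually control. The sketch only plans the requests; it does not send anything.

```python
import random
import urllib.parse
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """One hypothetical 'persona': browser headers plus the proxy they go through."""
    user_agent: str
    accept_language: str
    proxy: str  # placeholder address, not a real proxy


# Illustrative values only. Keep the locale consistent within one identity,
# since en-GB and en-US probably do not return the same results.
IDENTITIES = [
    Identity(
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0",
        "en-US,en;q=0.9",
        "http://203.0.113.10:8080",
    ),
    Identity(
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
        "(KHTML, like Gecko) Version/16.5 Safari/605.1.15",
        "en-GB,en;q=0.9",
        "http://198.51.100.20:8080",
    ),
]


def build_request(query, identity):
    """Build (but do not send) a search request carrying the identity's headers."""
    params = urllib.parse.urlencode({"q": query})
    return urllib.request.Request(
        "https://www.google.com/search?" + params,
        headers={
            "User-Agent": identity.user_agent,
            "Accept-Language": identity.accept_language,
        },
    )


def opener_for(identity):
    """Return an opener that routes traffic through the identity's proxy."""
    handler = urllib.request.ProxyHandler(
        {"http": identity.proxy, "https": identity.proxy}
    )
    return urllib.request.build_opener(handler)


def plan_searches(queries, rng=None, min_delay=20.0, max_delay=90.0):
    """Pair every query with a random identity and a random pause in seconds."""
    if rng is None:
        rng = random.Random()
    plan = []
    for query in queries:
        identity = rng.choice(IDENTITIES)
        delay = rng.uniform(min_delay, max_delay)  # assumed window, tune to taste
        plan.append((build_request(query, identity), identity, delay))
    return plan
```

To actually run a planned search, you would wait for the planned delay with `time.sleep(delay)` and then call `opener_for(identity).open(request, timeout=10)`. Spacing requests out randomly and changing the identity between them is what addresses the "too many requests from the same source" problem.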