Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've encountered a great blog - with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) same, but export as OPML so you can subscribe.
Thanks!
Scott
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. this blogger recommends...) one could do a stumble-on-steroids approach to a niche.
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can take those URLs and put them into the contact finder, and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing. Sorry if I have misunderstood what you were trying to do.
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the desired goal: grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
Thanks!
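For what it's worth, the "links inside the blogroll container only" part could be sketched roughly like this in Python, using only the standard library. This is an illustration, not an existing tool. It assumes the blogroll sits in an element whose id or class contains the word "blogroll" (which, as noted in the question, won't hold for sites that call it "sites I like" etc.) and that the markup is reasonably well formed. The names `BlogrollParser`, `extract_blogroll`, and the `marker` parameter are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BlogrollParser(HTMLParser):
    """Collect hrefs of <a> tags nested inside a container marked as a blogroll."""

    def __init__(self, base_url, marker="blogroll"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker.lower()  # assumption: id/class contains this word
        self.depth = 0                # 0 means "outside any blogroll container"
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.depth:
            # Already inside the container: track nesting and collect links.
            self.depth += 1
            if tag == "a" and attrs.get("href"):
                self.urls.append(urljoin(self.base_url, attrs["href"]))
        else:
            label = f"{attrs.get('id') or ''} {attrs.get('class') or ''}".lower()
            if self.marker in label:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def extract_blogroll(html, base_url, marker="blogroll"):
    """Return de-duplicated external http(s) links found in the blogroll container."""
    parser = BlogrollParser(base_url, marker)
    parser.feed(html)
    own_host = urlparse(base_url).netloc
    seen, result = set(), []
    for url in parser.urls:
        parts = urlparse(url)
        if parts.scheme in ("http", "https") and parts.netloc != own_host and url not in seen:
            seen.add(url)
            result.append(url)
    return result


sample = """
<div id="content"><a href="http://elsewhere.example/">not in the blogroll</a></div>
<ul class="xoxo blogroll">
  <li><a href="http://blog-one.example/">Blog One</a></li>
  <li><a href="http://blog-two.example/"><img src="b.png">Blog Two</a></li>
  <li><a href="/about">internal link</a></li>
</ul>
"""
print(extract_blogroll(sample, "http://myblog.example/"))
```

Fetching the page itself is left out on purpose; any HTTP client would do. A different `marker` value could be passed for blogs that label the section differently.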
-
Nothing I saw there would do this. It looks like it could manage to list all external links, and I suppose you could manually pick the blogroll out of it.
-
Hi there,
Well, Keri's response reminded me of this question and of the fact that I found a tool for scraping these kinds of lists.
Here it is (with some other cool tools), have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is Outwit Hub for Firefox. It might be able to help with that. It can scrape data from a page and do a lot with it. http://www.outwit.com/products/hub/. Don't know that it meets all of your needs, but I also haven't seen a response with anything better at the moment.
-
Hey Scott,
What a great question and (sigh) I don't have the answer. I am going to check back to find out what people come up with here. Surely there is someone who lurks in these parts who can throw something together?
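On the OPML half of the original question, here is a rough sketch of what "something thrown together" might look like in Python with the standard library. It is only an illustration under a few assumptions: the function name `urls_to_opml` is made up, and the feed URL for each site has to be supplied by the caller, because a blogroll gives you site URLs, not feed URLs, and feed discovery is not covered here. Sites without a known feed are written as plain `link` outlines, which a feed reader may ignore on import.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def urls_to_opml(sites, title="Blogroll"):
    """Build an OPML 2.0 document from (site_url, feed_url_or_None) pairs.

    Returns the document as a string without an XML declaration.
    """
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")

    for site_url, feed_url in sites:
        # Use the host name as the display text; fall back to the raw URL.
        name = urlparse(site_url).netloc or site_url
        if feed_url:
            # Subscription entry: feed readers look for type="rss" and xmlUrl.
            attrs = {"type": "rss", "text": name,
                     "xmlUrl": feed_url, "htmlUrl": site_url}
        else:
            # No feed known: keep the site as a plain link outline.
            attrs = {"type": "link", "text": name, "url": site_url}
        ET.SubElement(body, "outline", attrs)

    return ET.tostring(opml, encoding="unicode")


# Hypothetical input: one site with a known feed, one without.
sites = [
    ("http://blog-one.example/", "http://blog-one.example/feed"),
    ("http://blog-two.example/", None),
]
print(urls_to_opml(sites))
```

The resulting string can be saved to a `.opml` file and imported into a feed reader; only the entries that carry an `xmlUrl` are likely to become subscriptions.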