Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've encountered a great blog - with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) same, but export as OPML so you can subscribe.
Thanks!
Scott
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. this blogger recommends...) one could do a stumble-on-steroids approach to a niche.
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can take those URLs and put them into the contact finder, and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing. Sorry if I have misunderstood what you were trying to do.
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the desired goal: grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
Thanks!
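For anyone who wants to throw something together, here is a rough sketch of that goal using only the Python standard library. It is not an existing tool, just an illustration under some assumptions: the blogroll sits in a container whose `id` or `class` contains the word "blogroll" (many sites call it something else), the markup is reasonably well formed (unclosed `<li>` or `<p>` tags would throw off the nesting count), and the names `BlogrollParser`, `extract_blogroll`, `to_opml`, and the `SAMPLE` markup are all made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BlogrollParser(HTMLParser):
    """Collect links inside the first-level container whose id/class mentions `marker`."""

    def __init__(self, marker="blogroll"):
        super().__init__()
        self.marker = marker
        self.depth = 0        # 0 = outside the container, >0 = nesting level inside it
        self.links = []       # [title, url] pairs, in page order
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
            if tag == "a" and attrs.get("href"):
                self.links.append(["", attrs["href"]])
                self._in_link = True
        else:
            ident = f"{attrs.get('id') or ''} {attrs.get('class') or ''}".lower()
            if self.marker in ident:
                self.depth = 1   # entered the assumed blogroll container

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self.links[-1][0] += data.strip()


def extract_blogroll(html, marker="blogroll"):
    """Return (title, url) tuples found inside the blogroll container."""
    parser = BlogrollParser(marker)
    parser.feed(html)
    return [tuple(link) for link in parser.links]


def to_opml(links, title="Blogroll"):
    """Serialize links as OPML 2.0 outlines of type "link" (site URLs, not feeds)."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, url in links:
        ET.SubElement(body, "outline", type="link", text=text or url, url=url)
    return ET.tostring(opml, encoding="unicode")


# Made-up sample page: one content link, a two-item blogroll, one footer link.
SAMPLE = """
<div id="content"><a href="https://example.com/post">A post</a></div>
<ul class="blogroll">
  <li><a href="https://blog-one.example/">Blog One</a></li>
  <li><a href="https://blog-two.example/">Blog Two</a></li>
</ul>
<a href="https://example.com/about">About</a>
"""

if __name__ == "__main__":
    found = extract_blogroll(SAMPLE)
    for _, url in found:
        print(url)
    print(to_opml(found))
```

One caveat on the OPML side: a blogroll lists site URLs, so the sketch writes `type="link"` outlines. Most feed readers expect `type="rss"` outlines with an `xmlUrl` pointing at the feed itself, which would need an extra feed-discovery step per site that is not shown here.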
-
Nothing I saw there would do this. It looks like it could manage to list all external links, and I suppose you could manually pick the blogroll out of that list.
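To make the "list all external links" fallback concrete, here is a minimal sketch in Python (standard library only). It is an illustration, not how any of the tools mentioned in this thread work. It assumes "external" simply means an http(s) link whose host differs from the page's own host, so `www.` and non-`www.` variants count as different sites; `LinkCollector` and `external_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_links(html, page_url):
    """Return de-duplicated absolute http(s) links that point to another host."""
    collector = LinkCollector()
    collector.feed(html)
    own_host = urlparse(page_url).netloc.lower()
    seen, result = set(), []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)   # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue                          # skip mailto:, javascript:, etc.
        if parts.netloc.lower() == own_host or absolute in seen:
            continue
        seen.add(absolute)
        result.append(absolute)
    return result
```

Calling `external_links(page_html, "https://myblog.example/")` on a downloaded page gives the full outbound list, which still has to be narrowed down to the blogroll by hand.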
-
Hi there,
Well, Keri's response reminded me of this question and the fact that I found a tool for scraping this kind of list.
Here it is (with some other cool tools), have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is Outwit Hub for Firefox. It might be able to help with that. It can scrape data from a page and do a lot with it. http://www.outwit.com/products/hub/. Don't know that it meets all of your needs, but I also haven't seen a response with anything better at the moment.
-
Hey Scott,
What a great question and, sigh, I don't have the answer. I am going to check back to find out what people come up with here. Surely there is someone who lurks in these parts who can throw something together?