Why don't Moz OSE, Ahrefs, Majestic, and so on change their user agent while crawling?
-
Some blackhat websites, PBNs and other "cheaters" use various methods to effectively block third-party backlink checker bots (OSE, Ahrefs, Majestic...): robots.txt rules, IP blocking and so on.
A simple solution for those bots would be to mimic Google, for example by using its user agent string.
Or, if that is not legally permitted (which I doubt), they could randomize their user agent strings, URLs, and IPs in order to prevent blocking. This should not be a big deal IMHO. Am I missing something obvious?
-
The ethics of the Internet dictate that you
- crawl politely,
- obey robots.txt and
- properly identify yourself
This isn't a new issue. Link networks and sites have blocked crawlers and manipulated Google for years. Fortunately, it's only a small fraction of the web. Also, it's unlikely that links from those networks have much value, so crawl priority would be super low anyway.
Actually, it could be viewed as beneficial when blackhat sites block OSE and Ahrefs, because those sites often get penalized by Google, but 3rd party crawlers have no way to know this, so blocking effectively keeps them out of the indexes.
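To make the "obey robots.txt" and "identify yourself" rules above concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URL are hypothetical, and the user agent tokens (AhrefsBot, MJ12bot, rogerbot) are only illustrative of how such crawlers commonly announce themselves. It shows why a polite crawler that identifies itself honestly is shut out by this kind of rule while a Googlebot token is not.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a site that blocks backlink crawlers
# but leaves every other bot (including Googlebot) unrestricted.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: rogerbot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-page"  # hypothetical URL
    for bot in ("AhrefsBot", "MJ12bot", "rogerbot", "Googlebot"):
        print(bot, "allowed" if allowed(bot, page) else "blocked")
```

A crawler that follows the rules runs a check like this under its own name before every fetch, which is exactly why blocking by user agent works against it.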
-
Well, I think bot blocking is an obvious problem even now, and it will matter even more tomorrow with all the private networks, as you can imagine.
Moz (and others) should find and implement the best possible solution. I see no problem with TAGFEE as long as you are transparent about the fact that your bots are undetectable.
I understand that what I'm proposing may be neither the best nor the desired solution, but the problem must be addressed or OSE will soon have no value at all.
What do you propose?
-
I agree with George here -- we'd hear a huge outcry if we pretended to be Googlebot or a different bot. We'd also likely get blocked, as sometimes people only let in a certain few known bots/IPs to crawl their site. If we changed user agents and IPs regularly, it would not be cool or TAGFEE.
-
What about using different user agents and IPs regularly in order to avoid detection?
Is there any other acceptable solution?
-
The reputation and integrity of the major players would be at stake here. If they changed their user agent identification (to spoof Googlebot or Bing or whatever) that could be detected, and they would be castigated. The crawler IP address and its user agent ID would be out of sync...
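The "out of sync" point can be illustrated with the reverse-then-forward DNS check that site owners commonly run against anything claiming to be Googlebot. This is only a sketch under stated assumptions: the accepted hostname suffixes, the function name `is_genuine_googlebot` and the injectable `reverse`/`forward` resolvers are choices made here for illustration, not anyone's production code.

```python
import socket
from typing import Callable, List

# Assumption: genuine Googlebot hosts resolve to names under these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip: str) -> str:
    """Reverse DNS: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    """Forward DNS: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_genuine_googlebot(
    ip: str,
    user_agent: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Check that a request claiming to be Googlebot comes from a Google host."""
    if "googlebot" not in user_agent.lower():
        return False  # no Googlebot claim, so nothing to verify
    try:
        host = reverse(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False  # user agent says Googlebot, IP says otherwise
        # The hostname must resolve back to the same IP, otherwise the
        # reverse DNS record itself could have been forged.
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

A third-party crawler sending a Googlebot user agent from its own IP range fails the suffix test on the first lookup, so the spoof is visible to any site that bothers to check.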