Moz Crawler suddenly reporting 1000s of duplicates (BE.net)
-
In the last 3-4 days we've had several thousand 'duplicate content' warnings appear in our crawl report, 99% of them related to our on-site blog. The blog is BlogEngine.Net, but the pages simply don't exist. The majority seem to be Roger trying quasi-random URLs like:
/?page=410/?page=151
Etc. etc. The blog will present content for these requests, but it is of course the same empty page since there's only unique content for up to /?Page=10 or so.
Two questions:
1. Did something change recently? These blogs have been up for months, and this problem has only come up this week. Did Roger change to become more aggressive lately?
2. Suggested remediation? On one of the blogs I've added noindex, nofollow to any page that has a /?page query string, and we'll see what effect that has at next week's crawl. However, I'm not sure this will work, as per:
http://moz.com/community/q/functionality-of-seomoz-crawl-page-reports
Anyone else had dynamic blogs suddenly blossom into thousands of duplicate content warnings? Google (rightly) ignores these pages completely.
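For anyone wanting to try the same remediation, here is a rough sketch of the rule described above: any URL carrying a page query parameter gets noindex, nofollow, and everything else stays indexable. It is written in Python for illustration only (BlogEngine.NET itself is ASP.NET), and the function name `robots_meta_for` is hypothetical, not part of any Moz or BlogEngine API.

```python
from urllib.parse import urlsplit, parse_qs


def robots_meta_for(url: str) -> str:
    """Return the content value for a <meta name="robots"> tag.

    Assumption: every URL with a "page" query parameter is treated as a
    low-value paginated view, as in the post above.
    """
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # Compare keys case-insensitively, since both ?page= and ?Page= show up.
    has_page_param = any(key.lower() == "page" for key in query)
    return "noindex, nofollow" if has_page_param else "index, follow"


if __name__ == "__main__":
    for candidate in ("/?page=410", "/?Page=3", "/post/hello-world"):
        print(candidate, "->", robots_meta_for(candidate))
```

The check looks only at the parameter name, so a malformed request such as /?page=410/?page=151 is caught as well.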
-
Hate to bump my own question, but it appears I spoke too soon about no-index,no-follow solving this. The duplicate errors went away for about 5 days, but then yesterday spiked with the same problem. I've confirmed that no-index, no-follow are present on the pages being detected as bad.
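A quick way to double-check that the tags really are present is to parse the page source rather than eyeball it. The sketch below uses only Python's standard `html.parser`; the helper names are hypothetical, and it works on an HTML string you have already downloaded (fetching is left out). Note that the directive tokens in the tag itself are spelled `noindex` and `nofollow`, without hyphens.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from every <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html_text: str) -> set:
    """Return the set of robots directives found in the given HTML."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    finder.close()
    return finder.directives


def is_noindex_nofollow(html_text: str) -> bool:
    """True when both noindex and nofollow are present."""
    return {"noindex", "nofollow"} <= robots_directives(html_text)


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
    print(is_noindex_nofollow(sample))
```

This only confirms what the page serves; it says nothing about how a particular crawler chooses to report pages carrying those directives.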
As per the best practices document:
http://moz.com/learn/seo/robotstxt
Using meta robots no index no follow is the recommended option:
Block with Meta NoIndex
This tells engines they can visit, but are not allowed to display the URL in results. This is the recommended method.
But it apparently isn't working, as evidenced by the new surge of duplicate errors. Is there anything else I can do? I don't want to explicitly block Roger in robots.txt as that seems rather backward. Should Roger be included in the Bad Robots List?
-
Peter -
Thanks for the clarification. I understand the philosophy at hand, and I kind of even understood it before I had asked the question. I'm handling these with a mix of canonical and no-index/no-follow.
Related to that, update:
By marking the superfluous pages no-index/no-follow the error count for the site has diminished by about 10,000 and the warning count by about 28,000 so that seems to be the way to go. The pages that had content are 'low value' in this context, since that content was readily available elsewhere.
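For the canonical half of that mix, one possible approach is sketched below in Python. It assumes the page parameter should simply be dropped so that paginated views point back at the un-paginated URL; that is an assumption for illustration, not what this site necessarily does, and which URL counts as canonical is a decision for the site owner. The function names and the example.com URLs are hypothetical.

```python
import html
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def canonical_url(url: str) -> str:
    """Drop the "page" query parameter (any letter case) and any fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != "page"
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path or "/", urlencode(kept), "")
    )


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page head."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


if __name__ == "__main__":
    print(canonical_link_tag("http://example.com/?page=410"))
    print(canonical_link_tag("http://example.com/post?id=5&Page=2"))
```

Other query parameters are kept as they are, so only the pagination noise is collapsed.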
-
Hi there!
Thanks for writing in with a great question.
We definitely count those dynamic URLs as duplicate content. While we are pretty sure that search engines can figure this stuff out and know which URL to index, it's still considered best practice to canonicalize or otherwise direct crawlers to the original URL (as far as I know; I'm not a professional SEO, so you might be better off asking the Pro Q&A community at www.moz.com/community/q - they are all SEOs like you).
Since some dynamic URL generators can cause problems for crawlers, we do try to be overly-inclusive of these issues rather than overly-exclusive. We want people to know about potential issues with sites, even if they're not really issues in the scheme of the site owner's specific SEO implementation plan.
In sum, we'd rather leave those judgments up to you and at the same time, provide you with the data you need to make these decisions. I hope this helps explain our thinking here! However, if you think that our crawler might be having issues, and you do not want to post your site URLs here, you could always send us a support ticket at [email protected]. That way we can examine it a bit further and provide some insights into why our crawler thinks this way!
Hope this helps!
Peter
Moz Help Team.