Issue in number of pages crawled
-
I wanted to figure out how our friend Roger Bot works.
On the first crawl of one of my large sites, the number of pages crawled stopped at 10,000 (due to the limit on the Pro account). However, after a few weeks, the number of pages crawled went down to about 5,500. This number seemed to be a more accurate count of the pages on our site.
Today, it seems that Roger Bot has completed another crawl and the number is up to 10000 again.
I know there has been no downtime on our site, and the items that we fixed on our site did not reduce or increase the number of pages we had.
Just making sure there are no known issues with Roger Bot before I look deeper into our site to see if there is an issue.
Thanks!
-
Hey Chirag
That is the point: if the crawler is seeing multiple versions of the same page, you will get a false page count.
If a single page resolves on multiple versions of the URL like...
/pagename
/pagename/
/pagename.html
Then one single page could get reported as three pieces of content.
So, if you have 100 pages but every page resolves on, say, two URLs, the crawl would show 200 pages. The duplicate content report should let you see whether this is the case.
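To make that concrete, here is a rough sketch of how you could check a list of crawled URLs for this kind of inflation yourself. It is only an illustration: the normalisation rules (dropping a trailing slash and a ".html" suffix), the function names and the example.com URLs are my own assumptions, not how Roger Bot or the duplicate content report actually work.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonical_key(url: str) -> str:
    """Collapse common URL variants of one page into a single key.

    Assumed rules (hypothetical, adjust for your site): a trailing slash
    and a ".html" suffix both point at the same underlying page.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(".html"):
        path = path[: -len(".html")]
    # Strip trailing slashes, but keep "/" for the site root.
    path = path.rstrip("/") or "/"
    return parts.netloc.lower() + path


def group_variants(urls):
    """Map each canonical key to the list of crawled URLs that share it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_key(url)].append(url)
    return dict(groups)


# Hypothetical crawl output: three variants of one page plus one other page.
crawled = [
    "http://example.com/pagename",
    "http://example.com/pagename/",
    "http://example.com/pagename.html",
    "http://example.com/other",
]
groups = group_variants(crawled)
print(len(crawled), "URLs crawled,", len(groups), "unique pages")
for key, variants in groups.items():
    if len(variants) > 1:
        print(key, "is reachable via", len(variants), "URLs")
```

If the unique count comes out far below the crawled count, duplicate URL variants are a likely cause of the inflated number.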
Hope that helps.
Marcus -
Hi Marcus,
Thanks for the reply.
Yes, the duplicate content report is quite large, but I am not certain why the number of pages crawled fluctuated by over 4,000.
The duplicate content number went down by over 2,000 last week, and then went straight back up again. So I am not sure if the crawler missed something, or if there was some other issue going on.
Cheers
-
Hey Chirag
As a first suggestion, I would take a look at the duplicate content report. You may see some pages with multiple page names / URLs giving a falsely inflated page count.