Issue in number of pages crawled
-
i wanted to figure out how our friend Roger Bot works.
On the first crawl of one of my large sites, the number of pages crawled stopped at 10000 (due to the restriction on the pro account). However after a few weeks, the number of pages crawled went down to about 5500. This number seemed to be a more accurate count of the pages on our site.
Today, it seems that Roger Bot has completed another crawl and the number is up to 10000 again.
I know there has been no downtime on our site, and the items that we fixed on our site did not reduce or increase the number of pages we had.
Just making sure there are no known issues with Roger Bot before I look deeper into our site to see if there is an issue.
Thanks!
-
Hey Chirag
That is the point, if the crawler is seeing multiple versions of the same page, you will get a false page count.
If a single page resolves on multiple versions of the URL like...
/pagename
/pagename/
/pagename.html
Then one single page could get reported as three pieces of content.
So, if you have 100 pages, but all pages resolve on say two page names then it would show 200 pages BUT the duplicate content report should allow you to see if this is the case.
Hope that helps.
Marcus -
Hi Marcus,
Thanks for the reply.
Yes the duplicate content report is quite large, but I am not certain why the number of pages crawled fluctuated by over 4000.
the Duplicate content number went down by over 2000 last week, and then went straight back up again. So I am not sure if the crawler missed something, or if there was some other issue going on.
Cheers
-
Hey Chirag
As a first suggestion, I would take a look at the duplicate content report and you may see some pages with multiple page names / urls giving a falsely inflated page count.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Filter Pages
Howdy Moz Forum!! I have a headache of a job over here in the UK and I'd welcome any advice! - It's sunny today, only 1 of 5 days in a year and i'm stuck on this! I have a client that currently has 22,000 pages indexed to Google with almost 4000 showing as duplicate content. The site has a "jobs" and "candidates" list. This can cause all sorts of variations such as job title, language, location etc. The filter pages all seem to be indexed. Plus the static pages are indexed. For example if there were 100 jobs at Moz being advertised, it is displaying the jobs on the following URL structure - /moz
Moz Pro | | Slumberjac
/moz/moz-jobs
/moz/moz-jobs/page/2
/moz/moz-jobs/page/3
/moz/moz-jobs/page/4
/moz/moz-jobs/page/5 ETC ETC Imagine this with some going up to page/250 I have checked GA data and can see that although there are tons of pages indexed this way, non of them past the "/moz/moz-jobs" URL get any sort of organic traffic. So, my first question! - Should I use rel-canonical tags on all the /page/2 & /page/3 etc results and point them all at the /moz/moz-jobs parent page?? The reason for this is these pages have the same title and content and fall very close to "duplicate" content even though it does pull in different jobs... I hope i'm making sense? There is also a lot of pages indexed in a way such as- https://www.examplesite.co.uk/moz-jobs/search/page/9/?candidate_search_type=seo-consulant&candidate_search_language=blank-language These are filter pages... and as far as I'm concerned shouldn't really be indexed? Second question! - Should I "no follow" everything after /page in this instance? To keep things tidy? I don't want all the variations indexed! Any help or general thoughts would be much appreciated! Thanks.0 -
Is The Number of Duplicate Pages reduced after adding canonical ref to the dupe versions ?
Hi Is the number of duplicate pages reported in a dupe page content error report reduced on subsequent crawls, if you have resolved the dupe content problem via adding the canonical tag to duplicate versions (referring the original page). Like it would if you were solving the problem via a 301 redirect (i think/presume) ? Cheers Dan
Moz Pro | | Dan-Lawrence0 -
Is www.domain.com/page the same url as www.domain.com/page/ for Google? (extra slash at end of url)
Dear all, in open site explorer there is a difference the url's 'www.domain.com/page' and 'www.domain.com/page/' (extra slash at end). There can be different values in pageauthority etc. in the open site explorer tool, but is this also the case for Google? Thanks for replying, Regards, Ben
Moz Pro | | HMK-NL0 -
Crawl Diagnostics 403 on home page...
In the crawl diagnostics it says oursite.com/ has a 403. doesn't say what's causing it but mentions no robots.txt. There is a robots.txt and I see no problems. How can I find out more information about this error?
Moz Pro | | martJ0 -
Maximum number of daily MozPoints?
Since there are no stupid questions, here's mine. I have been answering a lot of questions for people today, but after a certain point, my MozPoints stopped increasing. Is there a daily cap? I'm providing valuable responses, not just saying anything to get more points.
Moz Pro | | UnderRugSwept0 -
How to crawl the whole domain?
Hi, I have a website an e-commerce website with more than 4.600 products. I expect that Seomoz scan check all url's. I don't know why this doesn't happens. The Campaign name is Artigos para festa and should scan the whole domain festaexpress.com. But it crels only 100 pages I even tried to create a new campaign named Festa Express - Root Domain to check if it scans but had the same problem it crawled only 199 pages. Hope to have a solution. Thanks,
Moz Pro | | EduardoCoen
Eduardo0 -
Duplicate page content reports duplicates, but pages don't show duplication
My duplicate page reports shows 376 pages with duplicate content. After reviewing the pages the report claims have duplicate content, i can't find duplications. could this be an error, or is there some source code that doesn't display that could be causing this issue?
Moz Pro | | noonzie0 -
Home page not indexed by Google
Hello, Teacherprose.com 1. Sitemap was successfully submitted via Google webmaster tools 2. Site has been up for two years. 3. Site shows up in Google results for "Teacher Resume Service" 4. According to Google and SEOMoz, home page not indexed by Google or Bing. I'm a novice, am I missing something obvious? Thank You, Eric
Moz Pro | | monthelie10