Moz Crawl shows over 100 times more pages than my site has?
-
The latest crawl stats are attached. My site has just over 300 pages.
What have I done wrong?
-
You're right, Keri, the total page count is higher, but it's still only 581.
-
I believe this image shows what's indexed, which is a subset of the sitemap you submitted. You may want to look at Google Index > Index Status in GWT (Google Webmaster Tools) to see what it shows there.
-
[Screenshot: latest Moz crawl]
-
[Screenshot: latest Webmaster Tools crawl]
-
I will definitely be paying attention to those numbers, Keri. Webmaster Tools is showing the right number of pages (just over 300, with 90% of those indexed).
-
It's not going to be a penalty, but it'll be good to have a bit less of a load on your server (bots no longer crawling thousands of pages) and just have your real pages in the index.
Good places to look for changes in site metrics are your organic traffic in analytics and your Google Webmaster Tools account, where you can see your impressions, pages crawled, etc.
-
Thanks Keri, I will update asap.
Could you let me know how big an issue this would be? (When you have the time, of course ;))
-
You're welcome! I may have opened a can of worms, however. That sitemap is generated by an automated tool (based on the footer at the bottom), so somehow it's finding that page 28 as well.
You may also want to ask the developer whether you should be indexing the categories in the blog archives. There are resources on Moz about the best way to set that up in WordPress, but I don't have them at my fingertips at the moment (I have a snuggly baby sleeping on my lap instead that's slowing me down a tad).
To answer your next question, after you figure out where the page 28 is being linked from and cure that, yes, you can do a one-time crawl from Research Tools. It won't overwrite your campaign info, but you can at least see if Moz is seeing thousands of pages or just a few hundred to see if stuff was fixed. Again, happy to provide more detail if/when you need it (and others will likely jump in with help on the thread, too).
I'd love to also see a little update a few weeks down the line of any changes you've noticed on your site metrics after getting this fixed.
-
You rock:)
-
And I found it. The sitemap at http://www.nineclouds.ca/sitemap includes a page /28, which is where the crawlers are finding the non-existent pages.
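To check a sitemap for pagination URLs beyond the real last page, a short script can help. This is a minimal sketch, assuming an XML sitemap in the standard sitemaps.org format and paginated URLs shaped like /blog/page/N (the function name, the regex, and the example.com URLs are hypothetical; if the sitemap is an HTML page instead, the same check would apply to its links).

```python
import re
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumption: paginated archive URLs end in /page/<number>, e.g. /blog/page/23.
PAGE_RE = re.compile(r"/page/(\d+)/?$")


def out_of_range_urls(sitemap_xml: str, last_page: int) -> list:
    """Return sitemap URLs whose page number is greater than last_page."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        match = PAGE_RE.search(url)
        if match and int(match.group(1)) > last_page:
            flagged.append(url)
    return flagged


if __name__ == "__main__":
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/blog/page/21</loc></url>
  <url><loc>http://www.example.com/blog/page/28</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>"""
    # prints ['http://www.example.com/blog/page/28']
    print(out_of_range_urls(sample, last_page=21))
```

Any URL this flags is a page that should not exist, so it is worth finding out why the sitemap tool picked it up.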
-
If you look at http://www.nineclouds.ca/blog/page/23, you'll see that there's a double arrow in the pagination at the right that goes to page 24, even though the last page is page 21. Google has somehow found pages numbered above 21, and once it finds one of those, it keeps seeing the double-arrow link there and following it to another page. The same happened with Rogerbot, Moz's crawler. I'm not sure where the bad originating link is (which legitimate page on your site links to a page above 21), but that's the loop that's happening and causing a ton of pages to be indexed. Get rid of those pages, and you'll also get rid of most of your errors.
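The loop described above comes from a template that renders the "next" (double-arrow) link unconditionally and still serves a page for numbers past the end. The real site runs WordPress (PHP), so the following is only an illustrative sketch in Python of the guard pagination should apply: link forward only while the current page is below the last page, and treat any page number outside the range as not found (on a live site that would become a 404 response) so crawlers reach a dead end instead of an endless chain. The function name and URL scheme are hypothetical.

```python
def pagination_links(current: int, last: int, base: str = "/blog/page/") -> dict:
    """Return prev/next links for one archive page, never linking past `last`.

    Hypothetical helper: the URL scheme mirrors /blog/page/N from the thread.
    """
    if not 1 <= current <= last:
        # Page numbers outside 1..last should not be rendered at all;
        # the web layer would map this exception to an HTTP 404.
        raise LookupError(f"page {current} does not exist (last page is {last})")
    links = {}
    if current > 1:
        links["prev"] = f"{base}{current - 1}"
    if current < last:
        # Emitting this link without the check is what lets an out-of-range
        # page (e.g. 23) keep linking to the next one (24, 25, ...).
        links["next"] = f"{base}{current + 1}"
    return links


if __name__ == "__main__":
    print(pagination_links(20, 21))  # {'prev': '/blog/page/19', 'next': '/blog/page/21'}
    print(pagination_links(21, 21))  # {'prev': '/blog/page/20'}
```

With both checks in place, a crawler that lands on a stray high page number gets a "not found" instead of a fresh link to follow.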
-
Not shy about that at all, thanks Keri.
Any help you can provide is greatly appreciated.
-
Hi Bill,
Using my admin powers, I took a peek at your account. I'm still trying to figure out where it's coming from, but you have thousands of empty pages of your blog indexed. I'll dig around a little more and see if I can figure out what's up.
If you're comfortable with sharing your URL here in a public forum, other people can come take a look too. Otherwise, I'm happy to send you a private message with part of what's up and give your developer a place to start looking.
-
Thanks Keri. I am the owner of the site, not the programmer, so I am looking up the terms you are using as I write this response. If I am using pagination, is there a way to stop Moz from crawling it? If I understand your question about the calendar correctly, I do have one as part of my blog that dates each post. Can I get the bot to ignore this calendar?
-
My first guess would be that parameters or something similar are being crawled. Do you have pagination? Sorting ascending and descending? A calendar that's getting crawled through the year 2525?
Your next step would be to look into what those duplicate pages are and see if something is amiss that's generating a ton of URLs.
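One way to see what is generating a ton of URLs is to export the list of crawled URLs (from the Moz crawl report or any other crawler) and group them by pattern. This is a minimal Python sketch, assuming you already have the URLs as a plain list (the export format is not covered here, and the example.com URLs are made up): it collapses digits in the path to {n} and keeps only the names of query parameters, so pagination, sort parameters, and calendar URLs each fall into one bucket whose size stands out.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def url_template(url: str) -> str:
    """Collapse a URL into a rough pattern: digits become {n}, query values are dropped."""
    parts = urlsplit(url)
    path = re.sub(r"\d+", "{n}", parts.path)
    keys = sorted({key for key, _ in parse_qsl(parts.query, keep_blank_values=True)})
    return path + ("?" + "&".join(keys) if keys else "")


def top_templates(urls, n=5):
    """Return the n most common URL patterns with their counts."""
    return Counter(url_template(u) for u in urls).most_common(n)


if __name__ == "__main__":
    crawled = [
        "http://www.example.com/blog/page/22",
        "http://www.example.com/blog/page/23",
        "http://www.example.com/blog/page/24",
        "http://www.example.com/shop?sort=asc",
        "http://www.example.com/shop?sort=desc",
        "http://www.example.com/calendar/2525/01",
        "http://www.example.com/about",
    ]
    for pattern, count in top_templates(crawled):
        print(count, pattern)
```

A bucket like /blog/page/{n} holding thousands of entries on a blog with about 20 real pages points straight at a pagination loop; a large ?sort bucket would point at sorting parameters instead.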
Related Questions
-
Google Analytics Set-Up for site with both http & https pages
We have a client that migrated to https last September. The site uses canonicals pointing to the https version. The client's IT team is reluctant to put 301 redirects from the non-secure to the secure version, and we are not sure why they object. We ran a Screaming Frog report and it shows both URLs for the same page (http and https). The non-secure version has a canonical pointing to the secure version. For every secure page there is a non-secure version in Screaming Frog, so Google must be ignoring the canonical and still indexing the page. However, when we run a site: search, we see that most URLs are the secure version. At the time we did not change the Google Analytics setup option to "https" instead of "http", BUT GA appears to be recording data correctly. Yesterday we set up a new profile and selected "https", but our question is: does the Google Analytics http/https setting make a difference, and if so, what difference?
Reporting & Analytics | | RosemaryB1 -
High Temporary Redirects: Login required pages
Noticed something interesting: a high temporary redirect report from Moz. Reviewing the pages, I see they are caused by the user having to log in and getting redirected. I can see the returnto query in the URL too. My thoughts: since a login is required and the user is being redirected, these should remain 302 and not 301. I tested the "Exclude URL Query Parameter" setting for returnto in my Google Analytics account, just to see if it affected traffic. It didn't; I no longer see URLs duplicated with the parameter, just grouped together, so traffic is still being counted. I'm going to wait one more day and see what the impact on GA traffic is before applying the exclusion to my true Google Analytics profile. This got me thinking: I should probably exclude this parameter in Google and Bing Webmaster Tools, so that Google and Bing won't read those URLs. Does Moz's crawler follow that? Do you think that would change my Moz crawl diagnostics report, since I told the Google/Bing crawlers to exclude that parameter? What do you think of my approach to reducing these high temporary redirects reported by Moz? Will it work? Has this plagued you?
Reporting & Analytics | | Bio-RadAbs0 -
What about this (Google crawl)?
Recently we made a serious SEO effort with Yoast SEO (WordPress). After a few months of tweaking old articles, we see this impact on Google's crawl. Is this graph normal? [Screenshot: crawl graph]
Reporting & Analytics | | noodweerbenelux0 -
Webmaster Tools Error: Unreachable page
Hi all, when I try to use the "Fetch as Google" feature in Webmaster Tools, I get the error "Unreachable page". I checked the Google Analytics code and everything seems to be OK. What should I do?
Reporting & Analytics | | fisniks0 -
Major practices that help pages get indexed by Google
We have submitted more than 100 pages to Google through an XML sitemap, but we see that 75% of the pages were indexed by Google. Note: this excludes the duplicate pages.
Reporting & Analytics | | Webworld_Norway0 -
Google Search Bar Vs Address Bar To Determine Number Of Times the Domain Name Is Typed In
Hello, I'm trying to get a rough estimate of how many times a domain name that we're interested in acquiring is typed into the address bar. If the Google Keyword Tool says, for instance, that the exact-match domain name is typed in 720 times, how many times is it typed into the address bar? Example: example.com, 720 global searches. Thanks!
Reporting & Analytics | | Optimize0 -
How serious are the Duplicate Page Content and Tags errors?
I have a travel booking website that reserves flights, cars, hotels, vacation packages and cruises. I encounter a huge number of Duplicate Page Title and Content errors. This is expected because of the nature of my website. Say you look for flights between Washington DC and London Heathrow: you will get at least 60 different options with the same content and title tags. How can I go about reducing the harm, if any, of duplicate content and meta tags on my website, knowing that I will invariably have multiple pages with the same content and tags? I would appreciate your advice. S.H
Reporting & Analytics | | sherohass0 -
For an optimized site, any available stats / guesstimates on what is avg % of traffic to homepage vs. second-level pages?
I'm interested in passing this info on to a client who experienced a period of time when an incorrect GA code was installed on their homepage. They were able to get Google stats on second-level pages only. This is a site that gets 80+% of visits from organic search engine referrals. They do minimal advertising. Thanks in advance.
Reporting & Analytics | | alankoen1230