Moz Crawl shows over 100 times more pages than my site has?
-
The latest crawl stats are attached. My site has just over 300 pages.
Wondering what I have done wrong?
-
The total page count is higher, you are right, Keri, but it's still only 581.
-
I believe this image shows what's indexed, which is a subset of the sitemap you submitted. You may want to look at Google Index -> Index Status in GWT to see what it shows there.
-
latest Moz crawl
-
latest webmaster tools crawl
-
I will definitely be paying attention to those numbers, Keri. Webmaster Tools is showing the right number of pages (something over 300, with 90% of those indexed).
-
It's not going to be a penalty, but it'll be good to have a bit less of a load on your server (bots no longer crawling thousands of pages) and just have your real pages in the index.
Places to look for interesting changes in site metrics would be your organic traffic in analytics and taking a look at your Google Webmaster Tools account to see your impressions, pages crawled, etc.
-
Thanks Keri, I will update ASAP.
Could you let me know how big an issue this would be? (When you have the time, of course ;))
-
You're welcome! I may have opened a can of worms, however. That sitemap is generated by an automated tool (based on the footer at the bottom), so somehow it's finding that page 28 as well.
You may also want to ask the developer if you should be indexing the categories in the blog archives. There are resources on Moz about the best way to set that up in WordPress, but I don't have them at my fingertips at the moment (I have a snuggly baby sleeping on my lap instead that's slowing me down a tad).
To answer your next question, after you figure out where the page 28 is being linked from and cure that, yes, you can do a one-time crawl from Research Tools. It won't overwrite your campaign info, but you can at least see if Moz is seeing thousands of pages or just a few hundred to see if stuff was fixed. Again, happy to provide more detail if/when you need it (and others will likely jump in with help on the thread, too).
I'd love to also see a little update a few weeks down the line of any changes you've noticed on your site metrics after getting this fixed.
-
You rock:)
-
And I found it. The sitemap at http://www.nineclouds.ca/sitemap includes a page /28, which is where the crawlers are finding the non-existent pages.
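For anyone who wants to check a sitemap page for entries like this, here is a minimal sketch. It assumes the sitemap is an HTML page of links (the thread doesn't say what format it is in) and that a suspicious entry is one whose last path segment is just a number. The names `LinkCollector` and `numeric_tail_links` and the sample path `/example/28` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def numeric_tail_links(html_text):
    """Return links whose final path segment is purely numeric (e.g. .../28)."""
    collector = LinkCollector()
    collector.feed(html_text)
    flagged = []
    for link in collector.links:
        # Split the path into non-empty segments and inspect the last one.
        segments = [s for s in urlparse(link).path.split("/") if s]
        if segments and segments[-1].isdigit():
            flagged.append(link)
    return flagged


# Hypothetical sample markup, not the real sitemap.
sample = '<a href="/blog/">Blog</a><a href="/example/28">28</a><a href="/about">About</a>'
flagged_links = numeric_tail_links(sample)
```

In practice you would save the sitemap page's HTML to a file and pass its contents in; any flagged link is only a candidate to look at by hand, not proof of a problem.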
-
If you look at http://www.nineclouds.ca/blog/page/23, you'll see that there's a double arrow in the pagination at the right that goes to page 24, even though the last real page is page 21. Google has somehow found pages beyond 21 (I'm not sure how), and once it found one of those, it kept following the double-arrow link to the next page. The same happened with Rogerbot. I'm not sure where the bad originating link is (what legit page on your site is linking to something over page 21), but that's the loop that's happening and causing a ton of pages to be indexed. Get rid of those, and you'll also get rid of most of your errors.
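To see how many of these out-of-range pages show up in a crawl export or an index report, a small filter can help. This is only a sketch: it assumes the pagination URLs follow the `/blog/page/N` pattern seen above and that page 21 is the last real page. The function name `pages_beyond_last` is made up for illustration.

```python
import re
from typing import Iterable, List

# Matches pagination URLs such as .../blog/page/23 or .../blog/page/23/
PAGE_RE = re.compile(r"/blog/page/(\d+)/?$")


def pages_beyond_last(urls: Iterable[str], last_real_page: int) -> List[str]:
    """Return pagination URLs whose page number exceeds last_real_page."""
    flagged = []
    for url in urls:
        match = PAGE_RE.search(url)
        if match and int(match.group(1)) > last_real_page:
            flagged.append(url)
    return flagged


crawled = [
    "http://www.nineclouds.ca/blog/page/21",
    "http://www.nineclouds.ca/blog/page/23",
    "http://www.nineclouds.ca/blog/",
]
beyond = pages_beyond_last(crawled, last_real_page=21)
```

If the list comes back non-empty after the fix is deployed and the site has been recrawled, something is still linking to a page past the end of the archive.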
-
Not shy about that at all, thanks Keri.
Any help you can provide is greatly appreciated.
-
Hi Bill,
Using my admin powers, I took a peek at your account. I'm still trying to figure out where it's coming from, but you have thousands of empty pages of your blog indexed. I'll dig around a little more and see if I can figure out what's up.
If you're comfortable with sharing your URL here in a public forum, other people can come take a look too. Otherwise, I'm happy to send you a private message with part of what's up and give your developer a place to start looking.
-
Thanks Keri. I am the owner of the site, not the programmer, so I am looking up the terms you are using as I write this response. If I am using pagination, is there a way to keep Moz from crawling it? If I understand your question about the calendar correctly, I do have one as part of my blog that dates each post. Can I get the bot to not recognize this calendar?
-
My first guess would be parameters or something are being crawled. Do you have pagination? Sorting ascending and descending? A calendar that's getting crawled through the year 2525?
Your next step would be to look into what those duplicate pages are and see if something is amiss that's generating a ton of URLs.
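One rough way to spot what is generating the extra URLs is to group the crawled URLs by pattern and count them. The sketch below assumes you have the crawled URLs as a plain list (for example, copied out of a crawl export). It replaces every run of digits in the path with `{n}` and keeps only the names of any query parameters. The function name `url_patterns` and the example.com URLs are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl, urlparse


def url_patterns(urls):
    """Count URLs per pattern: digits in the path become {n}, query values are dropped."""
    counts = Counter()
    for url in urls:
        parsed = urlparse(url)
        pattern = re.sub(r"\d+", "{n}", parsed.path)
        if parsed.query:
            # Keep only the parameter names, sorted, so ?sort=asc and ?sort=desc group together.
            names = sorted({key for key, _ in parse_qsl(parsed.query, keep_blank_values=True)})
            pattern += "?" + "&".join(names)
        counts[pattern] += 1
    return counts


sample_urls = [
    "http://example.com/blog/page/22",
    "http://example.com/blog/page/23",
    "http://example.com/about",
    "http://example.com/list?sort=asc",
    "http://example.com/list?sort=desc",
]
pattern_counts = url_patterns(sample_urls)
```

Calling `pattern_counts.most_common(10)` then lists the biggest groups first; a pattern with thousands of URLs on a site of a few hundred pages is the place to start looking.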