Indexed pages
-
Just started a site audit and trying to determine the number of pages on a client site and whether there are more pages being indexed than actually exist. I've used four tools and got four very different answers...
- Google Search Console: 237 indexed pages
- Google search using site command: 468 results
- MOZ site crawl: 1013 unique URLs
- Screaming Frog: 183 page titles, 187 URIs (note this is a free licence, but should cut off at 500)
Can anyone shed any light on why they differ so much? And where lies the truth?
-
Another option is if the site uses a CMS. If so, then you can create a sitemap for content pages/posts etc,.
Personally, I'm with Krzysztof Furtak on SF. Screaming Frog rocks. It'll find most pages, except perhaps Orphan pages as it wouldn't be able to find a link to crawl to discover the page.
If it's really important to get as many pages as possible, I'd do the following (I've put an Astrix (*) next to ones that some people may think are a tad extreme)
- Run a Screaming Frog crawl
- Grab a sitemap from your CMS
- Check any server-based analytics (AWSTATS etc)
- Check your access_log file & parse out URLs in there**(*)**
- site: queries, with & without www, and also using * as a subdomain (use something like Moz's toolbar to export)
- As Krzysztof suggests, Scrapebox would extract data too, but be careful scraping, you may get an IP slap.(*)
- Export crawl data from Moz & a tool such as Deep Crawl
- Throw the pages from all into Excel and de-dupe.
- Once you have a de-duped list, as an optional last step, go back to Screaming Frog and enter list mode (I have the paid version, not sure if it's possible with the free one) and run a crawl over all the de-duped URLs to get status codes etc
If you're going to do this sort of thing a fair bit - buy a Screaming Frog license, it's an awesome tool and can be useful in a multitude of situations.
-
The site: command is handy for asking Google what pages it knows about, however if Muzzmoz wants to know the number of pages on a site, you'd need more than this.
Also, re: your different ways or querying, I like to use:
site:*.domain.com - This can show other subdomains too, that may otherwise be missed
-
Ok so check with site something under 1000 pages and go to the last results page. You'll see that there'll be different number (in almost all cases).
-
I Will Always Prefer To Check Manually Using Site Command Because, site: operator, which will show us how many pages Google currently has indexed for the domain.
There Will Be Difference Between Index status in search console and current index as search console update the data after few days.
The number of indexed URLs is almost always significantly smaller than the number of crawled URLs, because Total indexed excludes URLs identified as duplicates, non-canonical or those that contain a meta no index tag.
Also, Check For Index(Preferred) Version Of Your Site
For E.g-
You can check More About this Here - https://support.google.com/webmasters/answer/2642366?hl=en
-
Hi
Most accurate number is from screaming frog (if you have less than 500 pages or paid version if more than 500).
Google indexes what it wants and if good enough to show in google index. If some pages are similar, got quality issues, blocked by robots etc then it won't show all. BTW don't think number in GSC or google index is good, check it manually because there can be 468 but in fact 200 only.
Moz can have "historical" pages that now don't exists or don't care about quality issues.
The truth is in screaming frog - most accurate number. If you used google user agent then number is the max that can appear in google index. If screaming frog user agent with turned off robots then you'll see bigger number (but google won't show it because of blocks).
If you want to check what's indexed then use tool like scrapebox. First get all urls (maybe without images if you don't care), then check indexed with sb. What's not indexed, can have some issues.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My WP website got attack by malware & now my website site:www.example.ca shows about 43000 indexed page in google.
Hi All My wordpress website got attack by malware last week. It affected my index page in google badly. my typical site:example.ca shows about 130 indexed pages on google. Now it shows about 43000 indexed pages. I had my server company tech support scan my site and clean the malware yesterday. But it still shows the same number of indexed page on google. Does anybody had ever experience such situation and how did you fixed it. Looking for help. Thanks FILE HIT LIST:
Technical SEO | | Chophel
{YARA}Spam_PHP_WPVCD_ContentInjection : /home/example/public_html/wp-includes/wp-tmp.php
{YARA}Backdoor_PHP_WPVCD_Deployer : /home/example/public_html/wp-includes/wp-vcd.php
{YARA}Backdoor_PHP_WPVCD_Deployer : /home/example/public_html/wp-content/themes/oceanwp.zip
{YARA}webshell_webshell_cnseay02_1 : /home/example2/public_html/content.php
{YARA}eval_post : /home/example2/public_html/wp-includes/63292236.php
{YARA}webshell_webshell_cnseay02_1 : /home/example3/public_html/content.php
{YARA}eval_post : /home/example4/public_html/wp-admin/28855846.php
{HEX}php.generic.malware.442 : /home/example5/public_html/wp-22.php
{HEX}php.generic.cav7.421 : /home/example5/public_html/SEUN.php
{HEX}php.generic.malware.442 : /home/example5/public_html/Webhook.php0 -
Large Drop in Indexed Pages But Increase in Traffic
We run a directory site and noticed about a week ago that Google Webmaster Tools was reporting a huge drop in indexed pages (from around 150,000 down to 30,000). In the same time, however, our traffic has increased. Has anyone seen this before or have any ideas on why this could happen? I have search for technical errors but nothing has changed on our site or our content.
Technical SEO | | sa_787040 -
Issues Indexing Translated Pages
I'm having trouble getting http://www.procloud.ch/ to index for their german pages. The english pages are being indexed but not the german. Any ideas? Chris
Technical SEO | | ninel_P0 -
Sudden Drop in Indexed Pages and Images under Sitemap
Hello! Just a couple days back, realised that under the Google Webmaster Tool > Sitemap, my website www.bibliotek.co has a sudden drop in indexed pages and images. Previously, it was almost fully indexed. However, I checked and the Google Index > Index Status, it is still fully indexed Any reason why and how do I resolve? Any help is very much appreciated! Thanks in advance!
Technical SEO | | Bibliotek1230 -
My website's pages are not being indexed correctly
Hi, One of our websites, which is actually a price comparison engine, facing indexing problem at Google. When we check “site:mywebsite.com “, there are lots of pages indexed which are not from mywebsite.com but from merchants websites. The index result page also shows merchant’s page title. In some cases the title is from merchant’s site but when the given link is accessed it points to mywebsite.com/index. Also the cache displays the merchant’s product page as the last indexed version rather than showing ours. The mywebsite.com has quite few Merchants that send us their product feed. Those products are listed on comparison page with prices. The merchant’s links on comparison page are all no-follow links but some of the (not all) merchant’s product pages are indexed against mywebsite.com as mentioned above instead of product comparison page of mywebsite.com How can we fix the issue? Thanks!
Technical SEO | | digitalMSB0 -
"One Page With Two Links To Same Page; We Counted The First Link" Is this true?
I read this to day http://searchengineland.com/googles-matt-cutts-one-page-two-links-page-counted-first-link-192718 I thought to myself, yep, thats what I been reading in Moz for years ( pitty Matt could not confirm that still the case for 2014) But reading though the comments Michael Martinez of http://www.seo-theory.com/ pointed out that Mat says "...the last time I checked, was 2009, and back then -- uh, we might, for example, only have selected one of the links from a given page."
Technical SEO | | PaddyDisplays
Which would imply that is does not not mean it always the first link. Michael goes on to say "Back in 2008 when Rand WRONGLY claimed that Google was only counting the first link (I shared results of a test where it passed anchor text from TWO links on the same page)" then goes on to say " In practice the search engine sometimes skipped over links and took anchor text from a second or third link down the page." For me this is significant. I know people that have had "SEO experts" recommend that they should have a blog attached to there e-commence site and post blog posts (with no real interest for readers) with anchor text links to you landing pages. I thought that posting blog post just for anchor text link was a waste of time if you are already linking to the landing page with in a main navigation as google would see that link first. But if Michael is correct then these type of blog posts anchor text link blog posts would have value But who is' right Rand or Michael?0 -
Pages not being indexed
Hi Moz community! We have a client for whom some of their pages are not ranking at all, although they do seem to be indexed by Google. They are in the real estate sector and this is an example of one: http://www.myhome.ie/residential/brochure/102-iveagh-gardens-crumlin-dublin-12/2289087 In the example above if you search for "102 iveagh gardens crumlin" on Google then they do not rank for that exact URL above - it's a similar one. And this page has been live for quite some time. Anyone got any thoughts on what might be at play here? Kind regards. Gavin
Technical SEO | | IrishTimes0 -
Https indexed - though a no index no follow tag has been added
Hi, The https-pages of our booking section are being indexed by Google. We added But the pages are still being indexed. What can I do to exclude these URL's from the Google index? Thank you very much in advance! Kind regards, Dennis Overbeek ACSI Publishing | [email protected]
Technical SEO | | SEO_ACSI0