Site: Query Question
-
Hi All,
I have a question about the site: query you can run on Google. I know it has lots of inaccuracies, but I like to keep a high-level view of it over time.
I was also using it to try to get a high-level view of how many product pages were indexed vs. the total number of pages.
What is interesting is when I do a site: query for say www.newark.com I get ~748,000 results returned.
When I do a query for www.newark.com "/dp/" I get ~845,000 results returned.
Either I am doing something stupid or these numbers are completely backwards?
Any thoughts?
Thanks,
Ben
-
Barry Schwartz posted some great information about this in November of 2010, quoting a couple of different Google sources. In short, more specific queries can cause Google to dig deeper and give more accurate estimates.
-
Yup. Get rid of parameter-laden URLs and it's easy enough. If they hang around the index for a few months before disappearing, that's no big deal; as long as you have done the right thing, it will work out fine.
Also, you're not interested in the chaff, just the bits you want to make sure are indexed. So make sure those are in sensibly titled sitemaps and it's fine. (I've used this on sites with 50 million and 100 million product pages. It gets a bit more complex at that scale, but the underlying principle is the same.)
-
But then on a big site (talking 4m+ products) it's usually the case that you have URLs indexed that wouldn't be generated in a sitemap because they include additional parameters.
Ideally, of course, you rid the index of parameter-filled URLs, but it's pretty tough to do that.
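To make the parameter problem concrete, here is a rough Python sketch of how you might spot indexed URLs that are just parameter variants of pages already in your sitemap. The function names and the example.com URLs are made up for illustration, and it assumes you already have a list of indexed URLs from somewhere (a crawl export, for instance). It only strips the query string and fragment, so treat it as a starting point rather than a full canonicalization routine.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_parameters(url):
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def find_parameter_variants(indexed_urls, sitemap_urls):
    """Map each sitemap URL to indexed URLs that differ only by parameters.

    Both arguments are hypothetical inputs: plain lists of absolute URLs.
    """
    sitemap = set(sitemap_urls)
    variants = {}
    for url in indexed_urls:
        if url in sitemap:
            continue  # Exact sitemap URL, not a parameter variant.
        base = strip_parameters(url)
        if base in sitemap:
            variants.setdefault(base, []).append(url)
    return variants


# Hypothetical example data.
sitemap_urls = ["https://www.example.com/dp/123"]
indexed_urls = [
    "https://www.example.com/dp/123",
    "https://www.example.com/dp/123?sort=price",
    "https://www.example.com/dp/123?ref=nav&page=2",
]
variants = find_parameter_variants(indexed_urls, sitemap_urls)
```

Each key in `variants` is a clean sitemap URL, and its list shows the parameter-filled copies you would want to get out of the index.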
-
Best bet is to make sure all your URLs are in your sitemap, and then you get an exact count of how many are indexed.
I've found it handy to use multiple sitemaps, one for each subfolder (e.g. /news/ or /profiles/), to quickly see exactly what % of URLs are indexed from each section of my site. This is super helpful for finding errors in a specific section or when you are working on indexing a certain type of page.
S
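For anyone who wants to try the per-subfolder approach, here is a minimal Python sketch of one way to do it. It groups URLs by their first path segment, builds a bare-bones sitemap for each group, and works out the indexed percentage from whatever submitted/indexed counts your sitemap report gives you. The function names, the "root" bucket, and the example.com URLs are my own assumptions, not anything from a specific tool.

```python
from collections import defaultdict
from urllib.parse import urlsplit
from xml.sax.saxutils import escape


def group_by_section(urls):
    """Group URLs by first subfolder, e.g. 'news' for /news/story-1."""
    sections = defaultdict(list)
    for url in urls:
        segments = urlsplit(url).path.split("/")
        # "/news/a" -> ["", "news", "a"]; "/index.html" -> ["", "index.html"].
        # Pages that are not inside a subfolder go into a "root" bucket.
        section = segments[1] if len(segments) > 2 else "root"
        sections[section].append(url)
    return dict(sections)


def build_sitemap(urls):
    """Return a minimal sitemap XML document for the given URLs."""
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


def indexed_percentage(submitted, indexed):
    """Percentage of submitted URLs that are reported as indexed."""
    return 100.0 * indexed / submitted if submitted else 0.0


# Hypothetical example data.
urls = [
    "https://www.example.com/news/story-1",
    "https://www.example.com/news/story-2",
    "https://www.example.com/profiles/ben",
    "https://www.example.com/",
]
groups = group_by_section(urls)
sitemaps = {name: build_sitemap(group) for name, group in groups.items()}
```

You would then save each entry in `sitemaps` as its own file (say, one for news and one for profiles), submit them separately, and compare the indexed count for each one against the number of URLs you submitted.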
-
What I've found is that the reason for this comes down to how Google's system works. Case in point: a client site I have with 25,000 actual pages. They have mass duplicate content issues. When I do a generic site: with the domain, Google shows 50,000-60,000 pages. If I do an inurl: with a specific URL param, I either get 500,000 or over a million.
Though that's not your exact situation, it can help explain what's happening.
Essentially, if you do a normal site: Google will try its best to provide the content within the site that it shows the world based on "most relevant" content. When you do a refined check, it's naturally going to look for the content that really is most relevant - closest match to that actual parameter.
So if you're seeing more results with the refined process, it means that on any given day, at any given time, when someone does a general search, the Google system will filter out a lot of content that isn't seen as highly valuable for that particular search. So all those extra pages that come up in your refined check - many of them are most likely then evaluated as less than highly valuable / high quality or relevant to most searches.
Even if many are great pages, their system has multiple algorithms that have to be run to assign value. What you are seeing is those processes struggling to sort it all out.
-
about 839,000 results.
-
Different data center perhaps - what about if you add in the "dp" query to the string?
-
I actually see 'about 897,000 results' for the search 'site:www.newark.com'.
-
Thanks Adrian,
I understand those areas of inaccuracy, but I didn't expect to see a refined search produce more results than the original search. That just seems a little bizarre to me, which is why I was wondering if there was a clear explanation or if I was executing my query incorrectly.
Ben
-
This is an expected 'oddity' of the site: operator. Here is a video of Matt Cutts explaining the imprecise nature of the site: operator.