Site: Query Question
-
Hi All,
I have a question about the site: query you can run on Google. I know it has lots of inaccuracies, but I like to keep a high-level view of it over time.
I was also using it to try to get a rough sense of how many product pages are indexed vs. the total number of pages.
What is interesting is that when I do a site: query for, say, www.newark.com, I get ~748,000 results returned.
When I do a query for www.newark.com "/dp/" I get ~845,000 results returned.
Either I am doing something stupid, or these numbers are completely backwards.
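For reference, here is roughly how I am building the two searches - just a quick Python sketch to show the exact query strings (the "about N results" counts I am reading off the results page by hand):

```python
from urllib.parse import quote_plus

# The two queries being compared: a bare site: search, and the same
# search narrowed to product URLs containing "/dp/".
queries = [
    'site:www.newark.com',
    'site:www.newark.com "/dp/"',
]

for q in queries:
    # Print the Google search URL for each query; the result-count
    # estimate is then read manually from the results page.
    print('https://www.google.com/search?q=' + quote_plus(q))
```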
Any thoughts?
Thanks,
Ben
-
Barry Schwartz posted some great information about this in November of 2010, quoting a couple of different Google sources. In short, more specific queries can cause Google to dig deeper and give more accurate estimates.
-
Yup. Get rid of parameter-laden URLs and it's easy enough. If they hang around the index for a few months before disappearing, that's no big deal; as long as you have done the right thing, it will work out fine.
Also, you're not interested in the chaff, just the bits you want to make sure are indexed. So make sure those are in sensibly titled sitemaps and it's fine (I've used this on sites with 50 million and 100 million product pages; it gets a bit more complex at that scale, but the underlying principle is the same).
-
But then on a big site (talking 4m+ products) it's usually the case that you have URLs indexed that wouldn't be generated in a sitemap because they include additional parameters.
Ideally, of course, you would rid the index of parameter-filled URLs, but it's pretty tough to do that.
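Just to illustrate the sort of thing I mean, here is a rough Python sketch (the URLs are made up) of separating parameter-filled URLs from their clean equivalents - the clean version being what you would want rel="canonical" to point at:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical examples of URLs you might find indexed on a large product site.
indexed_urls = [
    "https://www.example.com/dp/widget-1234",
    "https://www.example.com/dp/widget-1234?sort=price&colour=blue",
    "https://www.example.com/dp/widget-5678?sessionid=abc123",
]

def canonical_form(url):
    """Strip the query string and fragment, leaving the clean product URL."""
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    return urlunsplit((scheme, netloc, path, "", ""))

for url in indexed_urls:
    clean = canonical_form(url)
    if url != clean:
        # Parameter-laden duplicate: its rel="canonical" should point at `clean`.
        print(url + "  ->  canonical: " + clean)
    else:
        print(url + "  (already clean)")
```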
-
Your best bet is to make sure all your URLs are in your sitemap; then you get an exact count.
I've found it handy to use multiple sitemaps, one for each subfolder (e.g. /news/ or /profiles/), so I can quickly see exactly what percentage of URLs is indexed from each section of my site. This is super helpful for finding errors in a specific section, or when you are working on the indexing of a certain type of page.
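Here is the kind of setup I mean, sketched in Python (section names, URLs and file names are just placeholders): one sitemap per section, tied together with a sitemap index, so you can compare submitted vs. indexed counts for each section.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Placeholder data: each section of the site gets its own sitemap file.
sections = {
    "news":     ["https://www.example.com/news/story-1",
                 "https://www.example.com/news/story-2"],
    "profiles": ["https://www.example.com/profiles/jane",
                 "https://www.example.com/profiles/john"],
}

def write_section_sitemap(section, urls):
    """Write one <urlset> sitemap per site section, e.g. sitemap-news.xml."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    filename = "sitemap-%s.xml" % section
    ET.ElementTree(urlset).write(filename, encoding="utf-8", xml_declaration=True)
    return filename

# Build each section sitemap, then a sitemap index that references them all.
sitemapindex = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
for section, urls in sections.items():
    filename = write_section_sitemap(section, urls)
    loc = ET.SubElement(ET.SubElement(sitemapindex, "sitemap"), "loc")
    loc.text = "https://www.example.com/" + filename

ET.ElementTree(sitemapindex).write("sitemap-index.xml", encoding="utf-8", xml_declaration=True)
```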
S
-
What I've found is that the reason for this comes down to how the Google system works. Case in point: a client site of mine with 25,000 actual pages. They have massive duplicate content issues. When I do a generic site: query with the domain, Google shows 50-60,000 pages. If I do an inurl: query with a specific URL parameter, I get either 500,000 or over a million.
Though that's not your exact situation, it can help explain what's happening.
Essentially, if you do a normal site: query, Google will do its best to show the content from the site that it presents to the world as "most relevant." When you do a refined check, it naturally goes looking for the content that really is the closest match to that actual parameter.
So if you're seeing more results with the refined process, it means that on any given day, at any given time, when someone does a general search, the Google system filters out a lot of content that isn't seen as highly valuable for that particular search. All those extra pages that come up in your refined check are most likely being evaluated as less valuable, lower quality, or less relevant to most searches.
Even if many are great pages, their system has multiple algorithms that have to be run to assign value. What you are seeing is those processes struggling to sort it all out.
-
about 839,000 results.
-
Different data center, perhaps - what do you get if you add the "dp" part to the query string?
-
I actually see 'about 897,000 results' for the search 'site:www.newark.com'.
-
Thanks Adrian,
I understand those areas of inaccuracy, but I didn't expect to see a refined search produce more results than the original search. That just seems a little bizarre to me, which is why I was wondering if there was a clear explanation or if I was executing my query incorrectly.
Ben
-
This is an expected 'oddity' of the site: operator. Here is a video of Matt Cutts explaining the imprecise nature of the site: operator.