When does Google index a fetched page?
-
I have seen where it will index one of my pages within 5 minutes of fetching, but have also read that it can take a day. I'm on day #2 and it appears that it has still not re-indexed 15 pages that I fetched. I changed the meta description in all of them, and added content to nearly all of them, but none of those changes are showing when I do a site:www.site/page search.
I'm trying to test changes in this manner, so it is important for me to know WHEN a fetched page has been indexed, or at least IF it has. How can I tell what is going on?
-
For those following, see this link where Ryan has provided some interesting answers regarding the cache and the site:www.. command
-
I'm going to post a question about the non-cached page, as upon digging I'm not finding an answer.
And I'm reading that it seems to take a couple of days before indexing, but I'm seeing something strange that makes it confusing:
This page was cached a few days ago: http://webcache.googleusercontent.com/search?q=cache:http://www.qjamba.com/restaurants-coupons/wildwood/mo/all
The paragraph of content that starts with 'The Wildwood coupons page' was added as a test just 3 days ago, and then I ran a fetch. When I do a Google search for phrases in it, it does show up in Google results (like qjamba wildwood buried by the large national chains). So, it looks like it indexed the new content.
But if you search for wildwood qjamba restaurants cafes the result Google shows includes the word diners that is gone from the cached content (it was previously in the meta description tag)! But if you then search wildwood qjamba restaurants diners it doesn't come up! So, this seems to indicate that the algorithm was applied to the cached file, but that the DISPLAY by Google when the user does a search is still of older content that isn't even in the new cached file! Very odd.
I was thinking I could put changes on pages and test the effect on search results 1 or 2 days after fetching, but maybe it isn't that simple. Or maybe it is but is just hard to tell because of the timing of what Google is displaying.
I appreciate your feedback. I have H2 first on some pages because H1 was pretty big. I thought I read once that the main thing isn't if you start with H1 or H2 but that you never want to put an H1 after an H2.
I'm blocking the cut and paste just to make it harder for a copycat to pull the info. Maybe overkill though.
Thanks again, Ted
-
That's interesting, because according to Google's own words:
Google takes a snapshot of each page examined as it crawls the web and caches these as a back-up in case the original page is unavailable. If you click on the "Cached" link, you will see the web page as it looked when we indexed it. The cached content is the content Google uses to judge whether this page is a relevant match for your query.
Source: http://www.google.com.au/help/features.html
If I look for that page using a fragment of the title (site:http://www.qjamba.com/ "Ferguson, MO Restaurant") I can find it, so it's in the index.
Or maybe not, because if you search for the query "Ferguson, MO Restaurant" 19 coupons (quotes included) you are not among the results. So it seems (I didn't know) that using site: is showing results which are not in the index... But I would ask in the Google Search product forum: https://productforums.google.com/forum/#!forum/websearch
As far as I know you can use a meta tag to avoid archiving in Google's cache, but your page doesn't have a googlebot meta tag. So I have no idea why it is not showing.
But if I were you I would dig further. By the way, the HTML of these pages is quite weird. I didn't spend much time looking at it, but there's no H1, and you are blocking cut & paste with JS... Accessibility is a factor in Google's algo.
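For anyone who wants to double-check the meta tag point on their own pages, here is a minimal sketch in Python. It assumes the page HTML is already available as a string, and the names RobotsMetaFinder and blocks_google_cache are just made up for this illustration. It only looks for a "noarchive" directive in a robots or googlebot meta tag; it does not check HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name -> list of lowercased directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").lower()
        if name in ("robots", "googlebot"):
            content = attr_map.get("content", "")
            values = [d.strip().lower() for d in content.split(",") if d.strip()]
            self.directives.setdefault(name, []).extend(values)


def blocks_google_cache(html):
    """Return True if a robots/googlebot meta tag contains 'noarchive'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noarchive" in values for values in finder.directives.values())
```

If this returns False for a page that still has no cached copy, the missing cache is probably caused by something other than a meta tag, which matches what is described above.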
-
Thanks.. That does help..
<<if you have a 404 for the cache: command that page is not indexed, if searching for the content of that page using site: you find a different page, it means that other page is indexed for that content (and one possible explanation for that is a duplicate content issue)>>
THIS page gives a 404:
but site:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all
Give ONLY that exact same page. How can that be?
-
I am not sure I understood your doubt, but I will try to answer.
site:foo.com
gives you a count of indexed pages, presumably the number of pages from that site in the index. It normally differs from the indexed page count in GWT, so both are probably not all that accurate.
site:foo.com "The quick brown fox jumps over the lazy dog"
searches among the indexed pages for that site for the ones containing that precise sentence.
webcache.googleusercontent.com/search?q=cache:https://foo.com/bar
checks the last indexed version of a specific page.
If you get a 404 for the cache: command, that page is not indexed. If, when searching for the content of that page using site:, you find a different page, it means that other page is indexed for that content (and one possible explanation for that is a duplicate content issue).
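If you have many pages to check, the three lookups above can be generated rather than typed by hand, which also avoids typos in the URL. This is only a rough Python sketch: the function names are invented for this post, foo.com is a placeholder, and it assumes the usual https://www.google.com/search?q= search URL plus the cache URL format quoted in this thread. It only builds the strings; it does not query Google.

```python
from urllib.parse import quote, urlencode


def site_query(site, phrase=None):
    """Build a site: query, optionally restricted to an exact phrase."""
    query = f"site:{site}"
    if phrase:
        query += f' "{phrase}"'
    return query


def google_search_url(query):
    """Turn a query string into a search URL (assumed standard search endpoint)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


def cache_url(page_url):
    """Build the cache: lookup URL for one page, in the format used in this thread."""
    return (
        "http://webcache.googleusercontent.com/search?q=cache:"
        + quote(page_url, safe=":/")
    )


if __name__ == "__main__":
    print(site_query("foo.com"))
    print(google_search_url(site_query("foo.com", "The quick brown fox")))
    print(cache_url("https://foo.com/bar"))
```

Opening the generated cache URL in a browser and reading the result (or the 404) is still a manual step here.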
-
Thanks Massimiliano. I'll give you a 'good' answer here, and cross fingers that this next round will work. I still don't understand the timing on site:www , nor what page+features is all about. I thought site:www was supposed to be the method people use to see what is currently indexed.
-
"cache:" is the most up-to-date version in Google's index.
If you fix the duplicate content, the next re-indexing will fix the duplicate content issue.
-
I have a bigger problem than I realized:
I accidentally put duplicate content in my subcategory pages that was just meant for category pages. It's about 100-150 pages, and many of them have been crawled in the last few days. I have already changed the program so those pages don't have that content. Will I get penalized by Google-- de-indexed? Or should I be ok going forward because the next time they crawl it will be gone?
I'm going to start over with the fetching since I made that mistake but can you address the following just so when I get back to this spot I maybe understand better?:
1. When I type into the google searchbar lemay mo restaurant coupons smoothies qjamba
the description it gives is www.qjamba.com/restaurants-coupons/lemay/mo/smoothies The Lemay coupons page features both national franchise printable restaurant coupons for companies such as KFC, Long John Silver's, and O'Charlies and ...
BUT when I do a site:www.qjamba.com/restaurants-coupons/lemay/mo/smoothies it gives the description found in the meta description tag: www.qjamba.com/restaurants-coupons/.../smoothie... Find Lemay all-free printable and mobile coupons for Smoothies, and more.
It looks like site:www does NOT always give the most recent indexed content since 'The Lemay coupons page...' is the content I added 2 days ago for testing! Maybe that's because Lemay was one of the urls that I inadvertently created duplicate content for.
2. Are ANY of the cache command, page+features command, or site:www supposed to be the most recent indexed content?
-
I am assuming it's a duplicate; it could be de-indexed for other reasons, and the other page is returned because it has the same paragraphs in it. But if you run a couple of crawl reports (Moz, SEMrush, etc.) and they flag these pages as duplicates, that may be the issue.
-
thanks.
That's weird because doing the site: command separately for that first page for the /smoothies gives different content than for /all :
site:www.qjamba.com/restaurants-coupons/lemay/mo/smoothies
site:www.qjamba.com/restaurants-coupons/lemay/mo/all
But why would that 'page+features' command show the same description when the description in reality is different? This seems like a different issue than my op, but maybe it is related somehow--even if not I prob should still understand it.
-
Yes, one more idea, if you take the content of the page and you query your site for that content specifically like this:
You find a different page. Looks like those pages are duplicate.
Sorry for missing a w.
-
you are missing a w there. site:www and you have site:ww
That's why I'm so confused--it appears to be indexed from the past, they are in my dbase table with the date and time crawled -- right after the fetch --, and there is no manual penalty in webmaster tools.
Yet there is no sign it re-indexed after crawling 2 days ago now. I could resubmit (there are 15 pages I fetched), but I'm not expecting a different response and need to understand what is happening in order to use this approach to test SEO changes.
thanks for sticking with this. Any more ideas on what is happening?
-
Well, that's an HTTP 404 status code, which means the page was not found; in other words, it's not in Google's index.
Please note that if you type site:ww.qjamba.com/restaurants-coupons/lemay/mo/all you find nothing.
Again I would doubt your logs. You can also check GWT for any manual penalty you may have there.
-
Hi, thanks again.
this gives an error:
but the page exists, AND site:www.qjamba.com/restaurants-coupons/lemay/mo/all
has a result, so I'm not sure what a missing cache means in this case..
The log shows that it was crawled right after it was fetched, but the result for site:... doesn't reflect the changes on the page. So it appears not to have been re-indexed yet, but why is it not in the cache?
-
You evidently mistyped the url to check, this is a working example:
If your new content is not there, it has not been indexed yet. If your logs say it was crawled two days ago, I would start doubting the logs.
-
Hi Massimiliano,
Thanks for your reply.
I'm getting an error in both FF and Chrome with this in the address bar. Have I misunderstood?
http://webcache.googleusercontent.com/search?q=cache:http://www.mysite.com/mypage
Is the command (assuming I can get it to work) supposed to show when the page was indexed, or last crawled?
I am storing when it crawls, but am wondering about the couple of days part, since it has been 2 days now and when I first did it it was re-indexing within 5 minutes a few days ago.
-
Open this url on any browser:
You can reasonably take that as the date when the page was last indexed.
You could also programmatically store the last Googlebot visit per page, just by checking the user-agent of each page request. Or just analyze your web server logs to get that info out on a per-page basis. And add a couple of days just to have a buffer (even Google needs a little processing time to generate its index).
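The log analysis idea could look roughly like the Python sketch below. It assumes your server writes the common "combined" access log format (adjust the regex if yours differs), and the function names and the 2-day default buffer are only illustrative choices, not anything Google documents. Also keep in mind that a user-agent string can be forged, so a match on "Googlebot" is a hint rather than proof.

```python
import re
from datetime import datetime, timedelta

# Assumed "combined" log format:
# IP ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def last_googlebot_visits(lines):
    """Map each requested path to the most recent visit whose user-agent mentions Googlebot."""
    last = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        visited = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        path = match.group("path")
        if path not in last or visited > last[path]:
            last[path] = visited
    return last


def earliest_expected_index_time(visit, buffer_days=2):
    """Add a processing buffer (an assumed couple of days) to a crawl time."""
    return visit + timedelta(days=buffer_days)
```

You would call it with something like last_googlebot_visits(open("access.log")) (the file name is a placeholder) and only compare against site: or cache: results after the buffered time has passed.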