Is it possible that Google may have erroneous indexing dates?
-
I am consulting someone for a problem related to copied content. Both sites in question are WordPress (self hosted) sites. The "good" site publishes a post. The "bad" site copies the post (without even removing all internal links to the "good" site) a few days after.
On both websites it is obvious the publishing date of the posts, and it is clear that the "bad" site publishes the posts days later. The content thief doesn't even bother to fake the publishing date.
The owner of the "good" site wants to have all the proofs needed before acting against the content thief. So I suggested him to also check in Google the dates the various pages were indexed using Search Tools -> Custom Range in order to have the indexing date displayed next to the search results.
For all of the copied pages the indexing dates also prove the "bad" site published the content days after the "good" site, but there are 2 exceptions for the very 2 first posts copied.
First post:
On the "good" website it was published on 30 January 2013
On the "bad" website it was published on 26 February 2013
In Google search both show up indexed on 30 January 2013!Second post:
On the "good" website it was published on 20 March 2013
On the "bad" website it was published on 10 May 2013
In Google search both show up indexed on 20 March 2013!Is it possible to be an error in the date shown in Google search results?
I also asked for help on Google Webmaster forums but there the discussion shifted to "who copied the content" and "file a DMCA complain". So I want to be sure my question is better understood here.
It is not about who published the content first or how to take down the copied content, I am just asking if anybody else noticed this strange thing with Google indexing dates.How is it possible for Google search results to display an indexing date previous to the date the article copy was published and exactly the same date that the original article was published and indexed?
-
Thanks Doug. Really an eye-opener.
-
Thanks Doug for your response. It really cleared up the questions I had about that date Google shows next to the search results.
I was not able to find official details about it, all I was able to find was different referencing as the indexing date of a page.
But I knoew here in the MOZ community there are people who really know things, that's why I asked.
So that date is just Google's estimation of the publishing date, not the date Google indexed the content!
Thanks again for taking the time to answer me!
-
Hiya Sorina,
When you use the custom date range, Google isn't listing results based on the date they were indexed. Google is using an estimated publication date.
Google tries to estimate the the publication date based on meta-data and other features of the page such as dates in the content, title and URL. The date Google first indexed the page is just one of the things that Google can use to estimate the publication date.
I also suspect that dates in any sitemap.xml files will also be taken into consideration.
But, given that even Google can't guarantee that it'll crawl and index articles on the day they've been published the crawl data may not be an accurate estimate.
Also, if the scraped content is being re-published with intact internal links (are these the full URL - do you they resolve to your original website?) then it's pretty obvious where the content came from.
Hope this help answer your question.
-
Hi Sorina,
I can tell you that the index dates shown by Google are accurate but is not the case with the Cache date sometimes as the date shown in the Cache and the copy shown in the cache don't match many times but the index dates are accurate. Send me a private message with the actual URLs under discussion and I will be able to comment with more clarity.
Best,
Devanur Rafi
-
Thank you for your response Devanur Rafi, but the "good" site doesn't have problems getting indexed.
Actually all posts on the "good" site are indexed the very same day they are published.My question was more about the indexing date shown in Google search results
How come, for a post from the "bad" site, Google is displaying an indexing date previous to the actual date the post was published on that site?!
And how come this date is exactly the same as the date Google says it indexed the post from the "good" site?
-
Hi Sorina,
This is a common thing and it all depends on a site's crawlability (how easy is it to crawl for the bot) and crawl frequency for that site. Google would have picked up that post first on the bad site and then from the good site. However, just because one or two posts were picked up late does not mean that the good site is not crawler friendly. It also depends on how far the resource is from the root. Let us take an example:
A page on a good site: abc.com/folder1/folder2/folder3/page.html
Now a bad site copies that page: xyz.com/page.html
In this case, Google might first pickup the copied page from the bad site as it is just a click away from the root which is not the case with the good site where the page is nested deep inside multiple folders.
You can also give the way back machine (archive.org) a try to find which website published the post first. Sometimes this might work out pretty well. You can also try to look at the cache dates of the posts on both the sites in Google to get some info in this regard.
Hope those help. I wish you good luck.
Best,
Devanur Rafi.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Cache
So, when I gain a link I always check to see if the page that is linking is in the Google cache. I've noticed recently that more and more pages are actually not showing up in Google's cache, yet still appear in search results. I did read an article from someone whoo works at Google a few weeks back that there is sometimes an error with the cache and occasionally the cache will not display. This week, my own website isn't showing up in the cache yet I'm still ranking in SERP's. I'm not worried about it, mostly whitehat, but has there been any indication that Google are phasing out the ability to check cache's of websites?
Algorithm Updates | | ThorUK0 -
What happens when a de-indexed subdomain is redirected to another de-indexed subdomain? What happens to the link juice?
Hi all, We are planning to de-index and redirect a sub domain A to sub domain B. Consequently we now need to d-index sub domain B also. What happens now to the link juice or page rank they gained from hundreds and thousands of backlinks? Will there be any ranking impact on main domain? Backlinks of these sub domains are not much relevant to main domain content. Thanks
Algorithm Updates | | vtmoz1 -
When Google is going to penalise webpages with autoplaying videos?
Hi, One of the most disgusting experiences while browsing web pages is the auto playing anonymous and highly non relevant ad videos at some corner of the page which buffer mostly and automatically plays. There are familiar websites which been following this activity. I don't think Google has ever rolled out an update to penalise these. Any idea why? Thanks
Algorithm Updates | | vtmoz0 -
Google indexing https sites by default now, where's the Moz blog about it!
Hello and good morning / happy Friday! Last night an article from of all places " Venture Beat " titled " Google Search starts indexing and letting users stream Android apps without matching web content " was sent to me, as I read this I got a bit giddy. Since we had just implemented a full sitewide https cert rather than a cart only ssl. I then quickly searched for other sources to see if this was indeed true, and the writing on the walls seems to indicate so. Google - Google Webmaster Blog! - http://googlewebmastercentral.blogspot.in/2015/12/indexing-https-pages-by-default.html http://www.searchenginejournal.com/google-to-prioritize-the-indexing-of-https-pages/147179/ http://www.tomshardware.com/news/google-indexing-https-by-default,30781.html https://hacked.com/google-will-begin-indexing-httpsencrypted-pages-default/ https://www.seroundtable.com/google-app-indexing-documentation-updated-21345.html I found it a bit ironic to read about this on mostly unsecured sites. I wanted to hear about the 8 keypoint rules that google will factor in when ranking / indexing https pages from now on, and see what you all felt about this. Google will now begin to index HTTPS equivalents of HTTP web pages, even when the former don’t have any links to them. However, Google will only index an HTTPS URL if it follows these conditions: It doesn’t contain insecure dependencies. It isn’t blocked from crawling by robots.txt. It doesn’t redirect users to or through an insecure HTTP page. It doesn’t have a rel="canonical" link to the HTTP page. It doesn’t contain a noindex robots meta tag. It doesn’t have on-host outlinks to HTTP URLs. The sitemaps lists the HTTPS URL, or doesn’t list the HTTP version of the URL. The server has a valid TLS certificate. One rule that confuses me a bit is : **It doesn’t redirect users to or through an insecure HTTP page. ** Does this mean if you just moved over to https from http your site won't pick up the https boost? Since most sites in general have http redirects to https? Thank you!
Algorithm Updates | | Deacyde0 -
We recently transitioned a site to our server, but Google is still showing the old server's urls. Is there a way to stop Google from showing urls?
We recently transitioned a site to our server, but Google is still showing the old server's urls. Is there a way to stop Google from showing urls?
Algorithm Updates | | Stamats0 -
How Do I Optimize with Google's Video Search?
Hi everyone, I am looking here https://developers.google.com/webmasters/videosearch/schema and I don't fully understand. Could someone please explain, step by step, what I have to do to optimize for Google video search? I.e. Step 1 do this Step 2 do this. I don't fully understand Thank you!
Algorithm Updates | | jhinchcliffe0 -
7 Pack Google Serps?
What is the best way to get into the 7 pack of google serps? I have a site that ranked well before this changed but not was pushed back to page 2. I have Unique content and I currently have provided my info to all the standard local sites, like Yelp, Manta, Local.com and others. I already have a Google Local page and I also have links from local sites. What else can be done?
Algorithm Updates | | bronxpad0 -
Why does Google say they have more URLs indexed for my site than they really do?
When I do a site search with Google (i.e. site:www.mysite.com), Google reports "About 7,500 results" -- but when I click through to the end of the results and choose to include omitted results, Google really has only 210 results for my site. I had an issue months back with a large # of URLs being indexed because of query strings and some other non-optimized technicalities - at that time I could see that Google really had indexed all of those URLs - but I've since implemented canonical URLs and fixed most (if not all) of my technical issues in order to get our index count down. At first I thought it would just be a matter of time for them to reconcile this, perhaps they were looking at cached data or something, but it's been months and the "About 7,500 results" just won't change even though the actual pages indexed keeps dropping! Does anyone know why Google would be still reporting a high index count, which doesn't actually reflect what is currently indexed? Thanks!
Algorithm Updates | | CassisGroup0