Indexing non-indexed content and Google crawlers
-
On a news website we have a system where articles are given a publish date which is often in the future. The articles were showing up in Google before the publish date despite us not being able to find them linked from anywhere on the website.
I've added a 'noindex' meta tag to articles that shouldn't be live until a future date.
When the date comes for them to appear on the website, the noindex disappears. Is anyone aware of any issues doing this - say Google crawls a page that is noindex, then 2 hours later it finds out it should now be indexed? Should it still appear in Google search, News etc. as normal, as a new page?
Thanks.
-
Wow! Nice detective work! I could see how that one would slip under the radar.
Congrats on finding a needle in a haystack!
You should buy yourself the adult beverage of your choice and have a little toast!
Cheers!
-
-
I think Screaming Frog has a trial version, I forget if it limits total number of pages etc. as we bought it a while ago. At least you can try out and see. May be others who have more tools as well.
-
Thanks. I agree I need to get rid of that noindex. The site is new and doesn't have much in the way of tag clouds etc. yet, so it's not like we have a lot of pages to check.
I've used the link: attribute to try and find the offending links each time, but nothing showed up. I use Xenu Link Sleuth rather than Screaming Frog, and I can't find a way to find backlinks with Xenu. Do you know if you can with the free version of Screaming Frog? I've seen the free version described as "almost fully functional" - the number of crawlable links seems to be the main restriction.
-
I like the automated sitemap answer for the cause (as this has bitten me before), but you mentioned you do not have that. I would still bet that somewhere on your web site you are linking to the page that you do not want indexed. It could be a tag cloud page or some other index page. We had a site that it would accidentally publish out articles on our home page ahead of schedule. Point here is that when you have a dynamic site with a CMS, you really have to be on your toes with stuff like this as the automation can get you into situations like this.
I would not use the noindex tag and remove it later. My concern would be that you are sending conflicting signals to Google. noindex tells good to remove this page from the index.
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it." from GWT
When I read that - it sounds like this is not what you want for this page.
You could also setup your system to show a 404 on the URL until the content is live and then let it 200, but you run into the same issue of Google getting 2 opposite signals on the same page. Either way, if you first give the signal to Google that you do not want something indexed, you are at the mercy of the next crawl to see if Google looks at it again.
Regardless, you need to get to the crux of the issue, how is Google finding this URL?
I would use a 3rd party spider tool. We have used Screaming Frog SEO Spider. There are others out there. You would be amazed what they find. The key to this tool is that when it finds something, it also tells you on what page it found it. We have big sites with thousands of pages and we have used it to find broken links to images and links to pages on our site that now 404. Really handy to clean things up. I bet it would find where there is a link on your site that contains the page (or pages) that link to the content. You can then update that page and not have to worry about using noindex etc. Also not that the spiders are much better than humans at finding this stuff. Even if you have looked, the spider looks at things differently.
It also may be as simple as searching for the URL on the web with the link: attribute. Google may show you where it is finding the link.
Good luck and please post back what you find. This is kind of like one of those "who dun it?" mystery shows!
-
There is no automated sitemap. We checked every page we could, including feeds.
-
Do you have an automated sitemap? On at least one occasion, I've found that to be a culprit.
Noindex means it won't be kept in the index. It doesn't mean it won't be crawled. I'm not sure how it would affect crawl timing , tho. I would assume that Google would assume that you would want things not indexed crawled less frequently. Something to potentially try is to use the GWT Fetch as Googlebot tool to force a new crawl of the page and see if that gets it in the index any faster.
http://googlewebmastercentral.blogspot.com/2011/08/submit-urls-to-google-with-fetch-as.html
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is robots met tag a more reliable than robots.txt at preventing indexing by Google?
What's your experience of using robots meta tag v robots.txt when it comes to a stand alone solution to prevent Google indexing? I am pretty sure robots meta tag is more reliable - going on own experiences, I have never experience any probs with robots meta tags but plenty with robots.txt as a stand alone solution. Thanks in advance, Luke
Intermediate & Advanced SEO | | McTaggart1 -
Same content different URL - Google Analytics other Options
One of my clients came to me today asking to mirror their current website and host it on a different domain (i.e domain.com and domain.ca). Their reasoning is that there is no actual way to confirm via form entries from their website which visitors are submitting the form from an organic perspective and which ones are submitting the form through an Adwords ad campaign referral they are running (which I have nothing to do with). I know Google Analytics will show you visits to pages on a site and then you can find out which source (organic vs. cpc) they came from, but it won't confirm the source on the actual form entry. So my client feels the only way to find out this information would be to mirror the website so that the separate analytics would validate their Ad spend. Does anyone know of any tools that I could use for something like this? I do NOT want to mirror the website as I am fearful of duplicate content and the YEARS of organic SEO work I have put into this website going to waste. The other element I should mention is the client only wants to have this "mirorred" site up for 2 months. Any thoughts, suggestions and arguments for a mirorred website are welcomed! Thanks!
Intermediate & Advanced SEO | | MainstreamMktg0 -
What sort of content for 'non-niche' website?
Hey guys, had a question with regards to content production. We run an store called Yellow Octopus in Australia and we've literally got thousands of products (4500 skus last count). We've got everything from novelty mugs to kitchen accessories to gag gifts, t-shirts and tech gadgets. I've read a lot of material on creating awesome content to attract backlinks and we are ready to craft our content strategy. We've got a team in place - graphic designer, illustrator and writers to execute that strategy. It's just a matter of formulating the strategy! Largely speaking I have an idea of the quality of content required because I look at a lot of it. The real issue is what type of content is right for us? Most of the articles I have read focus on niche industries i.e. SEO, Piano sales or health foods. Right off the bat I can come up with hundreds of content pieces that work around those niches. However, with such a diverse range of products I'm unsure of what our niche really is, in fact not having a niche is almost our niche. Of course we could do gift guides like '30 Unbelievable Gifts for Foodies' (and we do, do those). However they aren't really the type of posts that are likely to attract back-links. Is the best strategy to split the content into categories? What sort of content pieces would you suggest for a company such as ours? Many thanks in advance!
Intermediate & Advanced SEO | | TheGreatestGoat0 -
Google Indexing Duplicate URLs : Ignoring Robots & Canonical Tags
Hi Moz Community, We have the following robots command that should prevent URLs with tracking parameters being indexed. Disallow: /*? We have noticed google has started indexing pages that are using tracking parameters. Example below. http://www.oakfurnitureland.co.uk/furniture/original-rustic-solid-oak-4-drawer-storage-coffee-table/1149.html http://www.oakfurnitureland.co.uk/furniture/original-rustic-solid-oak-4-drawer-storage-coffee-table/1149.html?ec=affee77a60fe4867 These pages are identified as duplicate content yet have the correct canonical tags: https://www.google.co.uk/search?num=100&site=&source=hp&q=site%3Ahttp%3A%2F%2Fwww.oakfurnitureland.co.uk%2Ffurniture%2Foriginal-rustic-solid-oak-4-drawer-storage-coffee-table%2F1149.html&oq=site%3Ahttp%3A%2F%2Fwww.oakfurnitureland.co.uk%2Ffurniture%2Foriginal-rustic-solid-oak-4-drawer-storage-coffee-table%2F1149.html&gs_l=hp.3..0i10j0l9.4201.5461.0.5879.8.8.0.0.0.0.82.376.7.7.0....0...1c.1.58.hp..3.5.268.0.JTW91YEkjh4 With various affiliate feeds available for our site, we effectively have duplicate versions of every page due to the tracking query that Google seems to be willing to index, ignoring both robots rules & canonical tags. Can anyone shed any light onto the situation?
Intermediate & Advanced SEO | | JBGlobalSEO0 -
Google displaying a content box above the listing link for top ranking listing in SERPs
Hi, In the attached Google SERP example the first listing below the paid search ads has a large box with a snippet of content from the relevant page then followed by the standard link. Does anyone know how you get Google to display a box like this in their SERPs? I checked the code on the page and there doesn't appear to be anything special about it such as any schema markup. It uses standard list code. Does this only appear for particular types of content or sites, such as medical content in this case? Is the content more likely to appear for lists? Does it only appear for high authority sites that Google has selected? We have a similar medical information based site and it would be great to try to get Google to display a similar box of content for some of our pages. Thanks. Damien ZmPJVSl.png
Intermediate & Advanced SEO | | james.harris0 -
Duplicate Content From Indexing of non- File Extension Page
Google somehow has indexed a page of mine without the .html extension. so they indexed www.samplepage.com/page, so I am showing duplicate content because Google also see's www.samplepage.com/page.html How can I force google or bing or whoever to only index and see the page including the .html extension? I know people are saying not to use the file extension on pages, but I want to, so please anybody...HELP!!!
Intermediate & Advanced SEO | | WebbyNabler0 -
What is better for google: keep old not visited content deeply in the website, or to remove it?
We have quite a lot of old content which is not visited anymore. Should we remove it and have a lot of 410 errors which will be reported in GWT? Or should we keep it and forget about it?
Intermediate & Advanced SEO | | bele0 -
Sitemap - % of URL's in Google Index?
What is the average % of links from a sitemap that are included in the Google index? Obviously want to aim for 100% of the sitemap urls to be indexed, is this realistic?
Intermediate & Advanced SEO | | stats440