Indexing non-indexed content and Google crawlers
-
On a news website we have a system where articles are given a publish date which is often in the future. The articles were showing up in Google before the publish date despite us not being able to find them linked from anywhere on the website.
I've added a 'noindex' meta tag to articles that shouldn't be live until a future date.
When the date comes for them to appear on the website, the noindex disappears. Is anyone aware of any issues doing this - say Google crawls a page that is noindex, then 2 hours later it finds out it should now be indexed? Should it still appear in Google search, News etc. as normal, as a new page?
Thanks.
-
Wow! Nice detective work! I could see how that one would slip under the radar.
Congrats on finding a needle in a haystack!
You should buy yourself the adult beverage of your choice and have a little toast!
Cheers!
-
-
I think Screaming Frog has a trial version, I forget if it limits total number of pages etc. as we bought it a while ago. At least you can try out and see. May be others who have more tools as well.
-
Thanks. I agree I need to get rid of that noindex. The site is new and doesn't have much in the way of tag clouds etc. yet, so it's not like we have a lot of pages to check.
I've used the link: attribute to try and find the offending links each time, but nothing showed up. I use Xenu Link Sleuth rather than Screaming Frog, and I can't find a way to find backlinks with Xenu. Do you know if you can with the free version of Screaming Frog? I've seen the free version described as "almost fully functional" - the number of crawlable links seems to be the main restriction.
-
I like the automated sitemap answer for the cause (as this has bitten me before), but you mentioned you do not have that. I would still bet that somewhere on your web site you are linking to the page that you do not want indexed. It could be a tag cloud page or some other index page. We had a site that it would accidentally publish out articles on our home page ahead of schedule. Point here is that when you have a dynamic site with a CMS, you really have to be on your toes with stuff like this as the automation can get you into situations like this.
I would not use the noindex tag and remove it later. My concern would be that you are sending conflicting signals to Google. noindex tells good to remove this page from the index.
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it." from GWT
When I read that - it sounds like this is not what you want for this page.
You could also setup your system to show a 404 on the URL until the content is live and then let it 200, but you run into the same issue of Google getting 2 opposite signals on the same page. Either way, if you first give the signal to Google that you do not want something indexed, you are at the mercy of the next crawl to see if Google looks at it again.
Regardless, you need to get to the crux of the issue, how is Google finding this URL?
I would use a 3rd party spider tool. We have used Screaming Frog SEO Spider. There are others out there. You would be amazed what they find. The key to this tool is that when it finds something, it also tells you on what page it found it. We have big sites with thousands of pages and we have used it to find broken links to images and links to pages on our site that now 404. Really handy to clean things up. I bet it would find where there is a link on your site that contains the page (or pages) that link to the content. You can then update that page and not have to worry about using noindex etc. Also not that the spiders are much better than humans at finding this stuff. Even if you have looked, the spider looks at things differently.
It also may be as simple as searching for the URL on the web with the link: attribute. Google may show you where it is finding the link.
Good luck and please post back what you find. This is kind of like one of those "who dun it?" mystery shows!
-
There is no automated sitemap. We checked every page we could, including feeds.
-
Do you have an automated sitemap? On at least one occasion, I've found that to be a culprit.
Noindex means it won't be kept in the index. It doesn't mean it won't be crawled. I'm not sure how it would affect crawl timing , tho. I would assume that Google would assume that you would want things not indexed crawled less frequently. Something to potentially try is to use the GWT Fetch as Googlebot tool to force a new crawl of the page and see if that gets it in the index any faster.
http://googlewebmastercentral.blogspot.com/2011/08/submit-urls-to-google-with-fetch-as.html
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google has discovered a URL but won't index it?
Hey all, have a really strange situation I've never encountered before. I launched a new website about 2 months ago. It took an awfully long time to get index, probably 3 weeks. When it did, only the homepage was indexed. I completed the site, all it's pages, made and submitted a sitemap...all about a month ago. The coverage report shows that Google has discovered the URL's but not indexed them. Weirdly, 3 of the pages ARE indexed, but the rest are not. So I have 42 URL's in the coverage report listed as "Excluded" and 39 say "Discovered- currently not indexed." When I inspect any of these URL's, it says "this page is not in the index, but not because of an error." They are listed as crawled - currently not indexed or discovered - currently not indexed. But 3 of them are, and I updated those pages, and now those changes are reflected in Google's index. I have no idea how those 3 made it in while others didn't, or why the crawler came back and indexed the changes but continues to leave the others out. Has anyone seen this before and know what to do?
Intermediate & Advanced SEO | | DanDeceuster0 -
Why do SEO agencies ask for access to our Google Search Console and Google Tag Manager?
What do they need GTM for? And what is the use case for setting up Google Search Console?
Intermediate & Advanced SEO | | NBJ_SM0 -
Google indexing only 1 page out of 2 similar pages made for different cities
We have created two category pages, in which we are showing products which could be delivered in separate cities. Both pages are related to cake delivery in that city. But out of these two category pages only 1 got indexed in google and other has not. Its been around 1 month but still only Bangalore category page got indexed. We have submitted sitemap and google is not giving any crawl error. We have also submitted for indexing from "Fetch as google" option in webmasters. www.winni.in/c/4/cakes (Indexed - Bangalore page - http://www.winni.in/sitemap/sitemap_blr_cakes.xml) 2. http://www.winni.in/hyderabad/cakes/c/4 (Not indexed - Hyderabad page - http://www.winni.in/sitemap/sitemap_hyd_cakes.xml) I tried searching for "hyderabad site:www.winni.in" in google but there also http://www.winni.in/hyderabad/cakes/c/4 this link is not coming, instead of this only www.winni.in/c/4/cakes is coming. Can anyone please let me know what could be the possible issue with this?
Intermediate & Advanced SEO | | abhihan0 -
Huge Google index on E-commerce site
Hi Guys, Refering back to my original post I would first like to thank you guys for all the advice. We implemented canonical url's all over the site and noindexed some url's with robots.txt and the site already went from 100.000+ url's indexed to 87.000 urls indexed in GWT. My question: Is there way to speed this up?
Intermediate & Advanced SEO | | ssiebn7
I do know about the way to remove url's from index (with noindex of robots.txt condition) but this is a very intensive way to do so. I was hoping you guys maybe have a solution for this.. 🙂0 -
How can Google index a page that it can't crawl completely?
I recently posted a question regarding a product page that appeared to have no content. [http://www.seomoz.org/q/why-is-ose-showing-now-data-for-this-url] What puzzles me is that this page got indexed anyway. Was it indexed based on Google knowing that there was once content on the page? Was it indexed based on the trust level of our root domain? What are your thoughts? I'm asking not only because I don't know the answer, but because I know the argument is going to be made that if Google indexed the page then it must have been crawlable...therefore we didn't really have a crawlability problem. Why Google index a page it can't crawl?
Intermediate & Advanced SEO | | danatanseo0 -
Copying my Facebook content to website considered duplicate content?
I write career advice on Facebook on a daily basis. On my homepage users can see the most recent 4-5 feeds (using FB social media plugin). I am thinking to create a page on my website where visitors can see all my previous FB feeds. Would this be considered duplicate content if I copy paste the info, but if I use a Facebook social media plugin then it is not considered duplicate content? I am working on increasing content on my website and feel incorporating FB feeds would make sense. thank you
Intermediate & Advanced SEO | | knielsen0 -
How can you indexed pages or content on pages that are behind a pay wall or subscription login.
I have a client that has a boat of awesome content they provide to their client that's behind a pay wall ( ie: paid subscribers can only access ) Any suggestions mozzers? How do I get those pages index? Without completely giving away the contents in the front end.
Intermediate & Advanced SEO | | BizDetox0 -
Why will google not index my pages?
About 6 weeks ago we moved a subcategory out to becomne a main category using all the same content. We also removed 100's of old products and replaced these with new variation listings to remove duplicate content issues. The problem is google will not index 12 critcal pages and our ranking have slumped for the keywords in the categories. What can i do to entice google to index these pages?
Intermediate & Advanced SEO | | Towelsrus0