How can a page be indexed without being crawled?
-
Hey Moz fans,
In the Google getting started guide it says:
"Note: Pages may be indexed despite never having been crawled: the two processes are independent of each other. If enough information is available about a page, and the page is deemed relevant to users, search engine algorithms may decide to include it in the search results despite never having had access to the content directly. That said, there are simple mechanisms such as robots meta tags to make sure that pages are not indexed."
How can this happen? I don't really get the point.
Thank you
-
The pleasure is all mine, my friend. You are most welcome. The Moz SEO community is an indispensable asset and weapon in any SEO's inventory, in my opinion. We learn a great deal here while helping others. I am really thankful to each and every one here in the Moz community. Long live Moz and Mozzers. YOU ROCK!!
-
Oh man, you always come to me with great ideas. I never thought about that.
Thank you very much, keep rocking!
-
Yes, of course, my friend: Google has to crawl the page to see the page-level meta robots tag. But to date I have not seen any page in Google's index that has been blocked using both the robots.txt file and a page-level meta robots tag. Password protecting the page via .htaccess would be overkill if you just want Google not to index it. If you want Google to remove a particular page from its index, you can request that from your Webmaster Tools account. Here you go for more: https://support.google.com/webmasters/answer/1663419?hl=en
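To make the "has to crawl the page to see the tag" point concrete, here is a minimal Python sketch (standard library only) of how a crawler might look for a robots meta tag once it already has the page's HTML. The class and function names are made up for illustration, and this is certainly not Google's actual parser.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # The content attribute is a comma-separated list, e.g. "noindex, follow".
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the fetched HTML asks search engines not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


sample = '<html><head><meta name="robots" content="NOINDEX, FOLLOW"></head></html>'
print(has_noindex(sample))
```

Note that `has_noindex` needs the HTML as input, so a crawler that is not allowed to fetch the page never gets to call it.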
Good luck to you, my friend.
Best regards,
Devanur Rafi
-
Thank you guys,
Devanur, you've got the point, but let me correct you on one thing.
You can't tell Google to remove a page from its index using only the meta robots tag, because it can't read the meta tag until it crawls the page.
So the only solution looks like .htaccess password protection.
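The reasoning in this thread can be written down as a tiny decision function. This is a deliberately simplified toy model in Python, based only on what is said in this discussion; the `Page` fields and the function name are hypothetical, and real search engines weigh far more signals.

```python
from dataclasses import dataclass


@dataclass
class Page:
    robots_txt_allows_crawl: bool  # is the URL crawlable according to robots.txt?
    has_noindex_tag: bool          # does the HTML carry a robots noindex meta tag?
    has_external_links: bool       # do other sites link to this URL?


def may_appear_in_index(page: Page) -> bool:
    """Toy model: can this URL end up in the search index?"""
    if page.robots_txt_allows_crawl:
        # The crawler can fetch the HTML, so a noindex tag is seen and honoured.
        return not page.has_noindex_tag
    # Crawling is blocked: the noindex tag is never read, but the bare URL
    # can still be indexed if it is discovered through links elsewhere.
    return page.has_external_links


# A blocked page with a noindex tag and inbound links can still be listed.
print(may_appear_in_index(Page(False, True, True)))
```

In this model the noindex tag only has an effect when crawling is allowed, which is the point being argued above.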
Anyway, thanks for your efforts.
-
I'm also thinking of sitemaps, but I'm not really sure Google trusts them enough to list URLs from them that it hasn't crawled.
-
Hi friend,
If a page has been blocked using the robots.txt file, Google will not crawl it and will not index it from within the website. But what if a reference to that page is found on a third-party website? In cases like this, link discovery happens and the page can be indexed without a description snippet. Such pages show the following text in place of a description in the search results pages:
"A description for this result is not available because of this site's robots.txt – learn more"
So in order to completely stop Google from indexing a page, you should implement a page-level meta robots tag (noindex) on it.
Here you go for more: https://support.google.com/webmasters/answer/156449?hl=en
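As a rough illustration that robots.txt only answers the question "may this URL be crawled?", here is a short Python sketch using the standard library's `urllib.robotparser`. The robots.txt content and the example.com URLs are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/private/page.html")
allowed = can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/public/page.html")
print(blocked, allowed)
```

The check says nothing about indexing: a URL for which `can_crawl` returns False can still be discovered through links on other sites, which is how the "description not available" results come about.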
Please feel free to post back if you have any other queries in this regard.
Best regards,
Devanur Rafi