Blocking https from being crawled
-
I have an ecommerce site where https is being crawled for some pages. Wondering if the below solution will fix the issue
www.example.com will be my domain
In the nav there is a login page www.example.com/login which is redirecting to the https://www.example.com/login
If I just disallowed /login in the robots file wouldn't it not follow the redirect and index that stuff?
The redirect part is what I am questioning.
-
Correct once /login gets redirected to https://www.example.com/login all nav links etc are https
What I ended up doing was blocking /login in robots and now doing canonicals on https as well as nofollow the /login link that is in the nav that redirects
Willl see what happens now.
-
So, the "/login" page gets redirected to https: and then every link on that page goes secure and Google crawls them all? I think blocking the "/login" page is a perfectly good way to go here - cut the crawl path, and you'll cut most of the problem.
You could request removal of "/login" in Google Webmaster Tools, too. Sometimes, I find that Robots.txt isn't great at removing pages that are already indexed. I would definitely add the canonical as well, if it's feasible. Cutting the path may not cut the pages that have already been indexed with https:.
Sorry, I'd actually reverse that:
(1) Add the canonicals, and let Google sweep up the duplicates
(2) A few weeks later, block the "/login" page
Sounds counter-intuitive, but if you block the crawl path to the https: pages first, then Google won't crawl the canonical tags on those versions. Use canonical to clean up the index, and then block the page to prevent future problems.
-
Gotcha. Yea I commented above how I was going to add a canonical as well as a noindex in the meta but was curious how it handled the redirect that was happening.
thanks for your help
-
Yea I was going to nofollow the link in the nav and add a meta tag but was curious how the robots file would handle this since the url is a redirect.
Thanks for your input
-
The pages that are being crawled under https, are the same pages available under http as well ? If yes, can you just add a canonical tag on these pages to go to the http version. That should fix it. And if your login page is the entry point, your fix will help as well. But then as Rebekah said, what if somebody is linking to your https page. I would suggest you look into making a canonical tag on these pages to http if that makes sense and is doable.
-
You can disallow the https portion in robots.txt, but remember robots.txt isn't always a sure fire way of not getting an area of your site crawled. If you have other important content to crawl from the secured page, be careful you are not blocking robots from there.
If this is linked to other places on the web, and the link doesn't include no-follow, search engines may still crawl the page. Can you change the link in your navigation to no-follow as well? I would also add a meta noindex tag to the page itself, and a canonical tag to the https version.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google crawl drop
the crawl request of my company site: https://www.dhgate.com/ has dropped nearly over 95%, from daily 6463599 requests to 476493 requests at 12:00am on 9th, Oct (GMT+8). This dramatic dropping trend not only showed in our GSC crawl stats report but also our company's own log report. We have no idea what’s going on. We want to know whether there is an update of google about crawlling, or is this the issue of our own site? If something is wrong with our site, in what aspects would you recommend us to check, analyze and accordingly optimize?
Technical SEO | | DHgate_20140 -
HTTPS for form pages?
I am creating a small business website for a friend in Recruitment. It’s very small and mainly just a shop window for the business. There’s no login area for the website, but there are two areas were users can enter information: General contact us form (giving email and phone number) Applying for a job (attaching a resume) The forms are using Ninja Forms – which I believe are secure in passing information. But am I missing anything? Do I need to make these pages https at all? I’m quite new to building sites from scratch. Thanks for your help
Technical SEO | | joberts0 -
Multi Domain SSL Certs re HTTPS migration
Hi How important is it that when migrating sites to HTTPS they have their own SSL certificates as opposed to choosing the much cheaper multi domain certificate options such as: https://www.namecheap.com/security/ssl-certificates/comodo/ev-multi-domain.aspx I have been told really should have 1 certificate per domain and people generally unsure about multi domain certsificates ? All Best Dan
Technical SEO | | Dan-Lawrence0 -
What do I need to do for HTTPS switch in Webmaster Tools?
My site is currently verified using a meta tag for both Google and Bing. Will I need to recreate the meta tag or will I be able to use the same one?
Technical SEO | | EcommerceSite1 -
Https redirect when certificate expired
Hi, How do we 301 an https version of a domain to a page on another website when the security certificate has run out? We have 301 redirected the http version but IT stuck on how to do the expired https. Thanks
Technical SEO | | Houses0 -
Google Webmaster Tool - Crawl Stats Query ?
Dear All, I have been looking at GWT Crawl Stats and wondering how should I be interrupting the crawl stats chart. AllI I see is 3 charts telling me a high , low and average for the below but I am wondering is there anything I really need to be looking for ?. Pages crawled per day Kilobytes downloaded per day Time spent downloading a page (in milliseconds) thanks Sarah
Technical SEO | | SarahCollins0 -
Https-pages still in the SERP's
Hi all, my problem is the following: our CMS (self-developed) produces https-versions of our "normal" web pages, which means duplicate content. Our it-department put the <noindex,nofollow>on the https pages, that was like 6 weeks ago.</noindex,nofollow> I check the number of indexed pages once a week and still see a lot of these https pages in the Google index. I know that I may hit different data center and that these numbers aren't 100% valid, but still... sometimes the number of indexed https even moves up. Any ideas/suggestions? Wait for a longer time? Or take the time and go to Webmaster Tools to kick them out of the index? Another question: for a nice query, one https page ranks No. 1. If I kick the page out of the index, do you think that the http page replaces the No. 1 position? Or will the ranking be lost? (sends some nice traffic :-))... thanx in advance 😉
Technical SEO | | accessKellyOCG0 -
Crawl Errors In Webmaster Tools
Hi Guys, Searched the web in an answer to the importance of crawl errors in Webmaster tools but keep coming up with different answers. I have been working on a clients site for the last two months and (just completed one months of link bulding), however seems I have inherited issues I wasn't aware of from the previous guy that did the site. The site is currently at page 6 for the keyphrase 'boiler spares' with a keyword rich domain and a good onpage plan. Over the last couple of weeks he has been as high as page 4, only to be pushed back to page 8 and now settled at page 6. The only issue I can seem to find with the site in webmaster tools is crawl errors here are the stats:- In sitemaps : 123 Not Found : 2,079 Restricted by robots.txt 1 Unreachable: 2 I have read that ecommerce sites can often give off false negatives in terms of crawl errors from Google, however, these not found crawl errors are being linked from pages within the site. How have others solved the issue of crawl errors on ecommerce sites? could this be the reason for the bouncing round in the rankings or is it just a competitive niche and I need to be patient? Kind Regards Neil
Technical SEO | | optimiz10