Googlebot soon to be executing JavaScript - Should I change my robots.txt?
-
This question came to mind as I was pursuing an unrelated issue and reviewing a site's robots.txt file.
Currently, the file contains this line:
Disallow: https://*
According to a recent post on the Google Webmaster Central Blog, [http://googlewebmastercentral.blogspot.com/2014/05/understanding-web-pages-better.html](http://googlewebmastercentral.blogspot.com/2014/05/understanding-web-pages-better.html "Understanding Web Pages Better"), Googlebot is getting much closer to being able to properly render JavaScript. Pardon some ignorance on my part, because I am not a developer, but wouldn't this require that Googlebot be able to execute JavaScript?
If so, I am concerned that disallowing Googlebot from the https:// versions of our pages could interfere with crawling and indexation, because as soon as an end user clicks the "checkout" button on our view cart page, everything on the site flips to https://. If this were disallowed, would Googlebot stop crawling at that point and simply leave because all pages were now https://?
Or am I just way overthinking it? Wouldn't be the first time! Thanks all!
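For what it's worth, here is a minimal sketch of how a standard robots.txt parser reads that line. It uses Python's standard-library `urllib.robotparser`, not Googlebot's own parser, so treat it as an illustration only; `www.example.com`, the `/checkout` path, and the helper `allowed` are made up for the example. Robots rules are matched against the path part of a URL, and a robots.txt file applies only to the protocol and host it is served from, so a value that starts with `https://` may simply never match anything.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The rule from the question, wrapped in a catch-all user-agent group
# (the User-agent line is an assumption; the question only shows the Disallow line).
PROTOCOL_RULE = "User-agent: *\nDisallow: https://*"

# A conventional "block everything" rule, for comparison.
BLOCK_ALL = "User-agent: *\nDisallow: /"

checkout = "https://www.example.com/checkout"  # hypothetical URL

# Rules are compared against the path ("/checkout"), which never starts
# with "https://", so Python's parser does not treat the URL as blocked.
protocol_rule_allows = allowed(PROTOCOL_RULE, checkout)

# A path-based rule does match.
block_all_allows = allowed(BLOCK_ALL, checkout)

print(protocol_rule_allows, block_all_allows)
```

If the aim really is to keep crawlers out of the https:// version, the usual approach is a separate robots.txt served from the https:// address that contains `Disallow: /`. Whether that is desirable for this site is a separate question.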
-
Excellent answer. Thanks so much, Doug. I really appreciate it! Adding a "nofollow" attribute to the Checkout button is a good suggestion and should be fairly easy to implement. I realize that internal nofollows are not normally recommended, but in this instance it may not be a bad idea.
-
Hi Dana,
When you click on the checkout button, what's the mechanism for taking people to the https:// site? Is it just that the checkout link uses https:// in its URL? Is there some JavaScript wizardry you're particularly concerned about?
Even if Googlebot follows this one link to the https version of the cart, it will still have all the other (non-https) links on the previous page queued up to follow, so I don't think this will stop the crawl at that point. It would be a nightmare if Googlebot stopped crawling the entire site every time it went down a rabbit hole!
That's not to say that you wouldn't want to consider nofollowing your checkout button. I'm sure neither you nor Google wants the innards of the cart pages to be indexed. There are probably other pages you'd rather Googlebot spent its time finding, right?
My take on the Google blog post about understanding JavaScript is that the aim is to do a better job of discovering content that might be hidden by JavaScript/Ajax. It's a problem for Google when the raw HTML that they're crawling doesn't accurately reflect the content that is displayed in front of a real visitor.
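The point above about queued links can be illustrated with a toy crawler. This is only a sketch of a generic breadth-first crawl queue, not a description of how Googlebot is actually built; the `LINKS` graph, the `example.com` URLs, and the `is_blocked` rule are all invented for the example.

```python
from collections import deque

# Hypothetical link graph: each page maps to the links found on it.
LINKS = {
    "http://example.com/": ["http://example.com/cart", "http://example.com/products"],
    "http://example.com/cart": ["https://example.com/checkout", "http://example.com/products"],
    "http://example.com/products": ["http://example.com/products/widget"],
    "http://example.com/products/widget": [],
    "https://example.com/checkout": ["https://example.com/payment"],
}


def is_blocked(url):
    # Assumption for the sketch: every https:// URL is off limits to the crawler.
    return url.startswith("https://")


def crawl(start):
    """Breadth-first crawl that skips blocked URLs but keeps working through the queue."""
    queue = deque([start])
    seen = {start}
    fetched = []
    while queue:
        url = queue.popleft()
        if is_blocked(url):
            continue  # skip this one URL; everything else stays queued
        fetched.append(url)
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


crawled = crawl("http://example.com/")
print(crawled)
```

In this toy model the checkout link is discovered on the cart page and then skipped, while the product pages that were already queued are still fetched.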