Robots.txt
-
www.mywebsite.com**/details/**home-to-mome-4596
www.mywebsite.com**/details/**home-moving-4599
www.mywebsite.com**/details/**1-bedroom-apartment-4601
www.mywebsite.com**/details/**4-bedroom-apartment-4612 We have so many pages like this, we do not want to Google crawl this pages
So we added the following code to Robots.txt
User-agent: Googlebot
Disallow: /details/
This code is correct?
-
and this disallows everything in the /details/ folder so if there were some exceptions to the rule (some pages or sub folders in that folder) you would need to add some allow directives, or make a more specific disallow(s)
-
Looks correct. Great reccomendation by Chris. The * (wildcard) just means "all".
-
Looks good but you only want to exclude google from crawling those pages? If you want to exclude other search engines, you could use:
User-agent: *
instead of
User-agent: Googlebot
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should I add my html sitemap to Robots?
I have already added the .xml to Robots. But should I also add the html version?
Technical SEO | | Trazo0 -
Duplicate content: using the robots meta tag in conjunction with the canonical tag?
We have a WordPress instance on an Apache subdomain (let's say it's blog.website.com) alongside our main website, which is built in Angular. The tech team is using Akamai to do URL rewrites so that the blog posts appear under the main domain (website.com/more-keywords/here). However, due to the way they configured the WordPress install, they can't do a wildcard redirect under htaccess to force all the subdomain URLs to appear as subdirectories, so as you might have guessed, we're dealing with duplicate content issues. They could in theory do manual 301s for each blog post, but that's laborious and a real hassle given our IT structure (we're a financial services firm, so lots of bureaucracy and regulation). In addition, due to internal limitations (they seem mostly political in nature), a robots.txt file is out of the question. I'm thinking the next best alternative is the combined use of the robots meta tag (no index, follow) alongside the canonical tag to try to point the bot to the subdirectory URLs. I don't think this would be unethical use of either feature, but I'm trying to figure out if the two would conflict in some way? Or maybe there's a better approach with which we're unfamiliar or that we haven't considered?
Technical SEO | | prasadpathapati0 -
Robots.txt - What is the correct syntax?
Hello everyone I have the following link: http://mywebshop.dk/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167 I want to prevent google from indiexing everything that is related to "view=send_friend" The problem is that its giving me dublicate content, and the content of the links has no SEO value of any sort. My problem is how i disallow it correctly via robots.txt I tried this syntax: Disallow: /view=send_friend/ However after doing a crawl on request the 200+ dublicate links that contains view=send_friend is still present in the CSV crawl report. What is the correct syntax if i want to prevent google from indexing everything that is related to this kind of link?
Technical SEO | | teleman0 -
Magento Robots & overly dynamic URL-s
How can i block all URL-s on a Magento store that have 2 or more dynamic parameters in it, since all the parameters have attribute name in it and not some uniform ID Would something like: Disallow: /?&* work? Since the only thing that is constant throughout all the custom parameters is that they are separated with "&" Thanks 🙂
Technical SEO | | tilenkrivec0 -
How does robots.txt affect aliased domains?
Several of my sites are aliased (hosted in subdirectories off the root domain on a single hosting account, but visible at www.theSubDirectorySite.com) Not ideal, I know, but that's a different issue. I want to block bots from viewing those files that are accessible in subdirectories on the main hosting account, www.RootDomain.com/SubDirectorySite/, and force the bots to look at www.SubDirectorySite.com instead. I utilized the canonical meta tag to point bots away from the sub directory site, but I am wondering what will happen if I use robots.txt to block those files from within the root domain. Will the bots, specifically Google bot, still index the site at its own URL, www.AnotherSite.com even if I've blocked that directory with Disallow: /AnotherSite/ ? THANK YOU!!!
Technical SEO | | michaelj_me0 -
Robots.txt
Hi there, My question relates to the robots.txt file. This statement: /*/trackback Would this block domain.com/trackback and domain.com/fred/trackback ? Peter
Technical SEO | | PeterM220