Robots.txt file question? Never seen this command before
-
Hey Everyone!
Perhaps someone can help me. I came across this command in the robots.txt file of our Canadian corporate domain. I looked around online but can't seem to find a definitive answer (slightly relevant).
The command line is as follows:
Disallow: /*?*
I'm guessing this might have something to do with blocking PHP string searches on the site? It might also have something to do with blocking sub-domains, but the "?" mark puzzles me.
Any help would be greatly appreciated!
Thanks, Rob
-
I don't think this is correct.
/*?* is an attempt at using a RegEx in the robots.txt file, which I don't think works.
Further, if it was a properly formed regex, it would be ?
* is a special character for the user agent to mean all. For the Disallow line, I believe you have to use a specific directory or page.
http://www.robotstxt.org/robotstxt.html
I could be wrong, but the info on this site has been my understanding from the past too.
-
It depends on how your site is structured.
For example if you have a page at
http://www.yourdomain.com/products.php
and this shows different things based on the parameter, like:
http://www.yourdomain.com/products.php?type=widgets
You will want to get rid of this line in your robots.txt.
However, if the parameter(s) don't change the content on the page, you can leave it in.
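To see which URLs a rule like this would catch, here is a minimal Python sketch. It assumes the wildcard behaviour that Google and Bing document for robots.txt ("*" matches any run of characters, a trailing "$" anchors the end of the URL). The original robotstxt.org standard does not define wildcards, so other crawlers may treat the line differently. The function names are made up for illustration and are not part of any library.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk, then join the chunks with '.*' for each '*'.
    regex = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.compile(regex)


def is_blocked(pattern: str, path_and_query: str) -> bool:
    """Return True if the Disallow pattern matches from the start of the path."""
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None


if __name__ == "__main__":
    rule = "/*?*"
    for url in ["/products.php", "/products.php?type=widgets"]:
        print(url, "blocked" if is_blocked(rule, url) else "allowed")
```

Under that assumption, /products.php stays crawlable while /products.php?type=widgets is blocked, which is why the rule matters if the parameter changes what the page shows.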
-
Thanks Ryan and Ryan! I'm just unfamiliar with this command set in the robots file, and I'm still getting settled into the company (5 weeks), so I am still learning the site's structure and architecture. With it all being new to me, and with the limitations I am seeing on the CMS side, I was wondering if this might have been causing crawl issues for Bing and/or Yahoo. I'm trying to gauge where we might be experiencing problems with the site's crawl functions.
-
It's not a bad idea in the robots.txt, but unless you are 100% confident that you won't block something that you really want, I would consider just handling unwanted parameters and pages through the new Google Webmaster URL handling toolset. That way you have more control over which ones do and don't get blocked.
-
So, for this parameter, should I keep it in the robots file?
-
It's preventing spiders from crawling pages with parameters in the URL. For example, when you search on Google you'll see a URL like so:
http://www.google.com/search?q=seo
This passes the parameter of q with a value of 'seo' to the page at google.com for it to work its magic with. This is almost definitely a good thing, unless the only way to access some content on your site is via URL parameters.
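If it helps to see the parameter part pulled out of a URL, here is a small Python sketch using the standard library's urllib.parse. It only illustrates how the "?" splits the path from the parameters; it says nothing about how any particular crawler parses URLs.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.google.com/search?q=seo"

# Everything after the "?" is the query string; the path is what comes before it.
parts = urlsplit(url)
params = parse_qs(parts.query)

print(parts.path)   # /search
print(parts.query)  # q=seo
print(params)       # {'q': ['seo']}
```

Any URL where the query part is non-empty is the kind of URL the Disallow line above is aimed at.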