Robots file set up
-
The robots file looks like it has been set up in a very messy way.
I understand that # comments out a line. Does this mean the sitemap will not be picked up?
Disallow: /js/ - should this be allowed instead, like /*.js$?
Disallow: /media/wysiwyg/ - this seems to be causing alerts in Webmaster Tools because it cannot access the images within.
Can anyone help me clean this up, please?
#Sitemap: https://examplesite.com/sitemap.xml
# Crawlers Setup
User-agent: *
Crawl-delay: 10
# Allowable Index
# Mind that Allow is not an official standard
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/catalog/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /media/
Disallow: /media/captcha/
Disallow: /media/catalog/
#Disallow: /media/css/
#Disallow: /media/css_secure/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
#Disallow: /media/js/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
Disallow: /media/wysiwyg/
Disallow: /media/xmlconnect/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
#Disallow: /skin/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalog/product/gallery/
Disallow: */catalog/product/upload/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
Disallow: /get.php # Magento 1.5+
# Paths (no clean URLs)
#Disallow: /.js$
#Disallow: /.css$
Disallow: /.php$
Disallow: /?SID=
Disallow: /rss*
Disallow: /*PHPSESSID
Disallow: /:
Disallow: /
User-agent: Fatbot
Disallow: /
User-agent: TwengaBot-2.0
Disallow: /
-
To add to this, I'd also recommend having a look around in /lib/ just to make sure you aren't blocking important JavaScript and CSS files (I've been bitten by this!).
More guidance here: https://developers.google.com/webmasters/mobile-sites/mobile-seo/common-mistakes/blocked-resources?hl=en
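If it helps, here is a rough way to check this yourself. It is only a sketch using Python's standard urllib.robotparser, fed a subset of the rules from the question. The asset URLs are made up for illustration, so swap in the files your pages actually load. Bear in mind that this parser does plain prefix matching and does not understand Google-style wildcards such as /*.js$, so treat the result as a first pass and confirm in Google's own testing tools.

```python
from urllib.robotparser import RobotFileParser

# Subset of the rules posted in the question (prefix rules only).
RULES = """\
User-agent: *
Disallow: /js/
Disallow: /lib/
Disallow: /media/
"""


def blocked_assets(robots_text, urls, user_agent="Googlebot"):
    """Return the URLs that the given user agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical asset URLs, not taken from the real site.
assets = [
    "https://examplesite.com/js/prototype/prototype.js",
    "https://examplesite.com/lib/example/widget.css",
    "https://examplesite.com/skin/frontend/default/css/styles.css",
]

for url in blocked_assets(RULES, assets):
    print("blocked:", url)
```

Anything printed here is a file the crawler cannot load when rendering the page, so it is a candidate for an Allow rule or for dropping the matching Disallow.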
-
Looks like your intuitions are pretty good! I would remove the # before the Sitemap line, as you have indicated. I would also remove the Disallow: /js/ line, as Google needs access to JavaScript these days and will throw a fit if it can't get it. I wouldn't worry about the wysiwyg directory if it only has images that you don't care about ranking.
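To see the effect of those two changes, here is a small sketch, again with Python's standard urllib.robotparser (site_maps() needs Python 3.8 or later). The "current" text is a cut-down version of the posted file and the "cleaned" text is just one possible tidy-up, not a recommended final file. The app.js path is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Parse robots.txt text and return the configured parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Cut-down version of the file as posted: the Sitemap line is commented out.
current = """\
#Sitemap: https://examplesite.com/sitemap.xml
User-agent: *
Disallow: /js/
Disallow: /checkout/
"""

# One possible cleaned version: Sitemap uncommented, /js/ rule removed.
cleaned = """\
Sitemap: https://examplesite.com/sitemap.xml
User-agent: *
Disallow: /checkout/
"""

# Hypothetical script URL used only for the check.
script_url = "https://examplesite.com/js/app.js"

for name, text in (("current", current), ("cleaned", cleaned)):
    rules = parse_rules(text)
    print(name, "sitemaps:", rules.site_maps())
    print(name, "can fetch script:", rules.can_fetch("Googlebot", script_url))
```

With the # in place the parser reports no sitemap at all, which matches how a commented line is ignored. Once the /js/ rule is gone the script URL becomes fetchable.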