Robot.txt help
-
Hi,
We have a blog that is killing our SEO.
We need to
Disallow
Disallow: /Blog/?tag*
Disallow: /Blog/?page*
Disallow: /Blog/category/*
Disallow: /Blog/author/*
Disallow: /Blog/archive/*
Disallow: /Blog/Account/.
Disallow: /Blog/search*
Disallow: /Blog/search.aspx
Disallow: /Blog/error404.aspx
Disallow: /Blog/archive*
Disallow: /Blog/archive.aspx
Disallow: /Blog/sitemap.axd
Disallow: /Blog/post.aspxBut Allow everything below /Blog/Post
The disallow list seems to keep growing as we find issues. So rather than adding in to our Robot.txt all the areas to disallow. Is there a way to easily just say Allow /Blog/Post and ignore the rest. How do we do that in Robot.txt
Thanks
-
These: http://screencast.com/t/p120RbUhCT
They appear on every page I looked at, and take up the entire area "above the fold" and the content is "below the fold"
-Dan
-
Thanks Dan, but what grey areas, what url are you looking at?
-
Ahh. I see. You just need to "noindex" the pages you don't want in the index. As far as how to do that with blogengine, I am not sure, as I have never used it before.
But I think a bigger issue is like the giant box areas at the top of every page. They are pushing your content way down. That's definitely hurting UX and making the site a little confusing. I'd suggest improving that as well
-Dan
-
Hi Dan, Yes sorry that's the one!
-
Hi There... that address does not seem to work for me. Should it be .net? http://www.dotnetblogengine.net/
-Dan
-
Hi
The blog is www.dotnetblogengine.com
The content is only on the blog once it is just it can be accessed lots of different ways
-
Andrew
I doubt that one thing made your rankings drop so much. Also, what type of CMS are you on? Duplicate content like that should be controlled through indexation for the most part, but I am not recognizing that type of URL structure as any particular CMS?
Are just the title tags duplicate or the entire page content? Essentially, I would either change the content of the pages so they are not duplicate, or if that doesn't make sense I would just "noindex" them.
-Dan
-
Hi Dan,
I am getting duplicate content errors in WMT like
This is because tag=ABC and page=1 are both different ways to get to www.mysite.com/Blog/Post/My-Blog-Post.aspx
To fix this I have remove the URL's www.mysite.com/Blog/?tag=ABC and www.mysite.com/Blog/?Page=1from GWMT and by setting robot.txt up like
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/PostI hope to solve the duplicate content issue to stop it happening again.
Since doing this my SERP's have dropped massively. Is what I have done wrong or bad? How would I fix?
Hope this makes sense thanks for you help on this its appreciated.
Andrew
-
Hi There
Where are they appearing in WMT? In crawl errors?
You can also control crawling of parameters within webmaster tools - but I am still not quite sure if you are trying to remove these from the index or just prevent crawling (and if preventing crawling, for what reason?) or both?
-Dan
-
Hi Dan,
The issue is my blog had tagging switched on, it cause canonicalization mayhem.
I switched it off, but the tags still appears in Google Webmaster Tools (GWMT). I Remove URL via GWMT but they are still appearing. This has also caused me to plummet down the SERPs! I am hoping this is why my SERPs had dropped anyway! I am now trying to get to a point where google just sees my blog posts and not the ?Tag or ?Author or any other parameter that is going to cause me canoncilization pain. In the meantime I am sat waiting for google to bring me back up the SERPs when things settle down but it has been 2 weeks now so maybe something else is up?
-
I'm wondering why you want to block crawling of these URLs - I think what you're going for is to not index them, yes? If you block them from being crawled, they'll remain in the index. I would suggest considering robots meta noindex tags - unless you can describe in a little more detail what the issue is?
-Dan
-
Ok then you should be all set if your tests on GWMT did not indicate any errors.
-
Thanks it goes straight to www.mysite.com/Blog
-
Yup, I understand that you want to see your main site. This is why I recommended blocking only /Blog and not / (your root domain).
However, many blogs have a landing page. Does yours? In other words, when you click on your blog link, does it take you straight to Blog/posts or is there another page in between, eg /Blog/welcome?
If it does not go straight into Blog/posts you would want to also allow the landing page.
Does that make sense?
-
The structure is:
www.mysite.com - want to see everything at this level and below it
www.mysite.com/Blog - want to BLOCK everything at this level
www.mysite.com/Blog/posts - want to see everything at this level and below it
-
Well what Martijn (sorry, I spelled his name wrong before) and I were saying was not to forget to allow the landing page of your blog - otherwise this will not be indexed as you are disallowing the main blog directory.
Do you have a specific landing page for your blog or does it go straight into the /posts directory?
I'd say there's nothing wrong with allowing both Blog/Post and Blog/post just to be on the safe side...honestly not sure about case sensitivity in this instance.
-
"We're getting closer David, but after reading the question again I think we both miss an essential point ;-)" What was the essential point you missed. sorry I don't understand. I don;t want to make a mistake in my Robot.txt so would like to be 100% sure on what you are saying
-
Thanks guys so I have
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/Postthat works. My Home page also works. I there anything wrong with including both uppercase "Post" and lowercase "post". It is lowercase on the site but want uppercase "P" just incase. Is there a way to make the entry non case sensitive?
Thanks
-
Correct, Martijin. Good catch!
-
There was a reason that I said he should test this!
We're getting closer David, but after reading the question again I think we both miss an essential point ;-). As we know also exclude the robots from crawling the 'homepage' of the blog. If you have this homepage don't forget to also Allow it.
-
Well, no point in a blog that hurts your seo
I respectfully disagree with Martijin; I believe what you would want to do is disallow the Blog directory itself, not the whole site. It would seem if you Disallow: / and _Allow:/Blog/Post _ that you are telling SEs not to index anything on your site except for /Blog/Post.
I'd recommend:
User-agent: *
Disallow: /Blog/
Allow: /Blog/PostThis should block off the entire Blog directory except for your post subdirectory. As Maritijin stated; always test before you make real changes to your robots.txt.
-
That would be something like this, please check this or test this within Google Webmaster Tools if it works because I don't want to screw up your whole site. What this does is disallowing your complete site and just allows the /Blog/Post urls.
User-agent: *
Disallow: /
Allow: /Blog/Post
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Search Results Pages Blocked in Robots.txt?
Hi I am reviewing our robots.txt file. I wondered if search results pages should be blocked from crawling? We currently have this in the file /searchterm* Is it a good thing for SEO?
Intermediate & Advanced SEO | | BeckyKey0 -
Branding and Page Titles - Please Help
Hello, I have a question about page titles. How important is branding here? I'm not referring to the company name, but rather the terminology that's used as "branding language" for a company. For example, let's say that the it would be a good idea to target the keyword "Restaurant Coupons" based on search volume and competition. However, our branding adheres to the language "Dining Offers". Is it considered a bad idea to use "Restaurant Coupons" in the page title? Or is that considered inconsistent branding? Basically, I'm just trying to figure out the correct balance between the SEO value of words and adhering to a company's branding. Any help is appreciated! Thanks,
Intermediate & Advanced SEO | | atmosol
Nick1 -
HELP! How do I get Google to value one page over another (older) page that is ranking?
So I have a tactical question and I need mozzers. I'll use widgets as an example: 1- My company used to sell widgets exclusively and we built thousands of useful, branded unique pages that sell widgets. We have thousands of pages that are ranking for widgets.com/brand-widgets-for-sale. (These pages have been live for almost 2 years) 2- We've shifted our focus to now renting widgets. We have about 100 pages focused on renting the same branded widgets. These pages have unique content and photos and can be found at widgets.com/brand-widgets-for-rent. (These pages have been live for about 2-3 months) The problem is that when someone searches just for the brand name, the "for sale" pages dramatically outrank the "for rent" pages. Instead, I want them to find the "for rent" page. I don't want to redirect traffic from the "for sale" pages because someone might still be interested in buying (although as a company, we are super focused on renting). Solutions? "nofollow" the "for sale" pages with the idea that Google will stop indexing "for sale" and start valuing "for rent" over it? Remove "for sale" from sitemap. Help!!
Intermediate & Advanced SEO | | Vacatia_SEO0 -
Looking for SEO & design help
Our website is www.mosquitocurtains.com built by an amateur (me). Traffic has been on the decline, slipping from enviable rankings, higher bounce rates. Of course I have a modest budget but can't seem to sift through those that say, "Pay us a bundle and cross your fingers that we're any good."
Intermediate & Advanced SEO | | Kurtyj0 -
Long term strategy to retain link 'goodness', I need some help!
Hi, I have a few questions around the best approach to retain as much link juice / authority from transitioning multiple domains into 1 single domain over the next year or so. I have 2 similar websites (www.brandA.co.uk and www.brandB.co.uk) which I need to transition to a new website (www.brandC.co.uk) over the next 2 years. Both A&B are established and have there own brand value, brand C will be a new website. I need to start introducing the brand from website C onto A&B straight away and then eventually drop the brands from A&B and just be left with C. One idea I am considering is: www.brandA.co.uk becomes brandA.brandC.co.uk (brandA sits as a subdomain on brandC website) Ultimately over time I would drop the subdomain (brandA) and just be left with www.brandC.co.uk The other option is: www.brandA.co.uk becomes brandC.co.uk/brandA...with the same ultimate aim as above. In both above case the same would be done for brandB, either becoming a subdomain of a folder on brandC website What I need to know is what is the best way to first pass any SEO goodness from the websites for brandA and brandB to the intermediate solution of either brandA.brandC.co.uk or brandC.co.uk/brandA (I see this intermediate solution being in place for approx 2 years). And then how to transition the intermediate solution into just having brandC.co.uk Which solution will aid growing the SEO goodness on the final brandC.co.uk website? Does google see subdomains as part of the main domain and thus the main domain will benefit from any links going to the subdomain or is it better to always use /folders as google sees these as more part of one website? ...or is there another option that I haven't considered? I know it's rater confusing so please give me a shout if you want anymore info. Thanks James
Intermediate & Advanced SEO | | cewe0 -
Robots.txt error message in Google Webmaster from a later date than the page was cached, how is that?
I have error messages in Google Webmaster that state that Googlebot encountered errors while attempting to access the robots.txt. The last date that this was reported was on December 25, 2012 (Merry Christmas), but the last cache date was November 16, 2012 (http://webcache.googleusercontent.com/search?q=cache%3Awww.etundra.com/robots.txt&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a). How could I get this error if the page hasn't been cached since November 16, 2012?
Intermediate & Advanced SEO | | eTundra0 -
Can Bundling Products Help eCommerce SEO?
We currently have over 13,000 products on our site. SeoMoz reports many duplicate pages, which are items that are very similar (different size, application, sku, etc.). Would it be prudent to create a bundled product that has one page, one description, a set of images and a table with add to cart buttons for all of the different products on that page? (called a bundled product in Magento). Then create 301 redirects from all of the individual pages and categories to the relevant new bundled product.
Intermediate & Advanced SEO | | iJeep0 -
301 redirect help
Hey guys, I normally work in WordPress and just use a 301 redirect plugin. I bought a site and rather than maintain two similar ones have decided to redirect one to the other. I am having trouble with the .htaccess file. Here is an example. These are two redirects: redirect 301 /category/models/next/2
Intermediate & Advanced SEO | | DanDeceuster
redirect 301 /category/models I want both of these URLs to redirect to the same URL of the new site. However, the /category/models is the only one working. It redirects to the new page just fine. The /category/models/next/2 is redirecting to nearly the same URL on the new site, only it is adding /next/2 to the end and that is bringing up a 404. Why is it adding /next/2 to the new URL? How can I fix this? There are several doing this. Help appreciated!0