The "webmaster" disallowed all ROBOTS to fight spam! Help!!
-
One of the companies I do work for has a Magento site. I am simply the SEO guy, and they run the website through some developers who hold access to their systems VERY tightly. Using Google Webmaster Tools, I saw that the robots.txt file was blocking ALL robots.
I immediately e-mailed them and received a long reply about foreign robots and scrapers slowing down the website. They told me I would have to provide a list of only the good robots to allow in robots.txt.
Please correct me if I'm wrong, but isn't robots.txt optional for bots to obey? Won't a bad scraper or bot still bog down the site? Shouldn't that be handled in .htaccess or something different?
I'm not new to SEO but I'm sure some of you who have been around longer have run into something like this and could provide some suggestions or resources I could use to plead my case!
If I'm wrong, please help me understand how we can meet both needs: allowing good bots to visit the site while keeping the 'bad' ones out. Their claim is that the site is bombarded by tons and tons of bots that have slowed down performance.
Thanks in advance for your help!
-
Thanks for the suggestions!! I'll keep you updated.
-
You can get a list of good robots from Robotstxt.org: http://www.robotstxt.org/db.html.
I'd recommend creating an edited version of the robots.txt file yourself, specifically Allowing googlebot and others. Then send that with a link to the robotstxt.org site.
You may need to get the business owners involved. IT exists to enable the business, not strap it down so it can't move.
-
What you could do is add Allow statements for the different Googlebots and the bots of other search engines. This will probably keep the developers happy, since they can still keep other bots out of the door. (I doubt it would actually work, though, and I definitely don't think robots.txt should be the way to keep spammers away, but that says more about the quality of development ;-)).
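To make the allow-list idea concrete, here is a minimal sketch of what such a robots.txt might look like, checked with Python's standard-library parser. The choice of Googlebot and Bingbot, the example.com URL, and the "SomeScraper" agent name are assumptions for illustration, not the site's actual file. Keep in mind this only describes what well-behaved crawlers will do; a bot that ignores robots.txt is unaffected.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allow-list robots.txt: named search engine bots may crawl
# everything, every other user agent is asked to stay out.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent, url="https://www.example.com/some-page"):
    """Return True if a compliant crawler with this agent token may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "SomeScraper" is a made-up agent name standing in for any unlisted bot.
    for agent in ("Googlebot", "Bingbot", "SomeScraper"):
        print(agent, is_allowed(agent))
```

Running a check like this before sending the file to the developers is a cheap way to confirm the rules say what you intend.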
-
Yes, there are a ton of bad bots one may want to block. Can you show us the robots.txt file? If they aren't blocking legit search engine bots, you're probably okayish. If they are actually blocking all bots, you have cause for concern.
Can you give us a screenshot from GWT?
I use a program called Screaming Frog daily. Off the shelf, it's not malicious; I just want to crawl and gather metadata. I can tell it to disregard robots.txt, and it will crawl a site until it hits something password protected. There's not much any robots.txt can do about it, as it can also spoof user agents.
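Since user agents can be spoofed, anything enforced on the server has to look at more than the User-Agent header. One approach, as I understand Google's published guidance, is a reverse DNS lookup on the requesting IP followed by a forward lookup to confirm it. Below is a rough sketch of that check; the function name and the injectable lookup parameters are my own, added so the logic can be exercised without network access, and the suffix list covers only the main crawler hostnames.

```python
import socket

# Hostname suffixes used by Google's main crawlers (assumption: this sketch
# does not cover every Google fetcher).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Check a claimed Googlebot IP with a reverse, then forward, DNS lookup.

    reverse_lookup(ip) -> hostname and forward_lookup(hostname) -> list of IPs
    default to real DNS queries; they are parameters only to allow testing.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex resolves IPv4 addresses only.
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the same IP, otherwise the
        # reverse DNS record could simply be forged.
        return ip in forward_lookup(host)
    except OSError:
        # Failed lookups are treated as "not verified".
        return False
```

A request that claims to be Googlebot but fails this check can then be rate limited or blocked at the server or firewall level, which is where the developers' scraper problem really needs to be handled.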