Allow only Rogerbot, not Googlebot or other undesired access
-
I'm in the middle of site development and want to start crawling my site with Rogerbot, while preventing Googlebot and similar crawlers from crawling it.
Currently my site is protected by a login (the basic Joomla offline mode, user and password required), so I thought a good solution would be to remove that limitation and use .htaccess to password-protect the site for all users except Rogerbot.
Reading here and there, it seems that practice is not recommended because it can open a security hole: anyone who learns which user agents are allowed can emulate them. OK, maybe you need to be a hacker/cracker, or an experienced developer, to get that info, but I was not able to find clear information on how to do this securely.
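To illustrate why a user-agent exception is considered weak, here is a minimal Python sketch. The URL is a hypothetical placeholder, and the bare token "rogerbot" is an assumption about what an allow rule might match, not Rogerbot's exact user-agent string. The request is only built, never sent.

```python
from urllib.request import Request

# Hypothetical URL of the site under development (placeholder, not a real site).
DEV_SITE_URL = "https://dev.example.com/"


def build_spoofed_request(url: str, user_agent: str) -> Request:
    """Build (but do not send) a request that claims to be `user_agent`.

    The User-Agent header is chosen freely by the client, so a server
    cannot verify it from the header alone.
    """
    return Request(url, headers={"User-Agent": user_agent})


# Any client can claim to be the allowed crawler in a single line.
request = build_spoofed_request(DEV_SITE_URL, "rogerbot")

# urllib stores header names in capitalized form ("User-agent").
print(request.get_header("User-agent"))
```

Because the header is entirely under the client's control, an exception keyed on it keeps out well-behaved bots, but not anyone who deliberately imitates the allowed one.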
The other solution would be to keep using Joomla's access limitation for everyone except, again, Rogerbot. I'm still not sure whether that is possible.
Mostly, my question is: how do you work on your site before you want it indexed by Google or similar, regardless of whether you use a CMS? Is there some other way to do it?
I would love to have my site ready and crawled before launching it, and avoid fixing issues afterwards... Thanks in advance.
-
Great, thanks.
With those 2 recommendations I have more than enough for the next crawler. Thank you both!
-
Hi, thanks for answering.
Well, it looks doable. I will try to do it on the next scheduled crawl, trying to minimize the exposed time.
However, your idea seems very compatible with my first approach. Maybe I could also allow Rogerbot through .htaccess while limiting others, and only for that day remove the user/password restriction (from Joomla) and leave only the .htaccess limitation. (I know, maybe I'm a bit paranoid; I just want to be sure to minimize any collateral effect...)
*Maybe it would be a good feature for Moz to be able to access restricted sites...
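The combined rule described above (let Rogerbot in by user agent, make everyone else log in) can be modelled as a small Python sketch. This is only an illustration of the decision logic, not actual .htaccess or Joomla configuration. The function name and the "rogerbot" token are assumptions made for the example.

```python
def is_request_allowed(user_agent: str, has_valid_login: bool,
                       allowed_token: str = "rogerbot") -> bool:
    """Model of the access rule: a request passes if its User-Agent
    contains the allowed token (case-insensitive) OR the visitor has
    supplied a valid user/password."""
    claims_allowed_bot = allowed_token.lower() in user_agent.lower()
    return claims_allowed_bot or has_valid_login


# A crawler announcing itself as Rogerbot passes without a login.
print(is_request_allowed("rogerbot/1.0", has_valid_login=False))

# Other crawlers without credentials are rejected.
print(is_request_allowed("Googlebot/2.1", has_valid_login=False))

# A regular visitor with a valid login still gets in.
print(is_request_allowed("Mozilla/5.0", has_valid_login=True))
```

The first branch trusts whatever the client claims, so keeping the window short, as planned, limits the exposure.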
-
Hi,
I ran into a similar issue while we were redesigning our site. This is what we did. We unblocked our site (we also had a user and password to keep Google from indexing it). We added the link to a Moz campaign. We were very careful not to share the URL of the development site or put it anywhere Google might find it quickly. Remember, Google finds pages by following links from other pages. We did not submit the development site to Google Webmaster Tools or Google Analytics. We watched and waited for the Moz report to come in. When it did, we blocked the site again.
Hope this helps
Carla