How to allow bots to crawl all but WP-content
-
Hello,
I would like my website to remain crawlable to bots, but to block my wp content and media. Does the following robots.txt work? I worry that the * user agent may conflict with the others.
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/User-agent: GoogleBot
Allow: /User-agent: GoogleBot-Mobile
Allow: /User-agent: GoogleBot-Image
Allow: /User-agent: Bingbot
Allow: /User-agent: Slurp
Allow: / -
Thank you for the help, Gaston!
-
Yeap, with that you are allowing every file ending with that extension
-
Can I do so with:
Allow: *.jpg
Allow: *.png
-
Thanks, Gaston. I should have been more clear about what I am looking to do. I currently am having an indexation issue. Somehow, pages are being automatically generated by WordPress.
These pages are often .txt files of information or code from plugins, all beginning with /wp-content/uploads/ in their URL. I have been manually removing them from the index and would like to now have them be uncrawlable.
Best
-
Oh god, my mistake!
Im deeply sorry, yes, this configuration will block images! that follow that folder structure!I'll correct myself.
Thanks for pointing it out! -
Gaston,
Thanks for the fast reply! My images folder does follow that format, which is what makes me worrisome as we are blocking the wp-conent folder.
Thanks!
-
Hi Tom,
Yes, this config will allow images to be crawled,
No, this config will block images to be crawled,as long as your wordpress has the defalt folder for images: /wp-content/uploads/year/month/image-name.png
How to know, super easy, where your images are stored? Go to the web where you can find an image... Then right clic and then copy link address. With that link you will find that folder structure.
Hope it helps.
Best luck.
GR -
Hi Gaston,
I just wanted to follow up with you with one last question if possible. Would this allow my images and PDF's to be crawled & indexed still?
Thanks!
-
Awesome. Thanks, Gaston!
-
Yes it does.
As I said earlier. Copy and paste that code into the robot.txt tester in any of your search console and try with some name.css or testing.js just for testing.
Check the image i've attached.Hope it helps.
Best luck
GR -
Thank you for the response. I'm still a little uncertain, does the version you wrote allow the bots to crawl the css and js as well?
Best
-
Hi Tom!
That Robots.txt config is pretty redundant.
To acheive what you what, thy this:User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Allow: *.js
Allow: *.cssJust 3 things to note here:
1- That User-agent:* and those disallows blocks for every bot to crawl whats in those folders.
2- When blocking /wp-content/ you are also blocking the /themes/ folder and inside are the .js and .css files. Blocking those files cause to googlebot not being able to render correctly that page and see it different from what a normal user would see.
3- Those Allow:/ dont prevent the disallow.To try that configuration, you can use the robots.txt tester in search console, just inder the Crawl menu.
Remember that by default google considers that you are not blocking nothing.
More info here: The web robots.tat pageHope it helps.
Best luck.
GR
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
WP DomainName.de in permalink
Hey. On my WordPress page I did reviews about other websites. My page is in German language. My keywords and title are: "DomainName.de Erfahrungen". Wordpress changes the dot "." to an dash "-" and I cannot change it. Yoast SEO told me, my keywords are not (complete) in my permalink. Google use an "-" to separate to words (https://mza.bundledseo.com/community/q/how-is-a-dash-or-handled-by-google-search). Wordpress changes a "." to a "-": domain-de Invasion forum just removes the ".": domainde Trustpilot has the full domainame in permalink: domain.de So what's is the best strategy for the best SEO results?
Technical SEO | | cwltd0 -
Does duplicate content not concern Rand?
Hello all, I'm a new SEOer and I'm currently trying to navigate the layman's minefield that is trying to understand duplicate content issues in as best I can. I'm working on a website at the moment where there's a duplicate content issue with blog archives/categories/tags etc. I was planning to beat this by implementing a noindex meta tag on those pages where there are duplicate content issues. Before I go ahead with this I thought: "Hey, these Moz guys seem to know what they're doing! What would Rand do?" Blogs on the website in question appear in full and in date order relating to the tag/category/what-have-you creating the duplicate content problem. Much like Rand's blog here at Moz - I thought I'd have a look at the source code to see how it was dealt with. My amateur eyes could find nothing to help answer this question: E.g. Both the following URLs appear in SERPs (using site:moz,com and very targeted keywords, but they're there): https://mza.bundledseo.com/rand/does-making-a-website-mobile-friendly-have-a-universally-positive-impact-on-mobile-traffic/ https://mza.bundledseo.com/rand/category/moz/ Both pages have a rel="canonical" pointing to themselves. I can understand why he wouldn't be fussed about the category not ranking, but the blog? Is this not having a negative effect? I'm just a little confused as there are so many conflicting "best practice" tips out there - and now after digging around in the source code on Rand's blog I'm more confused than ever! Any help much appreciated, Thanks
Technical SEO | | sbridle1 -
Do mobile and desktop sites that pull content from the same source count as duplicate content?
We are about to launch a mobile site that pulls content from the same CMS, including metadata. They both have different top-level domains, however (www.abcd.com and www.m.abcd.com). How will this affect us in terms of search engine ranking?
Technical SEO | | ovenbird0 -
Content too buried in source code?
Our team is working on a refresh/redesign and am wondering if there's a quantifiable way of determining how high our meta data, H1 and paragraph should be in the source code. Or even whether I should be concerned with that. Our navigation will likely have dozens of links (we're going to keep it to under 100), and this doesn't even factor in the design elements. I am concerned about the content being buried. Are these the kind of concerns I should be having? Is there a measurable way to avoid it?
Technical SEO | | SSFCU0 -
External video content in iframe
Hi, On our site we have a lot of video content. The player is hosted by a third party so we are using an iframe to include the content on our site. The problem is that the content it self (on the third party domain) is shown in the google result. My question is: Can we ask the third party to disallow the content from indexing in their robots.txt or will that also affect our own use of the video content? For example we use video-sitemaps to include the videos in Google video search (the sitemap links to the videos on our own domain, but we are still using iframes on the pages to collect the content from the third party domain that will then be blocked by robots.txt). I hope you understand what I mean... Any suggestions? Thanks a lot!
Technical SEO | | Googleankan0 -
Term for how content or data is structured
There is a term for how data or content is structured and for the life of me I can't figure it out. The following is the best I know of how to explain it: magnolia is of Seattle. Seattle is of Washington. Washington is of the US. US is of North America. North America is of Earth. etc etc etc etc. Any help is much appreciated. I'm trying to use the term to communicate It's application to SEO in that Google analyze how information is structured to understand the breadth and depth of your sites content...
Technical SEO | | BonsaiMediaGroup0 -
Duplicate content due to csref
Hi, When i go trough my page, i can see that alot of my csref codes result in duplicate content, when SeoMoz run their analysis of my pages. Off course i get important knowledge through my csref codes, but im quite uncertain of how much it effects my SEO-results. Does anyone have any insights in this? Should i be more cautios to use csref-codes or dosent it create problems that are big enough for me to worry about them.
Technical SEO | | Petersen110 -
seo moz crawl diagnosis
Hi seomozzers, We are creating a brand new website for a client and I would like to run an seo moz crawl to fix what has been done wrong. So my question is it ok to run an SEO moz crawl with a dev URL? Are final URLs and dev URLs will give me the same results or not? Basically, Should I wait for getting the final URL or is it ok to run a crawl under a dev URL such as www.dev2.example.com or http://183.2564.2864? Thank you 🙂
Technical SEO | | Ideas-Money-Art0