How to allow bots to crawl all but WP-content
-
Hello,
I would like my website to remain crawlable to bots, but to block my wp content and media. Does the following robots.txt work? I worry that the * user agent may conflict with the others.
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/User-agent: GoogleBot
Allow: /User-agent: GoogleBot-Mobile
Allow: /User-agent: GoogleBot-Image
Allow: /User-agent: Bingbot
Allow: /User-agent: Slurp
Allow: / -
Thank you for the help, Gaston!
-
Yeap, with that you are allowing every file ending with that extension
-
Can I do so with:
Allow: *.jpg
Allow: *.png
-
Thanks, Gaston. I should have been more clear about what I am looking to do. I currently am having an indexation issue. Somehow, pages are being automatically generated by WordPress.
These pages are often .txt files of information or code from plugins, all beginning with /wp-content/uploads/ in their URL. I have been manually removing them from the index and would like to now have them be uncrawlable.
Best
-
Oh god, my mistake!
Im deeply sorry, yes, this configuration will block images! that follow that folder structure!I'll correct myself.
Thanks for pointing it out! -
Gaston,
Thanks for the fast reply! My images folder does follow that format, which is what makes me worrisome as we are blocking the wp-conent folder.
Thanks!
-
Hi Tom,
Yes, this config will allow images to be crawled,
No, this config will block images to be crawled,as long as your wordpress has the defalt folder for images: /wp-content/uploads/year/month/image-name.png
How to know, super easy, where your images are stored? Go to the web where you can find an image... Then right clic and then copy link address. With that link you will find that folder structure.
Hope it helps.
Best luck.
GR -
Hi Gaston,
I just wanted to follow up with you with one last question if possible. Would this allow my images and PDF's to be crawled & indexed still?
Thanks!
-
Awesome. Thanks, Gaston!
-
Yes it does.
As I said earlier. Copy and paste that code into the robot.txt tester in any of your search console and try with some name.css or testing.js just for testing.
Check the image i've attached.Hope it helps.
Best luck
GR -
Thank you for the response. I'm still a little uncertain, does the version you wrote allow the bots to crawl the css and js as well?
Best
-
Hi Tom!
That Robots.txt config is pretty redundant.
To acheive what you what, thy this:User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Allow: *.js
Allow: *.cssJust 3 things to note here:
1- That User-agent:* and those disallows blocks for every bot to crawl whats in those folders.
2- When blocking /wp-content/ you are also blocking the /themes/ folder and inside are the .js and .css files. Blocking those files cause to googlebot not being able to render correctly that page and see it different from what a normal user would see.
3- Those Allow:/ dont prevent the disallow.To try that configuration, you can use the robots.txt tester in search console, just inder the Crawl menu.
Remember that by default google considers that you are not blocking nothing.
More info here: The web robots.tat pageHope it helps.
Best luck.
GR
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Not ranking - Scarped content
Hi, I have a problem with a website, that never compe up with before. The website is: https://www.enallaktikidrasi.com It has a bunch of excellent articles, good enough on-page SEO and a medium backlink profile. However, it is ranking just for very very few keywords. The major problem is that there are original articles that searched by their title won't appear in top100 results but they will appear in other websites that scapre them (even if they give a backlink to our original article!) Also, the website has good rankings in Bing and Yahoo but not in Google. There are keywords ranking in #1 in Bing but nowhere in top10 pages in Google.... I am guessing for 3 issues: 1. Majestic shows a very low trust score (just 13). However, the website has not got any kind of penalty in the last 3 years. 2. There are many scarpers. The odd is that scarpers with no real value outrank our content. (Scarpers with almost zero backlink profile) 3. We ran Sucuri on website as there were a large bots attack. Is there a correlation between it bots attack and Google results? (but why not in Bing and Yahoo too?) It seems like Google underestimates the website when indexing websites for some reason. Moreover, some of the articles are really the best around but the keywords they are targeted are not either within the 30 first pages... Any help?? Thanks..
Technical SEO | | alex33andros0 -
Handling of Duplicate Content
I just recently signed and joined the moz.com system. During the initial report for our web site it shows we have lots of duplicate content. The web site is real estate based and we are loading IDX listings from other brokerages into our site. If though these listings look alike, they are not. Each has their own photos, description and addresses. So why are they appear as duplicates – I would assume that they are all too closely related. Lots for Sale primarily – and it looks like lazy agents have 4 or 5 lots and input the description the same. Unfortunately for us, part of the IDX agreement is that you cannot pick and choose which listings to load and you cannot change the content. You are either all in or you cannot use the system. How should one manage duplicate content like this? Or should we ignore it? Out of 1500+ listings on our web site it shows 40 of them are duplicates.
Technical SEO | | TIM_DOTCOM0 -
Dulpicate Content being reported
Hi I have a new client whose first MA crawl report is showing lots of duplicate content. The main batch of these are all the HP url with an 'attachment' part at the end such as: www.domain.com/?attachment_id=4176 As far as i can tell its some sort of slide show just showing a different image in the main frame of each page, with no other content. Each one does have a unique meta title & H1 though. Whats the best thing to do here ? Not a problem and leave as is Use the paremeter handling tool in GWT Canonicalise, referencing the HP or other solution ? Many Thanks Dan
Technical SEO | | Dan-Lawrence0 -
Absurdly High Crawl Stats
Over the past month and a half, our crawl stats have been rising violently. A few weeks ago, our crawl stats rose, such that the pages crawled per day worked out to the entire site being crawled 6 times a day, with a corresponding rise in KB downloaded per day. Last week, the crawl rate jumped again, such that the site is being crawled roughly 30x a day. I'm not seeing any chatter at there about an algorithm change, and I've checked and double-checked the site for signs of duplicate content, changes in our backlink profile, or anything else. We haven't seen appreciable changes in our search volume, either impressions or clicks. Any ideas what could be going on?
Technical SEO | | Tyler-Brown0 -
Cloud Hosting and Duplicate content
Hi I have an ecommerce client who has all their images cloud hosted (amazon CDN) to speed up site. Somehow it seems maybe because the pinned the images on pinterest but the CDN got indexed and there now seems to be about 50% of the site duplicated (about 2500 pages eg: http://d2rf6flfy1l.cloudfront.net..) Is this a problem with duplicate content? How come Moz doesnt show it up as crawl errors? Why is thisnot a problem that loads of people have?I only found a couple of mentions of such a prob when I googled it.. any suggestion will be grateful!
Technical SEO | | henya0 -
Joomla: content accesible through all kinds of other links >> duplicate content?!
When i did a site: search on Google i've noticed all kind of URL's on my site were indexed, while i didn't add them to the Joomla navigation (or they were not linked anywhere on the site). Some examples: www.domain.com/1-articlename >> that way ALL articles are publicly visible, even if they are not linked to a menu-item... If by accident such a link get's shared it will be indexed in google, you can have 2 links with same content... www.domain.com/2-uncategorised >> same with categories, automatically these overview pages are visible to people who know this URL. On it you see all the articles that belong to that category. www.domain.com/component/content >> this gives an overview of all the categories inside your Joomla CMS I think most will agree this is not good for your site's SEO? But how can this be solved? Is this some kind of setting within Joomla? Anyone who dealt with these problems already?
Technical SEO | | conversal0 -
External video content in iframe
Hi, On our site we have a lot of video content. The player is hosted by a third party so we are using an iframe to include the content on our site. The problem is that the content it self (on the third party domain) is shown in the google result. My question is: Can we ask the third party to disallow the content from indexing in their robots.txt or will that also affect our own use of the video content? For example we use video-sitemaps to include the videos in Google video search (the sitemap links to the videos on our own domain, but we are still using iframes on the pages to collect the content from the third party domain that will then be blocked by robots.txt). I hope you understand what I mean... Any suggestions? Thanks a lot!
Technical SEO | | Googleankan0 -
Is content important on home page
hi. i am working on a site at the moment www.in2town.co.uk and i am trying to decide if on the second column of my site where it says uk news, if i should keep it the way it is and have content under the picture or should i get rid of the content under the picture and just have the main title. I am wanting to know if the content under the picture is important for google and for the reader or would it be better just to have the title which is h2. any help would be great.
Technical SEO | | ClaireH-1848860