Robots.txt anomaly

Dan-Lawrence

Hi,

I'm monitoring a site that's had a redesign relaunch and a new robots.txt added.

Over the week since launch, Webmaster Tools has shown a steadily increasing number of blocked URLs (now at 14).

The robots.txt file, though, has only 12 lines with a Disallow directive. Could this be happening because a single line can refer to more than one page/URL? They all look like single URLs, for example:

Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes

etc, etc

And is it normal for the Webmaster Tools count of URLs blocked by robots.txt to increase steadily over time, rather than being identified straight away?

Thanks in advance for any help, advice or clarity on why this may be happening.

Cheers

Dan

Dan-Lawrence

Many thanks for that, Dan!

evolvingSEO

As far as I know, the important thing is that your feed shows up in feed readers. Can you subscribe to and view your RSS feed in a variety of different feed readers?

Yes, so long as the ? is used only in URLs that would produce duplicate content, or content you would not want crawled, it will have that effect.

-Dan

Dan-Lawrence

Many thanks for your comments, Dan!

So it doesn't matter that the feed isn't going to be crawled? Don't we usually want feeds to be crawled?

Blocking anything with a ? is surely good then, isn't it, since it prevents all the duplicate content one gets from search results?

Yes, my client's webmaster set it up.

evolvingSEO

Hi Dan

I see no reason to disallow the feed like that by default, unless there is some reason I don't know about. But it won't harm anything either.

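The second part blocks any URL which contains a ? (question mark): "Disallow: /?" covers the homepage with a query string, and "Disallow: /*?" covers a ? anywhere in the URL. This would block anything that has a parameter in the URL - most commonly a search word, pagination, filtering settings etc.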
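To make the wildcard behaviour concrete, here is a rough Python sketch of how a crawler that supports "*" and "$" (as Google documents for Googlebot) might test a path against those four lines. The function name rule_matches and the sample paths are made up for illustration; this is not Google's actual matcher, just the matching rule as I understand it.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path (plus query).

    Hypothetical helper: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and everything else is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so this is a prefix match by default.
    return re.match(regex, path) is not None


# Sample paths are invented; the patterns are the ones from the thread.
for pattern in ["/feed", "*/feed", "/?", "/*?"]:
    for path in ["/feed/", "/category/news/feed/", "/?s=seo", "/blog/?s=seo", "/blog/page/2/"]:
        print(f"Disallow: {pattern:7} {path:22} blocked={rule_matches(pattern, path)}")
```

Note that a path-based URL such as /blog/page/2/ is not caught by either ? rule, because it contains no question mark.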

As far as I'm aware this is not going to be damaging to the site, but it's not the default setting. Did someone set it up that way for you?

My robots.txt shows the default WordPress settings: http://www.evolvingseo.com/robots.txt

Dan-Lawrence

Hi Dan

Yes, please find it below. Can you also confirm whether the bottom two lines refer to blocking internal search results?

Disallow: /feed
Disallow: */feed

Disallow: /?
Disallow: /*?

Many Thanks

Dan

evolvingSEO

Hi Dan

Can you share the exact line disallowing RSS?

Thanks!

-Dan

Dan-Lawrence

Sorry, one more question: I see that the webmaster has disallowed the feeds in the robots.txt file. Is this normal/desirable? I would have thought one would want RSS feeds crawled by Google.

Dan-Lawrence

Nice one, cheers Jesse!

jesse-landry

Your assumption is correct. The disallows you listed are directories, not pages. Therefore, anything within the plugins folder will be disallowed, and the same goes for the cache and themes folders.

So you may have multiple files (and I'm sure you do) within each of those folders.
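If you want to check this yourself, a minimal sketch using Python's standard urllib.robotparser shows one Disallow line blocking several URLs. The "User-agent: *" line, the example.com domain and the file names are assumptions for illustration; only the three Disallow lines come from the question.

```python
from urllib.robotparser import RobotFileParser

# The three Disallow lines from the question. The "User-agent: *" line is an
# assumption, since the parser ignores rules that are not under a user agent.
rules = """\
User-agent: *
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def is_blocked(url: str) -> bool:
    """Return True if the rules above disallow this URL for Googlebot."""
    return not parser.can_fetch("Googlebot", url)


# Hypothetical URLs: two sit under the same single Disallow line.
urls = [
    "http://example.com/wp-content/plugins/some-plugin/script.js",
    "http://example.com/wp-content/plugins/other-plugin/style.css",
    "http://example.com/wp-content/themes/my-theme/style.css",
    "http://example.com/wp-content/uploads/logo.png",
]

for url in urls:
    print(url, "blocked" if is_blocked(url) else "allowed")
```

So 12 Disallow lines can easily account for 14 or more blocked URLs, depending on how many files under those folders have been discovered so far.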
