How to stop robots.txt restricting access to sitemap?
-
I'm working on a site right now and having an issue with the robots.txt file restricting access to the sitemap - with no web dev to help, I'm wondering how I can fix the issue myself?
The robots.txt page shows
User-agent: * Disallow: / And then sitemap: with the correct sitemap link
-
Hi there
Right now, you're telling crawlers to not crawl your entire site, so the sitemap XML would be included in that. Are you wanting your site to be crawled completely? Simply change the robots.txt to this...
User-agent: *
Allow: /Here is another great resource from SEOBook to check out!
Hope this helps! Good luck!
Patrick
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Image sitemap
I work on a big eCommerce site with thousands of pages. We are talking about crating a separate image sitemap. Any idea example of an eCommerce site who has a separate image sitemap? I looked several and cant find one. Also, what are the best practices for creating a good image sitemap? thanks!
Technical SEO | | bizuH0 -
Google stopped crawling my site. Everybody is stumped.
This has stumped the Wordpress staff and people in the Google Webmasters forum. We are in Google News (have been for years), and so new posts are crawled immediately. On Feb 17-18 Crawl Stats dropped 85%, and new posts were no longer indexed (not appearing on News or search). Data highlighter attempts return "This URL could not be found in Google's index." No manual actions by Google. No changes to the website; no custom CSS. No Site Errors or new URL errors. No sitemap problems (resubmitting didn't help). We're on wordpress.com, so no odd code. We can see the robot.txt file. Other search engines can see us, as can social media websites. Older posts still index, but loss of News is a big hit. Also, I think overall Google referrals are dropping. We can Fetch the URL for a new post, and many hours later it appears on Google and News, and we can then use Data Highlighter. It's now 6 days and no recovery. Everybody is stumped. Any ideas? I just joined, so this might be the wrong venue. If so, apologies.
Technical SEO | | Editor-FabiusMaximus_Website0 -
Should 301-ed links be removed from sitemap?
In an effort to do some housekeeping on our site we are wanting to change the URL format for a couple thousand links on our site. Those links will all been 301 redirected to corresponding links in the new URL format. For example, old URL format: /tag/flowers as well as search/flowerswill be 301-ed to, new URL format: /content/flowers**Question:**Since the old links also exist in our sitemap, should we add the new links to our sitemap in addition to the old links, or replace the old links with new ones in our sitemap? Just want to make sure we don’t lose the ranking we currently have for the old links.Any help would be appreciated. Thanks!
Technical SEO | | shawn811 -
Google webmaster and analytics access to others
Hi, Google webmaster and analytics access to others with restricted and user access is this ok and secure? With this access can anyone tamper anything? thanks
Technical SEO | | mtthompsons0 -
%20 URL accessible, does this matter?
I have a rewrite on the CMS I work on. What happens here is that if someone creates a page on the website and uses spaces as the name then the CMS automatically replaces the spaces with -'s. I noticed this morning that the %20 URLs are accessible but not indexed at all. Only the - URLs are indexed. could this cause duplicate content or penalties? I know best practice is to have only ONE URL for a page but somehow the developer can't redirect the %20 URLs to the - URLs. Opinions?
Technical SEO | | DROIDSTERS0 -
How can I make Google Webmaster Tools see the robots.txt file when I am doing a .htacces redirec?
We are moving a site to a new domain. I have setup an .htaccess file and it is working fine. My problem is that Google Webmaster tools now says it cannot access the robots.txt file on the old site. How can I make it still see the robots.txt file when the .htaccess is doing a full site redirect? .htaccess currently has: Options +FollowSymLinks -MultiViews
Technical SEO | | RalphinAZ
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www.)?michaelswilderhr.com$ [NC]
RewriteRule ^ http://www.s2esolutions.com/ [R=301,L] Google webmaster tools is reporting: Over the last 24 hours, Googlebot encountered 1 errors while attempting to access your robots.txt. To ensure that we didn't crawl any pages listed in that file, we postponed our crawl. Your site's overall robots.txt error rate is 100.0%.0 -
Meta-robots Nofollow on logins and admins
In my SEO MOZ reports I am getting over 400 errors as Meta-robots Nofollow. These are all leading to my admin login page which I do not want robots in. Should I put some code on these pages so the robots know this and don't attempt to and I do not get these errors in my reports?
Technical SEO | | Endora0 -
Xml Sitemap
Hi mozzers, I am about to submit a sitemap for one of my clients via webmaster tools. The issue is that I have way too many urls that I don't want them to be indexed by Google such as testing pages, auto generated pages... Is there way to remove certain URL from the XML sitemap or is this impossible? If impossible, is the only way to control these urls is to "No index" all these pages that i don't want the search engine to see? Thanks Mozzers,
Technical SEO | | Ideas-Money-Art0