Sitemap Contains Blocked Resources
-
Hey Mozzers,
I have several pages on my website that are for user search purposes only. They sort some products by range and answer some direct search queries users type into the site. They are basically just product collections that are grouped in different ways elsewhere.
As such, I didn't want search engines getting their hands on them, so I blocked them in robots.txt so I could add them worry-free. However, Magento automatically pulls them into the sitemap.
This has made Webmaster Tools give me a warning that 21 URLs in the sitemaps are blocked by robots.txt.
Is this terrible SEO wise?
Should I have opted to NOINDEX these URLs instead? I was concerned about thin content, so I really didn't want Google crawling them.
-
Thanks for the latest responses, guys.
I have researched it into the grave, and the way Magento generates the sitemap makes it impossible for me to exclude these URLs.
I will just unblock them in robots.txt and make them all noindex. This seems to solve all the problems. I will then block them again once I'm 100% sure they are no longer indexed.
Thanks again, chaps.
Big help as always.
-
OK, so first: because some are indexed, if you block access now, they will never be removed.
What you will need to do is add a noindex tag to the pages but don't block access to them so that Google can honour the noindex. Remove the pages via Search Console and once you have confirmed these are all removed from the index, you will be good to then block access via robots.txt.
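Before re-blocking anything, it can help to confirm that each page really does serve a noindex. Here is a minimal sketch in Python, assuming the directive is delivered either as a robots meta tag or as an X-Robots-Tag response header. The function name `has_noindex` and the sample markup are made up for illustration and are not part of Magento or any Google tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html, x_robots_tag=""):
    """Return True if the page HTML or the X-Robots-Tag header value carries noindex.

    Hypothetical helper: `html` is the page source and `x_robots_tag` is the
    raw header value (empty string if the header is absent).
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    header_directives = {t.strip().lower() for t in x_robots_tag.split(",")}
    return "noindex" in parser.directives or "noindex" in header_directives


# Illustrative page source, not taken from the site in question.
SAMPLE_PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(SAMPLE_PAGE))
```

This only tells you that the tag is in place. Whether Google has actually dropped the URL still needs to be confirmed in Search Console before the robots.txt block goes back on.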
As CleverPhD said, ideally you don't want pages in the index that can't be crawled, but it isn't likely to cause a penalty of any sort. I have a client with about 70-80 blocked URLs (long story) and there have been no issues in 12 months. If you are stuck because of Magento, perhaps research how others have got around this?
-Andy
-
I would recommend that you try and get those pages out of your sitemap. If you look through the Google sitemap best practices, it states that the sitemap should be for pages that Googlebot can access.
http://googlewebmastercentral.blogspot.com/2014/10/best-practices-for-xml-sitemaps-rssatom.html
URLs
URLs in XML sitemaps and RSS/Atom feeds should adhere to the following guidelines:
- Only include URLs that can be fetched by Googlebot. A common mistake is including URLs disallowed by robots.txt, which cannot be fetched by Googlebot, or including URLs of pages that don't exist.
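If you want to see which sitemap entries trip this guideline before Google reports them, a rough check can be scripted. The sketch below uses only the Python standard library and assumes you already have the sitemap XML and robots.txt as strings. The function name `blocked_sitemap_urls`, the example.com URLs and the /search/ rule are all hypothetical. Note that `urllib.robotparser` does simple prefix matching and does not understand Google's `*` and `$` wildcards, so treat the result as an approximation.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(sitemap_xml, robots_txt, user_agent="Googlebot"):
    """Return the sitemap URLs that robots.txt disallows for `user_agent`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in urls if not rules.can_fetch(user_agent, url)]


# Illustrative inputs only; substitute the real files from your own site.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /search/\n"
SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.example.com/widgets.html</loc></url>"
    "<url><loc>https://www.example.com/search/red-widgets</loc></url>"
    "</urlset>"
)

for url in blocked_sitemap_urls(SAMPLE_SITEMAP, SAMPLE_ROBOTS):
    print("In sitemap but blocked:", url)
```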
-
Hi Andy,
I just checked, and yes, they were previously indexed and some of them still are.
-
Hi,
Is this terrible SEO wise?
Not really - it just means that Google can see there is a page it can't access and is informing you of this. No penalty is going to come from it. If these were old pages that now return 404s, it would be a different story.
I just want to be sure of something - were the pages previously open to Google? Are they currently indexed?
-Andy