Restricted by robots.txt and soft bounce issues (related).
-
In our Webmaster Tools account we have roughly 35K URLs that are restricted by robots.txt, and we also have about 1,200 soft 404s. We can't seem to figure out how to properly resolve these URLs so that they no longer show up this way. Our traffic from SEO has taken a major hit over the last 2 weeks because of this.
Any help?
Thanks, Libby
-
**These are duplicate URLs that we can't figure out how they are getting created.**
I want to be sure we are talking about the same thing here. When I hear "duplicate URL" I am thinking of multiple URLs which point to the same web page. Depending on how your site is set up, it is possible to have many different URLs point to the same web page. Possible examples are:
www.mydomain.com/tennis-rackets
www.mydomain.com/tennis-rackets/
mydomain.com/tennis-rackets?sort=asc
Above are three examples of URLs which can all lead to the same page. You can have dozens of URLs all lead to a page with identical content. How these issues get resolved depends upon how they were created.
The best tool to help you figure this out is your crawl report. Use the SEOmoz crawl tool, then examine the crawl report. It can be a bit overwhelming at first, but you can narrow things down real fast if you use Excel.
Select the header row for your data (begins with the URL field), then select Data > Filter > Auto Filter from the menu. Then start by looking at fields such as "Duplicate Page Content", "URLs with duplicate content", etc. Simply choose YES in the drop down menu to filter for that particular data. This will help you uncover the source of these issues.
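If Excel is not handy, the same filtering can be done with a short script. This is only a rough sketch: it assumes the crawl report has been exported as CSV with a "URL" column and a "Duplicate Page Content" column holding YES/NO values. The sample rows and the helper name rows_flagged are made up for illustration, so adjust the column names to match your actual export.

```python
import csv
import io


def rows_flagged(csv_text, column, flag="YES"):
    """Return the rows whose `column` equals `flag` (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if (row.get(column) or "").strip().upper() == flag.upper()
    ]


# Hypothetical sample export; a real report will have many more columns.
sample = """URL,Duplicate Page Content
http://www.mydomain.com/tennis-rackets,YES
http://www.mydomain.com/about,NO
http://www.mydomain.com/tennis-rackets/,YES
"""

duplicates = rows_flagged(sample, "Duplicate Page Content")
for row in duplicates:
    print(row["URL"])
```

For a real file, read it with open(path, newline="") and pass the contents in, then sort the flagged URLs to spot the pattern that is generating them.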
The URLs in my example should all be 301'd or canonicalized to the primary page to resolve the duplication issue.
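To make the idea concrete, here is a minimal sketch of mapping each variant to one primary URL, which is the target you would then use for the 301 or the canonical tag. It rests on assumptions that may not fit your site: the preferred host is www.mydomain.com, the preferred form has no trailing slash, and query parameters such as sort do not change the content and can be dropped. The function name canonical_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, host="www.mydomain.com"):
    """Map a URL variant to the assumed primary form of the page."""
    # urlsplit only recognises the host when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    # Assumption: the primary URL has no trailing slash.
    path = parts.path.rstrip("/") or "/"
    # Assumption: query parameters (e.g. sort) do not change the content.
    return urlunsplit((parts.scheme, host, path, "", ""))


variants = [
    "www.mydomain.com/tennis-rackets",
    "www.mydomain.com/tennis-rackets/",
    "mydomain.com/tennis-rackets?sort=asc",
]

for variant in variants:
    print(variant, "->", canonical_url(variant))
```

All three example URLs come out as the same address. If some parameters do change what the page shows, they should be kept rather than stripped.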
-
Well, part of the problem is these are duplicate URLs that we can't figure out how they are getting created. They were supposed to resolve to our 404 page... Should we remove them all?
-
Hi Libby.
How do you intend to resolve these URLs? Ideally you would remove your robots.txt entries and restrict the pages with a robots meta tag instead, such as "noindex, follow" or whatever is appropriate. Any links to 404 pages should be updated or removed.
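The reason for removing the robots.txt entries first is that a crawler blocked by robots.txt never fetches the page, so it never sees a noindex meta tag on it. A quick way to check which URLs your rules currently block is Python's standard urllib.robotparser. This is a sketch only: the Disallow rule and URLs below are invented for illustration, and the helper name is_blocked is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text disallows fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical rules; paste in the contents of your own robots.txt.
rules = """User-agent: *
Disallow: /search/
"""

blocked = is_blocked(rules, "http://www.mydomain.com/search/tennis")
open_page = is_blocked(rules, "http://www.mydomain.com/tennis-rackets")
print(blocked, open_page)
```

Any URL reported as blocked needs its robots.txt rule lifted before a meta robots tag on that page can take effect.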
What further direction do you seek?