Duplicate content issue: staging urls has been indexed and need to know how to remove it from the serps
-
duplicate content issue: staging url has been indexed by google ( many pages) and need to know how to remove them from the serps.
Bing sees the staging url as moved permanently
Google sees the staging urls (240 results) and redirects to the correct url Should I be concerned about duplicate content and request Google to remove the staging url removed
Thanks Guys
-
Thanks for helping Malika! To clarify for other readers, blocking in robots.txt after the pages have been indexed will actually prevent them from being removed from the index with a meta noindex tag, since Google won't be able to crawl the pages to see the noindex tag.
If staging URLs have been indexed already (and assuming they still need to exist), here's the steps I would take:
- Add meta noindex tags to every staging URLs
- If urgent, also do a URL removal request in Webmaster Tools (but this is usually not needed)
- Wait until the staging URLs are noindexed - you can check periodically by doing site: searches in Google.
- Only after they are noindexed, block Search Engines from crawling them with the robots.txt file.
-
Generally you'll want to hide your staging site from search engines and as Malika mentioned, the best way to do this is via robots.txt.
That lets you essentially set a rule stating that no crawlers are to access anything on that domain. Beyond that, nothing else is really relevant; if crawlers can't see your site, it doesn't matter what you do with it! You don't even need to worry about 301 redirects once this is done.
Once you apply that change in robots.txt, you may still see your staging site indexed for a little while (anywhere from hours to a couple of months) but this is normal and it will drop away soon enough.
Search engines are pretty good at determining which is the real site these days anyway!
-
Thanks for your suggestions Peter and Malika,
By the wayt The staging site had it's own url..
I think I need help with the canonical stuff, as I am not really sure how to use it.
-
Quick way to remove staging url is sending HTTP error 410 as result.
Other is to use in SearchConsole Remove URLs function https://www.google.com/webmasters/tools/url-removalAbout duplicate content - you must see actual canonical. If on stage URL there is canonical point to normal site then you shouldn't hesitating. But if staging and normal point to different URLs then you can see some algo filter.
-
I am assuming that these pages don't hold any authority or backlinks at all. You can simply delete these pages (if the purposes of these pages has been solved.
Or if you still need these pages live, use Robots.txt file to make these pages (or the whole subdomain/directory they are sitting as disallowed, no-index)
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Redirect indexed lightbox URLs?
Hello all, So I'm doing some technical SEO work on a client website and wanted to crowdsource some thoughts and suggestions. Without giving away the website name, here is the situation: The website has a dedicated /resources/ page. The bulk of the Resources are industry definitions, all encapsulated in colored boxes. When you click on the box, the definition opens in a lightbox with its own unique URL (Ex: /resources/?resource=augmented-reality). The information for these colored lightbox definitions is pulled from a normal resources page (Ex: /resources/augmented-reality/). Both of these URLs are indexed, leading to a lot of duplicate indexed content. How would you approach this? **Things to Consider: ** -Website is built on Wordpress with a custom theme.
Technical SEO | | Alces
-I have no idea how to even find settings for the lightbox (will be asking the client today).
-Right now my thought is to simply disallow the lightbox URL in robots.txt and hope Google will stop crawling and eventually drop from the index.
-I've considered adding the main resource page canonical to the lightbox URL, but it appears to be dynamically created and thus there is no place to access (outside of the FTP, I imagine?). I'm most rusty with stuff like this, so figured I'd appeal to the masses for some assistance. Thanks! -Brad0 -
Do mobile and desktop sites that pull content from the same source count as duplicate content?
We are about to launch a mobile site that pulls content from the same CMS, including metadata. They both have different top-level domains, however (www.abcd.com and www.m.abcd.com). How will this affect us in terms of search engine ranking?
Technical SEO | | ovenbird0 -
Hreflang and possible duplicate content SEO issue
| 0 <a class="vote-down-off" title="This question does not show any research effort; it is unclear or not useful">down vote</a> favorite | Hey community, my first question here 🙂 Imagine there is a page with video, it has hreflang tags setup, to lead let's say German visitors to /de/ folder... So, on that German version of page, everything like menus, navigation and such are in German, but the video is the same, the title of the video (H1 tag) is the same, <title></code></strong> and <strong><code>meta description</code></strong> is the same as on the original English page. It means that general (English) page and German version of it has the same key content in English.</p> <p>To me it seems to be a SEO duplicate content issue. As I know, Google doesn't think that content is duplicate, if it is properly translated to other language.</p> <p>Does my explained case mean that the content will be detected by Google as duplicate?</p> </div> </div> </td> </tr> </tbody> </table></title> |
Technical SEO | | poiseo0 -
Duplicate Content within Site
I'm very new here... been reading a lot about Panda and duplicate content. I have a main website and a mobile site (same domain - m.domain.com). I've copied the same text over to those other web pages. Is that okay? Or is that considered duplicate content?
Technical SEO | | CalicoKitty20000 -
How do I Address Low Quality/Duplicate Content Issue for a Job portal?
Hi, I want to optimize my job portal for maximum search traffic. Problems Duplicate content- The portal takes jobs from other portals/blogs and posts on our site. Sometimes employers provide the same job posting to multiple portals and we are not allowed to change it resulting in duplicate content Empty Content Pages- We have a lot of pages which can be reached via filtering for multiple options. Like IT jobs in New York. If there are no IT jobs posted in New York, then it's a blank page with little or no content Repeated Content- When we have job postings, we have about the company information on each job listing page. If a company has 1000 jobs listed with us, that means 1000 pages have the exact same about the company wording Solutions Implemented Rel=prev and next. We have implemented this for pagination. We also have self referencing canonical tags on each page. Even if they are filtered with additional parameters, our system strips of the parameters and shows the correct URL all the time for both rel=prev and next as well as self canonical tags For duplicate content- Due to the volume of the job listings that come each day, it's impossible to create unique content for each. We try to make the initial paragraph (at least 130 characters) unique. However, we use a template system for each jobs. So a similar pattern can be detected after even 10 or 15 jobs. Sometimes we also take the wordy job descriptions and convert them into bullet points. If bullet points already available, we take only a few bullet points and try to re-shuffle them at times Can anyone provide me additional pointers to improve my site in terms of on-page SEO/technical SEO? Any help would be much appreciated. We are also thinking of no-indexing or deleting old jobs once they cross X number of days. Do you think this would be a smart strategy? Should I No-index empty listing pages as well? Thank you.
Technical SEO | | jombay3 -
Getting images indexed in the SERPS
Good Afternoon form 13 degrees C totally Sunny Wetherby UK 🙂 Am i right in thinking that the only way to get images appearing like this in your serps: http://i216.photobucket.com/albums/cc53/zymurgy_bucket/innovia-merchant-immages-serpscopy.jpg is to be hooked up to Google Merchant? Which kind of means if the sight your working on has no images then this type of enhancement is out of bounds? Thanks in advance, David
Technical SEO | | Nightwing0 -
Duplicate content error - same URL
Hi, One of my sites is reporting a duplicate content and page title error. But it is the same page? And the home page at that. The only difference in the error report is a trailing slash. www.{mysite}.co.uk www.{mysite}.co.uk/ Is this an easy htaccess fix? Many thanks TT
Technical SEO | | TheTub1 -
Duplicate Content Issues - Should I build a new site?
I'm currently working on a site which is built using Zen Cart. The client also has another version which has the same products on it. The product descriptions and the vast majority of the text has been re-written. I've used the duplicate content tool and these are the results: HTML fingerprint: 0000a7ee1f07a131 0000a7ec1f07a931 92.31% Total HTML similarity: 76.33% Standard text similarity: 66.72% Smart text similarity: 45.81% Total text similarity 56.27% I considered using a different eCommerce system like Magento or Volusion. So I had a look at a few templates, chose one and then used the tool again and got the following: HTML fingerprint: 0000a7e41b012111 0000a7ec1f07a931 72.00% Total HTML similarity: 64.65% Standard text similarity: 11.69% Smart text similarity: 17.90% Total text similarity 14.80% Do you think its worth doing this? thanks Dan
Technical SEO | | TheYeti0