Prevent indexing of dynamic content
-
Hi folks!
I discovered a bit of an issue with a client's site. The site consists primarily of static HTML pages; however, within one page (a car photo gallery), a line of PHP code:
dynamically generates 100 or so pages that make up the photo gallery, all with the same page title and meta description. The photo gallery script resides in the /gallery folder, which I attempted to block via robots.txt, to no avail. My next step will be to include a:
within the head section of the HTML page, but I am wondering whether this will stop the bots dead in their tracks, or whether they will still be able to pick up on the pages generated by the call to the PHP script a bit further down the page?
Dino
-
Hello Steven,
Thank you for providing another perspective. However, all factors considered, I agree with Shane's approach on this one. The pages add very little merit to the site and exist primarily to provide the site users with eye-candy (e.g. photos of classic cars).
-
Just personally, I would still deindex or canonicalize them. They are just pages with a few images, so they are not of much value, and unless all the titles and descriptions target varying keywords and content is added, they will cannibalize each other and possibly even drag down the site because of hundreds of pages of thin content.
So from an SEO perspective it probably IS better to deindex or canonicalize them. Three to five or so years ago, maybe the advice would have been to keep them and target keywords, but not in the age of content.
(Unless the images were optimized for image searches for saleable products, but I do not think they are.)
-
Hi Dino,
I know this won't solve the immediate problem you asked about, but wouldn't it be better for your client's site (and for SEO) to alter the PHP so that the title and meta description are replaced with variables that can also be dynamic, depending on which of the 100 or so pages gets created?
That way, rather than worrying about a robot seeing 100 pages as duplicate content, it could see 100 pages as 100 pages.
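To make the idea concrete, here is a minimal sketch of building a distinct title and meta description for each generated gallery page. It is written in Python purely for illustration; the real gallery is a PHP script whose code is not shown in this thread, so the function name `gallery_head`, the wording of the title, and the photo names are all assumptions, and the same approach would need to be ported into the PHP template.

```python
from html import escape


def gallery_head(photo_title: str, index: int, total: int) -> str:
    """Build a per-page <title> and meta description for one gallery page.

    All names and wording here are hypothetical; adapt them to the real script.
    """
    title = f"{photo_title} - Classic Car Gallery (photo {index} of {total})"
    description = f"Photo {index} of {total} in the classic car gallery: {photo_title}."
    # escape() keeps quotes and angle brackets in photo names from breaking the markup.
    return (
        f"<title>{escape(title)}</title>\n"
        f'<meta name="description" content="{escape(description)}">'
    )


if __name__ == "__main__":
    # Two different photos yield two different head sections.
    print(gallery_head("1957 Chevrolet Bel Air", 1, 100))
    print(gallery_head("1965 Ford Mustang", 2, 100))
```

Because the photo name and position feed into both tags, no two generated pages end up with the same title or description, which is the point being made above.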
-
It depends on how the pages are being created (I would assume it is off of a template page).
So within the template of this dynamically created page you would place a meta noindex tag.
But if this is the global template, you cannot do this, as it will noindex every page, which of course is bad.
If you want to PM me the URL of the page, I can take a look at your code and see what is going on and how to rectify it. Right now I think we are talking about the same principles, but different words are being used.
What I am saying really is pretty straightforward: the pages that you want kept out of the index DO NOT need a nofollow; they need a meta noindex.
But there are many variables. If you have already disallowed the directory in robots.txt, then no bot will go there to pick up the updated noindex directive.
If there is no way to add a meta noindex, then you need to nofollow and put in for a manual removal.
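The interaction between robots.txt and noindex described above can be checked with a short script. This is only a sketch using Python's standard `urllib.robotparser`; the robots.txt contents and the `www.example.com` URLs are placeholders, not the client's real files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the gallery folder for every crawler.
ROBOTS_TXT = """User-agent: *
Disallow: /gallery/
"""


def crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if a compliant crawler is allowed to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/gallery/index.php",
    ):
        status = "crawlable" if crawlable(ROBOTS_TXT, url) else "blocked"
        print(f"{url}: {status}")
```

If a gallery URL comes back as blocked, a compliant bot will never fetch that page, so it cannot read a noindex tag placed on it. In that case the Disallow rule would have to be lifted, at least until the pages have dropped out of the index.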
-
I completely understand and agree with all points you have conveyed. However, I am not certain as to the best approach to "noindex" the urls which are being created dynamically from within the static html page? Maybe I am making this more complex than it needs to be...
-
So it is the pages themselves that are dynamically created that you want out of the index, not the page that contains the links?
If this is so:
noindex the pages that are created dynamically.
"Therein lies the problem. I did have the nofollow directive in place specifying the /gallery/ folder, but apparently, the bots still crawled it."
Nofollow does not remove a page from the index; it only tells the bot not to pass authority. It is still feasible that the bot will crawl the link, so nofollow without noindex is not the correct directive: the page, even though nofollowed, is still being reached and indexed.
PS. If you have the nofollow on the links, you may want to remove it so the bots go straight through to the page and grab the noindex directive. If you want to try not to let any authority "evaporate", you can continue to nofollow, but you may need to request that the dynamically generated pages (URLs) be removed using webmaster tools.
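The difference between the two directives can be shown by reading a page's robots meta tag and reporting what each directive affects. The sketch below uses Python's standard `html.parser`; the class and function names are made up for illustration, and it models only the documented meaning of `noindex` and `nofollow`, not how any particular search engine actually behaves.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> dict:
    """Report whether the page may be indexed and whether its links may be followed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "indexable": "noindex" not in parser.directives,
        "links_followed": "nofollow" not in parser.directives,
    }


if __name__ == "__main__":
    print(robots_directives('<head><meta name="robots" content="nofollow"></head>'))
    print(robots_directives('<head><meta name="robots" content="noindex"></head>'))
```

A page carrying only `nofollow` still reports as indexable, which matches the behaviour described in this thread: the gallery pages stayed in the index for months despite the nofollow.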
-
The goal is to have the page remain in the index, but not follow any dynamically generated links on the page. The nofollow directive (in place for months) has not done the job.
-
?
If a link is coming into the page and you have noindex, nofollow, this would remove the page from the index and prevent the following of any links.
This is NOT instant, and can take months to occur depending on the depth of the page, the crawl schedule, etc. (You can try to speed it up by using webmaster tools to remove the URL.)
What is the goal you are attempting to achieve?
To get the page out of index, but still followed?
Or remain in index, but just not follow links on page?
?
-
Therein lies the problem. I did have the nofollow directive in place specifying the /gallery/ folder, but apparently, the bots still crawled it. I agree that the noindex removes the page, but I wasn't certain if it prevented crawling of the page, as I have read mixed opinions on this.
I just thought of something else... perhaps an external url is linking to this page - allowing it to be crawled. I am off to examine this perspective.
Thanks for your response!
-
noindex will only remove the specific page (or the pages created off the template) you place the tag in from the index, and disallow indexing of it, upon the next crawl of the page.
Bots will still crawl the page and follow any links that are readable, as long as there is no nofollow directive.
I am not sure I fully understand the situation, so I would not say this is my "recommendation", but it is an answer to the specific question:
"but I am wondering if this will stop the bots dead in their tracks or will they still be able to pick-up on the pages generated"
Hope this helps!