Metadata and duplicate content issues
-
Hi there: I'm seeing a steady decline in organic traffic, but at the same time an increase in pageviews and direct traffic. My site has about 3,000 crawl errors! The errors are duplicate content, missing description tags, and descriptions that are too long. Most of these issues are related to events that are imported from Google Calendars via iCal and the pages created from those events. Should we block calendar events from being crawled by using the Disallow directive in the robots.txt file? Here's the site: https://www.landmarkschool.org/
-
Yes, of course you can keep running the calendar.
But keep in mind that some pages may still appear in search results even after you have blocked or removed those URLs.
You can watch this video, in which Matt Cutts explains why a page that is disallowed in robots.txt may still appear in Google's search results. In that case, just to be sure, you can implement a 301 redirect.
This is going to be your second line of defense: just redirect all of those URLs to your home page.
There are many options for creating a redirect. In my case I'm a WordPress user, so with a simple plugin I can solve the problem in five minutes. I checked your website, but I couldn't tell which CMS you are using.
Either way, you can use a 301 Redirect Code Generator app, which offers many options: PHP, JS, ASP, ASP.NET, and of course Apache (.htaccess). Now is the right moment to use the list I mentioned in my first answer (step 2: create a list of all the URLs you want to disable).
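If your site runs on Apache, a minimal .htaccess sketch might look like the following. The paths here are illustrative assumptions (I don't know your real URL structure), so adjust them to match your actual event URLs:
<------------------------------START HERE------------------------------>
# Hypothetical example: permanently redirect imported calendar/event
# URLs to the home page. Requires mod_rewrite to be enabled.
RewriteEngine On
RewriteRule ^events/ https://www.landmarkschool.org/ [R=301,L]
RewriteRule ^calendar\.html$ https://www.landmarkschool.org/ [R=301,L]
<------------------------------END HERE------------------------------>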
So let's talk about your second question.
Of course it will hurt your ranking. If you have 3,020 pages indexed on Google but just 20 of those pages are useful to users, you have a big problem. A website should address any question or concern that a current or potential customer or client may have; if it doesn't, the website is essentially useless.
A simple division makes the point: 20 / 3,020 ≈ 0.0066, so less than 1% of your site is useful. I'm pretty sure your ranking has been affected.
Don't forget to mark my answer as a "GOOD ANSWER" (that will make me happy), and good luck!
-
Hi Roman: Thanks so much for your prompt reply. I agree that using robots.txt is the way to go. I do not want to disable the Google Calendar sync (we're a school and need our events to feed from several Google Calendars). I want to confirm that the robots.txt option will still work if the calendars are still syncing with the site.
One more question--do you think that all these errors are causing the dip in organic traffic?
-
SOLUTION
1 - You have to disable the Google Calendar sync with your website.
2 - Create a list of all the URLs that you want to disable.
3 - At this point you have multiple options to block the URLs you want to exclude from search engines. So first, let's define the problem.
By blocking a URL on your site, you can stop Google from indexing that web page for display in Google Search results. In other words, people looking through Google Search results can't see or navigate to a blocked URL or its content.
If you have pages or other content that you don't want to appear in Google Search results, you can block them using a number of options:
- Option 1: robots.txt files (the best option)
- Option 2: meta tags ("noindex")
- Option 3: password-protection of web server files
In your case, option 2 would take a lot of time. Why? Because you would have to manually add the "noindex" meta tag to each page, one by one, which makes no sense here. Option 3 requires some server configuration and, at least for me, is a bit complex and time-consuming; I would have to research it on Google, watch some YouTube videos, and see what happens.
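For reference, a minimal sketch of that "noindex" meta tag; it is a single line placed inside each page's <head> section:
<------------------------------START HERE------------------------------>
<!-- Tells compliant crawlers not to index this page -->
<meta name="robots" content="noindex">
<------------------------------END HERE------------------------------>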
So option 1 is the winner for me. Let's look at an example of what your robots.txt could look like.
The following example robots.txt file specifies that no robots should crawl any URL starting with "/events/january/" or "/tmp/", or the page "/calendar.html":
<------------------------------START HERE------------------------------>
# robots.txt for https://www.landmarkschool.org/
User-agent: *
Disallow: /events/january/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
Disallow: /calendar.html
<------------------------------END HERE------------------------------>
FOR MORE INFO SEE THE VIDEO > https://www.youtube.com/watch?v=40hlRN0paks
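If you want to sanity-check the rules once the file is deployed at the site root, here is a small optional sketch using only Python's standard library (the event URL is just an illustration):
<------------------------------START HERE------------------------------>
# Quick check that a URL is blocked by the live robots.txt
# (standard library only; any recent Python 3).
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://www.landmarkschool.org/robots.txt")
rp.read()

# Expect False once the Disallow rules above are deployed.
print(rp.can_fetch("*", "https://www.landmarkschool.org/events/january/"))
<------------------------------END HERE------------------------------>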