Canonical URLs and screen scraping
-
So a little question here. I was looking into a module to help implement canonical URLs on a certain CMS and I came across a snarky comment about relative vs. absolute URLs. This person was insistent that relative URLs are fine and absolute URLs are only for people who don't know what they are doing.
My question is, if using relative URLs, doesn't it make it easier to have your content scraped? After all, if you do get your content scraped at least it would point back to your site if using absolute URLs, right? Am I missing something or is my thinking OK on this?
Any feedback is much appreciated!
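To make the question concrete, here is a minimal Python sketch of how a crawler would resolve the same link on a scraped copy of a page. The hosts `www.example.com` and `scraper.example.net` and the helper `resolve` are made up for illustration; only `urllib.parse.urljoin` is a real standard-library call.

```python
from urllib.parse import urljoin

# Hypothetical URLs, for illustration only.
ORIGINAL_PAGE = "https://www.example.com/blog/post"
SCRAPED_PAGE = "https://scraper.example.net/copied/post"

relative_href = "/blog/post"
absolute_href = "https://www.example.com/blog/post"


def resolve(page_url, href):
    """Resolve href against the URL of the page it appears on."""
    return urljoin(page_url, href)


# On the scraped copy, the relative link resolves to the scraper's host...
print(resolve(SCRAPED_PAGE, relative_href))
# ...while the absolute link still points back to the original site.
print(resolve(SCRAPED_PAGE, absolute_href))
```

This only holds if the scraper copies the markup verbatim; a scraper that rewrites links or strips the canonical tag would change the result.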
-
Thanks for your reply, Alan. I also considered a screen scraper removing the canonical tag, but to me screen scraping seemed lazy in the first place and so maybe they wouldn't bother in most cases. I guess that a best practice with canonicals is really situation dependent.
-
Thanks, Robert. Your rationale for using relative links makes sense. I appreciate you helping me sort through the noise on this issue.
John
-
People don't abuse others when they have the facts on their side; it reminds me of the "you don't believe in global warming because you're uneducated" argument.
Just in the last few weeks I have seen a case where using an absolute URL got me a link. I wrote a YouMoz article with a link to my website; it has been copied, and the copy still has the link in it. Of course, being on SEOMoz, I had to use an absolute URL back to my own site.
I don't usually use absolute links on my own site; I think search engines almost always know who copied whom.
I agree with Rob, but I will add that while a good screen scraper will remove a canonical tag, removing absolute links is not so easy, as you are then left with broken links. I also believe that if you have an image in the article linking back to you, search engines will know who the real owner is; the same goes for CSS, JS and a number of other references. Screen scrapers rarely get credit for these reasons, and also because if your site has a lot of duplicate content, it is obvious that you are the one copying: either the one site has copied from many locations, or many locations have copied from the one site.
-
John
You can use either, and the web is full of people who go back and forth on this issue. My guess is that any really good scraper software can likely deal with absolute URLs today. The advantage we like with relative URLs is all about page load speed: the file size is smaller.
So, you will get arguments both ways. If scraping is a huge issue for you, maybe you go with absolute. We know people will scrape content and we continue with relative for the above reason and because it is easier to make certain changes/linking/redirects within a CMS.
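As a rough illustration of the file-size point, the sketch below strips the origin from a few links and counts the bytes saved. The origin, the markup, and the helper `to_root_relative` are assumptions for illustration, and the plain string replace is a simplification, not a real HTML rewriter.

```python
# Hypothetical origin and markup, for illustration only.
ORIGIN = "https://www.example.com"

absolute_html = (
    '<a href="https://www.example.com/products/">Products</a>'
    '<a href="https://www.example.com/about/">About</a>'
    '<img src="https://www.example.com/img/logo.png">'
)


def to_root_relative(html, origin):
    """Strip the origin from href/src values (naive string replace)."""
    for attr in ("href", "src"):
        html = html.replace(f'{attr}="{origin}/', f'{attr}="/')
    return html


relative_html = to_root_relative(absolute_html, ORIGIN)

# Bytes saved before any compression is applied.
saved = len(absolute_html.encode("utf-8")) - len(relative_html.encode("utf-8"))
print(f"{saved} bytes saved across 3 URLs")
```

The saving is one origin string per URL, so it grows with the number of links on the page. If the server compresses responses, the repeated origin likely compresses well and the real difference on the wire would be smaller.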
Oh, and as to people who use absolutes not knowing what they are doing... that is bunk. They may simply have other priorities.
Related Questions
-
Wrong canonical URL was specified. How to refresh the index now?
The wrong canonical URL was applied to thousands of pages of a client website, pointing them all to a single non-existent URL. Google has now de-indexed most of those pages. We have fixed the problem, but how do we get search engines to crawl those pages again and start showing them in search results? I understand that a slow recovery is possible if we don't do anything. I was wondering if we can fast-track the recovery... Any pointers? Thanks
Technical SEO | | Krupesh0 -
Rel=Canonical for filter pages
Hi folks, I have a bit of a dilemma that I'd appreciate some advice on. We'll just use the solid wood flooring of our website as an example in this case. We use the rel=canonical tag on the solid wood flooring listings pages where the listings get sorted alphabetically, by price etc.
Technical SEO | | LukeyB30
e.g. http://www.kensyard.co.uk/products/category/solid-wood-flooring/?orderBy=highestprice uses the canonical tag to point to http://www.kensyard.co.uk/products/category/solid-wood-flooring/ as the main page. However, we also use filters on our site which allow users to filter their search by more specific product features e.g.
http://www.kensyard.co.uk/products/category/solid-wood-flooring/f/18mm/
http://www.kensyard.co.uk/products/category/solid-wood-flooring/f/natural-lacquered/ We don't use the canonical tag on these pages because they are great long-tail keyword targeted pages so I want them to rank for phrases like "18mm solid wood flooring". But, in not using the canonical tag, I'm finding google is getting confused and ranking the wrong page as the filters mean there is a huge number of possible URLs for a given list of products. For example, Google ranks this page for the phrase "18mm solid wood flooring" http://www.kensyard.co.uk/products/category/solid-wood-flooring/f/18mm,116mm/ This is no good. This is a combination of two filters and so the listings are very refined, so if someone types the above phrase into Google and lands on this page their first reaction will be "there are not many products here". Google should be ranking the page with only the 18mm filter applied: http://www.kensyard.co.uk/products/category/solid-wood-flooring/f/18mm How would you recommend I go about rectifying this situation?
Thanks, Luke0 -
Canonical needed after no index
Hi, do you need to point a canonical from a subpage to the main page if you have already marked the subpage as noindex? Since Google is not indexing it, do we still need the canonical, and is it passing any juice?
Technical SEO | | razasaeed0 -
%20 URL accessible, does this matter?
I have a rewrite on the CMS I work on. What happens here is that if someone creates a page on the website and uses spaces as the name then the CMS automatically replaces the spaces with -'s. I noticed this morning that the %20 URLs are accessible but not indexed at all. Only the - URLs are indexed. could this cause duplicate content or penalties? I know best practice is to have only ONE URL for a page but somehow the developer can't redirect the %20 URLs to the - URLs. Opinions?
Technical SEO | | DROIDSTERS0 -
How to use rel canonical?
Hi, I have some questions about this and I think you can help me. Here is an example of my problem with pagination. Suppose that I have a news article with 2 pages: http://www.espectador.com/noticias/208907/fernando-pereira-encuesta-de-cifra-prendio-una-lucecita-amarilla-en-el-pit-cnt You can access the first page in different ways: www.espectador.com/1v4_contenido.php?m=&id=250419&ipag=1 http://www.espectador.com/1v4_contenido.php?m=&id=250419 http://www.espectador.com/noticias/250419/alvaro-vega-fa-creo-que-cosmo-fue-usada-por-bqb-para-evitar-una-subasta-a-la-baja-y-asi-quedar-con-las-manos-libres Same meta description, same body, with different URLs. Can I use rel canonical in the file 1v4_contenido.php, pointing to the friendly URL? <link rel="canonical" href="http://www.espectador.com/noticias/250419/alvaro-vega-fa-creo-que-cosmo-fue-usada-por-bqb-para-evitar-una-subasta-a-la-baja-y-asi-quedar-con-las-manos-libres"/> Do I have a loop here? Can the rel canonical go on page 1? Thanks
Technical SEO | | informatica8100 -
Ignore Urls with pattern.
I have 7000 warnings of urls because of a 302 redirect. http://imageshack.us/photo/my-images/215/44060409.png/ I want to get rid of those, is it possible to get rid of the Urls with robots.txt. For example that it does not crawl anything that has /product_compare/ in its url? Thank you
Technical SEO | | levalencia10 -
Keywords in Vanity URL
If I set up a vanity URL that just 301s to the main site, do the search engines look at the keywords in the vanity URL when determining how to rank the site? For example, if I set up a vanity URL of www.coolnewtechgear.com, and redirect it to www.company.com/products/, would the search engines view the keywords of cool, new, tech, and gear and associate them with the page it's getting redirected to? Or do they ignore the vanity URL and only look at the content of the page itself?
Technical SEO | | ryanwats0 -
Canonical for non-exist URL ?
Hi, I have a website that uses parameter URLs, for example www.example.com/index.php?page_id=1&no=2. I want search engines to see my page URL as www.example.com/toys/cars, but this URL does not exist on my website, and when I enter it directly it goes to a 404 page. If I add www.example.com/toys/cars as the canonical URL on the page www.example.com/index.php?page_id=1&no=2, what happens? Does the URL in the SERP change to www.example.com/toys/cars?
Technical SEO | | SEMTurkey0