How was cdn.seomoz.org configured?
-
The SEOmoz CDN appears to have a "pull zone" that is set to the root of the domain, such that any static file can be addressed from either subdomain:
http://www.seomoz.org/q/moz_nav_assets/images/logo.png
http://cdn.seomoz.org/q/moz_nav_assets/images/logo.png
The risk of this configuration is that web pages (not just images/CSS/JS) also get cached and served by the CDN. I won't put the URL here for fear of Google indexing it, but if you replace the 'www' in the URL below with 'cdn', you'll see a cached copy of the original:
http://www.seomoz.org/ugc/the-greatest-attribution-ever-graphed
The worst-case scenario is that the homepage gets indexed on the CDN subdomain. But that doesn't happen here: the homepage URL on the cdn subdomain issues a 301 redirect back to the canonical www subdomain, as it should.
Here's my question: how was that done?
Because maxcdn.com can't do it. If you set a "pull zone" to your entire domain, they'll cache your homepage and everything else. Googlebot has a field day with that; it will reindex your entire site off the CDN.
Maybe the SEOmoz CDN provider (CloudFront) allows specific URLs to be blocked? Or do you detect the CloudFront IPs and serve them a 301 (which they'd proxy out to anyone requesting cdn.seomoz.org)?
One solution is to create a pull zone that points to a folder, like example.com/images... but this doesn't help a complex site that has cacheable content in multiple places (do you WordPress users really store ALL your static content under /wp-content/?).
Or, as suggested above, dynamically detect requests from the CDN's proxy servers, and give them a 301 for any HTML-page request. This gets complex quickly, and is both prone to breakage and very difficult to regression-test.
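To make that second idea concrete, here is a rough sketch of what the origin-side check might look like. It is only an illustration, not how SEOmoz or CloudFront actually does it: the proxy network range, the canonical hostname, and the list of static extensions are all placeholder assumptions, and a real setup would need the CDN provider's published IP ranges kept up to date.

```python
import ipaddress
import posixpath
from urllib.parse import urlsplit

# Hypothetical values for illustration only. 203.0.113.0/24 is a
# documentation range, not a real CDN network.
CDN_PROXY_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
CANONICAL_HOST = "www.example.com"
STATIC_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".css", ".js", ".ico"}


def is_cdn_request(remote_addr):
    """True if the request came from one of the CDN's proxy servers."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in CDN_PROXY_NETWORKS)


def is_static_asset(path):
    """True if the URL path (ignoring any query string) ends in a static extension."""
    ext = posixpath.splitext(urlsplit(path).path)[1].lower()
    return ext in STATIC_EXTENSIONS


def cdn_redirect_target(path, remote_addr):
    """Return the canonical URL to 301 to, or None to serve the request normally.

    Only non-static requests arriving from the CDN's proxies are redirected,
    so the CDN ends up caching the 301 instead of a copy of the page.
    """
    if is_cdn_request(remote_addr) and not is_static_asset(path):
        return "http://" + CANONICAL_HOST + path
    return None
```

With this in place, a CDN proxy asking for "/" would get a redirect target of the www homepage, while the same proxy asking for an image, or a normal visitor asking for any page, would get None and be served as usual. The fragile part is exactly what the paragraph above warns about: the IP list and the extension list both have to stay correct over time.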
Properly retrofitting a complex site to use a CDN, without creating a half-dozen new CDN subdomains, does not appear to be easy.
-
It's a SEOmoz secret...