XML Sitemap Questions For Big Site
-
Hey Guys,
I have a few questions about XML sitemaps.
-
For a social site that is going to have personal accounts created, what is the best way to get them indexed? When it comes to profiles, I found that Twitter (https://twitter.com/i/directory/profiles) and Facebook (https://www.facebook.com/find-friends?ref=pf) have directory pages, but Google Plus has XML index pages (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml).
-
If we go the XML route, how would we automatically add new profiles to the sitemap? Or is the only option to keep updating the XML sitemap with third-party software (sitemapwriter)?
-
If a user chooses not to have their profile indexed (by default it will be indexable), how do we go about deindexing that profile? Is there an automatic way of doing this?
-
Lastly, has anyone dabbled with Google Sitemap Generator (https://code.google.com/p/googlesitemapgenerator/)? If so, do you recommend it?
Thank you!
-
-
Thanks for the input, guys!
I believe Twitter and Facebook don't run sitemaps for their profiles. What they have is a directory of all their profiles (Twitter: https://twitter.com/i/directory/profiles, Facebook: https://www.facebook.com/find-friends?ref=pf), and they use that to get their profiles crawled. However, I feel the best approach is through XML sitemaps, and Google Plus actually does this with its profiles (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml). Quite frankly, I would rather follow Google than FB or Twitter... I'm just now wondering how the hell they keep up that monster! Does it create a new sitemap every time one hits 50k? When do they update their sitemap (daily, weekly, or monthly), and how?
One other question I have is whether there are any penalties for getting a lot of pages crawled at once. Meaning one day we have 10 pages and the next we have 10,000 or 50,000 pages...
Thanks again guys!
-
I guess the way I was explaining it was for scalability on a large site. You have to remember that a site like FB or Twitter, with hundreds of millions of users, still has the limitation of only 50k records per sitemap. So if they are running sitemaps, they have thousands of them (100 million profiles at 50k per file is already 2,000 files).
-
I'm not a web developer, so this may be wrong, but I feel like it might be easier to just add every user to the XML sitemap and then add a noindex robots meta tag to the pages of users who don't want their profiles indexed.
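As a rough illustration of the noindex idea, here is a minimal Python sketch of picking the robots meta tag per profile. The function name and the `profile_is_indexable` flag are made up for the example; a real site would read the preference from the user's settings and render the tag inside the page's head.

```python
def robots_meta_tag(profile_is_indexable: bool) -> str:
    """Return the robots meta tag for a profile page.

    `profile_is_indexable` is a hypothetical per-user setting
    (True by default, False once the user opts out).
    """
    content = "index, follow" if profile_is_indexable else "noindex"
    return f'<meta name="robots" content="{content}">'


# Example: a user who opted out of indexing.
print(robots_meta_tag(False))
```

Keep in mind that a crawler has to be able to fetch the page to see the tag, so an opted-out profile should not also be blocked in robots.txt.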
-
If it were me and someone were asking me to design a system like that, I would design it in a few parts.
First, I would create an application that handles the sitemap minus profiles: just your TOS, sign-up pages, terms, and whatever other pages like that.
Then I would design a system that handles the actual profiles. It would become pretty complex and resource-intensive as the site grew, but the main idea flows like this:
Start generation, grab the user record with ID 1 in the database, check whether it is indexable (move to the next if not), see what pages are connected, write them to the XML file, then loop back and start with record #2.
There are a few concessions you have to make. You need to keep track of the number of records in a file so you know when to start another file, because you can only have 50k records in one file.
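The loop and the 50k cutoff described above can be sketched roughly like this. It is Python rather than PHP, and only a sketch: the `users` records with `username` and `indexable` keys and the `/profiles/` URL pattern are assumptions for illustration, not anything a particular site uses.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

MAX_URLS_PER_FILE = 50_000  # per-file URL limit in the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def render_urlset(urls):
    """Render one sitemap file (as a string) for a list of URLs."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n'
    )


def build_sitemaps(users, base_url, max_urls=MAX_URLS_PER_FILE):
    """Return a list of sitemap XML strings, one per chunk of profiles.

    `users` is any iterable of dicts with the hypothetical keys
    "username" and "indexable"; a real job would stream rows from the database.
    """
    files, current = [], []
    for user in users:
        if not user["indexable"]:
            continue  # skip profiles whose owners opted out
        current.append(f"{base_url}/profiles/{quote(user['username'])}")
        if len(current) == max_urls:
            files.append(render_urlset(current))  # file is full, start another
            current = []
    if current:
        files.append(render_urlset(current))  # last, partially filled file
    return files
```

In a real run each string would be written to its own file (for example `sitemap-profiles-1.xml`, `sitemap-profiles-2.xml`, and so on), and `max_urls` is only a parameter here so the splitting is easy to try out with small numbers.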
The way I would handle the whole process for a large site would be this: sync the required tables via a weekly or daily cron to another instance (server). Call the PHP script (because that is what I use) that creates the first sitemap for the normal site-wide pages. At the end of that sitemap, put a location for the user profile sitemap; then, at the end of the script, execute the user profile sitemap generating script. At the end of each sitemap, put the location of the next sitemap file, because as you grow it might take anywhere from 2 to 10,000 sitemap files.
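One caveat on chaining the files: the sitemaps.org protocol has no "next file" pointer inside a sitemap. The usual way to tie many files together is a separate sitemap index file that lists the location of each sitemap. Here is a minimal sketch of that swapped-in approach, in Python rather than PHP; the function name and the example URLs are made up for illustration.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document listing every sitemap file."""
    entries = "".join(
        f"  <sitemap><loc>{escape(url)}</loc></sitemap>\n" for url in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}</sitemapindex>\n'
    )


# Hypothetical file names: one sitemap for site-wide pages, then the profile files.
index_xml = build_sitemap_index(
    ["https://example.com/sitemap-pages.xml"]
    + [f"https://example.com/sitemap-profiles-{n}.xml" for n in range(1, 4)]
)
print(index_xml)
```

With this layout the cron job only has to regenerate the profile files and rewrite the index, and the index is the single URL you submit to the search engines.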
One thing that I would make sure to do is get a list of crawler IP addresses and add an allow/deny rule in your .htaccess. That way you can make the sitemaps visible only to the search engines.
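If the check were done in application code instead of .htaccess, the same allow/deny idea might look like the sketch below. The networks listed are placeholder documentation ranges, not real crawler addresses; you would have to fill in ranges published by the search engines themselves, and the function name is made up for the example.

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation ranges), NOT real crawler ranges.
ALLOWED_CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def may_fetch_sitemap(client_ip, networks=ALLOWED_CRAWLER_NETWORKS):
    """Return True if the requesting IP falls inside an allowed network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # not a valid IP address at all, so deny
    return any(address in network for network in networks)


print(may_fetch_sitemap("192.0.2.17"))   # inside the first placeholder range
print(may_fetch_sitemap("203.0.113.5"))  # outside both ranges
```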