XML Sitemap Questions For Big Site
-
Hey Guys,
I have a few questions about XML sitemaps.
-
For a social site that is going to have personal accounts created, what is the best way to get them indexed? When it comes to profiles, I found out that Twitter (https://twitter.com/i/directory/profiles) and Facebook (https://www.facebook.com/find-friends?ref=pf) have directory pages, but Google Plus has XML index pages (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml).
-
If we go the XML route, how would we automatically add new profiles to the sitemap? Or is the only option to keep updating the XML sitemap for profiles using third-party software (sitemapwriter)?
-
If a user chooses not to have their profile indexed (by default it will be indexable), how do we go about deindexing that profile? Is there an automatic way of doing this?
-
Lastly, has anyone dabbled with Google Sitemap Generator (https://code.google.com/p/googlesitemapgenerator/)? If so, do you recommend it?
Thank you!
-
-
Thanks for the input guys!
I believe Twitter and Facebook don't run sitemaps for their profiles. What they have is a directory of all their profiles (Twitter: https://twitter.com/i/directory/profiles, Facebook: https://www.facebook.com/find-friends?ref=pf), and they use that to get their profiles crawled. However, I feel the best approach is XML sitemaps, and Google Plus actually does this with its profiles (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml). Quite frankly, I would rather follow Google than FB or Twitter... I'm just now wondering how the hell they upkeep that monster! Does it create a new sitemap every time one hits 50k? When do they update their sitemap: daily, weekly, or monthly? And how?
One other question I have is whether there are any penalties for getting a lot of pages crawled at once. Meaning one day we have 10 pages and the next we have 10,000 or 50,000 pages...
Thanks again guys!
-
I guess the way I was explaining it was about scalability on a large site. You have to remember that a site like FB or Twitter, with hundreds of millions of users, still has the limitation of only 50k records per sitemap. So if they are running sitemaps, they have hundreds of them, if not thousands.
-
I'm not a web developer, so this may be wrong, but I feel like it might be easier to just add every user to the XML sitemap and then add a noindex robots meta tag on the pages of users who don't want their profiles to be indexed.
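For what it's worth, the noindex part of that idea could look something like the sketch below. The function name and the `profile_is_indexable` flag are made up for illustration; a real site would read the user's privacy setting from wherever it is stored. Keep in mind that a crawler can only see the tag if the page is not blocked in robots.txt.

```python
def robots_meta_tag(profile_is_indexable: bool) -> str:
    """Return the robots meta tag for a profile page.

    `profile_is_indexable` is a hypothetical per-user setting
    (indexable by default, as described in the question).
    """
    content = "index, follow" if profile_is_indexable else "noindex"
    return f'<meta name="robots" content="{content}">'


# The template for the profile page would drop this into <head>.
print(robots_meta_tag(profile_is_indexable=False))
```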
-
If it were me and someone were asking me to design a system like that, I would design it in a few parts.
First I would create an application that handles the sitemap minus profiles, just for your sign-up pages, terms of service, and whatever other pages like that.
Then I would design a system that handles the actual profiles. It would get pretty complex and resource-intensive as the site grew, but the main idea flows like this:
Start generation, grab the user record with id 1 in the database, check to see if indexable (move to next if not), see what pages are connected, write to xml file, loop back and start with record #2.
There is one concession you have to make: you need to keep track of the number of records in a file so you know when to start another file. You can only have 50k records in one file.
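Here is a rough sketch of that loop in Python (I would write it in PHP, but the idea is the same). The `Profile` record and its fields are assumptions for illustration, not a real schema, and the sketch only lists one URL per profile rather than every connected page.

```python
from typing import Iterable, Iterator, List, NamedTuple
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # the per-file limit mentioned above


class Profile(NamedTuple):
    # Hypothetical record shape; a real users table will differ.
    user_id: int
    url: str
    indexable: bool


def chunk_indexable_urls(
    profiles: Iterable[Profile], limit: int = MAX_URLS_PER_SITEMAP
) -> Iterator[List[str]]:
    """Yield lists of at most `limit` URLs, skipping non-indexable profiles."""
    batch: List[str] = []
    for profile in profiles:
        if not profile.indexable:
            continue  # move to the next record
        batch.append(profile.url)
        if len(batch) == limit:
            yield batch  # this file is full, start another one
            batch = []
    if batch:
        yield batch


def render_sitemap(urls: Iterable[str]) -> str:
    """Render one sitemap file body in the sitemaps.org urlset format."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

Each batch from `chunk_indexable_urls` would be passed to `render_sitemap` and written to its own numbered file. Because the profiles are consumed as an iterable, they can be streamed from the database instead of loaded all at once.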
The way I would handle the process in total for a large site would be this: sync the required tables via a weekly or daily cron to another instance (server). Call the PHP script (because that is what I use) that creates the first sitemap for the normal site-wide pages. At the end of that script, execute the script that generates the user profile sitemaps. To tie the files together, list the location of every sitemap file in a sitemap index file (the sitemap format itself has no field for pointing at the next file), because as you grow it might take anywhere from 2 to 10,000 sitemap files.
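A minimal sketch of tying the files together with a sitemap index file (the `<sitemapindex>` format from sitemaps.org), again in Python. The file naming scheme (`sitemap-profiles-N.xml`) and the example.com URLs are assumptions, so use whatever names your generator actually writes.

```python
from typing import Iterable, List
from xml.sax.saxutils import escape


def profile_sitemap_urls(base_url: str, file_count: int) -> List[str]:
    """Build the URLs of the numbered profile sitemap files (hypothetical naming)."""
    return [f"{base_url}/sitemap-profiles-{n}.xml" for n in range(1, file_count + 1)]


def render_sitemap_index(sitemap_urls: Iterable[str]) -> str:
    """Render a sitemap index that points at every sitemap file."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        lines.append(f"  <sitemap><loc>{escape(url)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


# One sitemap for the site-wide pages plus three profile sitemaps.
all_sitemaps = ["https://example.com/sitemap-pages.xml"]
all_sitemaps += profile_sitemap_urls("https://example.com", 3)
print(render_sitemap_index(all_sitemaps))
```

The index is the only file you would need to submit to the search engines; the cron job regenerates it after the profile sitemaps are written.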
One thing that I would make sure to do is get a list of crawler IP addresses and put an allow/deny rule in your .htaccess. That way you can make the sitemaps visible only to the search engines.
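If you would rather do that check in the application than in .htaccess, the same allow/deny idea can be sketched like this. The network ranges below are documentation-only placeholders, not real crawler ranges; you would have to fill in the ranges the search engines actually publish.

```python
import ipaddress
from typing import Iterable

# Placeholder ranges reserved for documentation, NOT real crawler IPs.
ALLOWED_CRAWLER_NETWORKS = ["192.0.2.0/24", "2001:db8::/32"]


def is_allowed_crawler(client_ip: str, allowed_networks: Iterable[str]) -> bool:
    """Return True if `client_ip` falls inside any of the allowed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in allowed_networks)


# Serve the sitemap only when this returns True; otherwise respond with 403.
print(is_allowed_crawler("192.0.2.17", ALLOWED_CRAWLER_NETWORKS))
```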