Is there a way for me to automatically download a website's sitemap.xml every month?
-
From now on we want to store all our sitemap.xml over the next years. Its a nice archive to have that allows us to analyse how many pages we have on our website and which ones were removed/redirected.
Any suggestions?
Thanks
-
If you use a MySQL database to store your website data, I think that to do this kind of automatic "archival" work by creating an automatic PHP script would take between 2 to 5 hours work. I don't see why it should take more than that.
If someone tells you that it is going to take more than that, I would be suspicious. Either the programmer is not good enough, or wants to cheat on you. That unfortunately happens more than you think!!
Be sure to ask for a step-by-step description of how they plan to complete the job. If you have doubts, please feel free to ask me, I am a pretty expert PHP programmer. I don't work for others, but just for myself (I built and keep tweaking my own websites virtualsheetmusic.com, musicianspage.com and others with very little help from external programmers).
Good luck!
-
Hi Fabrizo,
How long would it take for a PHP programmer to write this code approximately? Since we would have to outsource this I would like an indication to oversee costs involved.
Thanks!
-
The way I would do it would be to make a simple PHP (or Perl) program that every day, week or month (as you may need it), archives your sitemap.xml on a specific directory on your server, and possibly zip it. As a PHP programmer myself, I can tell you that that's really simply to do. Just ask to a PHP programmer, I am sure it will make it in a couple hours!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google has deindexed a page it thinks is set to 'noindex', but is in fact still set to 'index'
A page on our WordPress powered website has had an error message thrown up in GSC to say it is included in the sitemap but set to 'noindex'. The page has also been removed from Google's search results. Page is https://www.onlinemortgageadvisor.co.uk/bad-credit-mortgages/how-to-get-a-mortgage-with-bad-credit/ Looking at the page code, plus using Screaming Frog and Ahrefs crawlers, the page is very clearly still set to 'index'. The SEO plugin we use has not been changed to 'noindex' the page. I have asked for it to be reindexed via GSC but I'm concerned why Google thinks this page was asked to be noindexed. Can anyone help with this one? Has anyone seen this before, been hit with this recently, got any advice...?
Technical SEO | | d.bird0 -
Sitemap.xml Site multilang
HI all, I have some questions about multilang sitemap.xml. So, we use the same domain subdirectories with gTLDs example.com/pt-br/
Technical SEO | | mobic
example.com/us/
example.com/es/ How should I do the sitemap.xml in this case? I thought of three alternatives: Should I do a sitemap_index.xml to each lang and make categories for these sitemaps? Examples:
http://www.example.com/pt-br/sitemap_index.xml
http://www.example.com/en/sitemap_index.xml
http://www.example.com/es/sitemap_index.xml Should I do only one sitemap_index.xml covering all categories of all languages ? Examples:
http://www.example.com/sitemap_index.xml
http://www.example.com/pt-br/sitemap_categorias_1.xml
http://www.example.com/es/sitemap_categorias_1.xml
http://www.example.com/us/sitemap_categorias_1.xml Should I do a sitemap setting all multilang? <url><loc>http://www.example.com/us/</loc>
<xhtml:link <br="">rel="alternate"
hreflang="es"
href="http://www.example.com/pt-br/"
/>
<xhtml:link <br="">rel="alternate"
hreflang="us"
href="http://www.example.com/us/"
/>
<xhtml:link <br="">rel="alternate"
hreflang="pt-br"
href="http://www.example.com/pt-br/"
/></xhtml:link></xhtml:link></xhtml:link></url> Thanks for any advice.0 -
Redirecting .edu subdomains to our site or taking the link, what's more valuable?
We have a relationship built through a service we offer to universities to be issued a .edu subdomain that we could redirect to our landing page relevant to that school. The other option is having a link from their website to that same page. My first question is, what would be more valuable? Can you pass domain authority by redirecting a subdomain to a subdirectory in my root domain? Or would simply passing the link equity from a page in their root domain to our page pass enough value? My second question is, if creating a subdomain with a redirect is much more valuable, what is the best process for this? Would we simply have their webmaster create the subdomain for us an have them put a 301 redirect to our page? Is this getting in the greyer hat area? Thanks guys!
Technical SEO | | Dom4410 -
What should I do with a large number of 'pages not found'?
One of my client sites lists millions of products and 100s or 1000s are de-listed from their inventory each month and removed from the site (no longer for sale). What is the best way to handle these pages/URLs from an SEO perspective? There is no place to use a 301. 1. Should we implement 404s for each one and put up with the growing number of 'pages not found' shown in Webmaster Tools? 2. Should we add them to the Robots.txt file? 3. Should we add 'nofollow' into all these pages? Or is there a better solution? Would love some help with this!
Technical SEO | | CuriousCatDigital0 -
What's the correct SEO for a Gallery?
Hi there, I was wondering if anyone was an expert on galleries and using canonical URL's? URL: http://www.tecsew.com/gallery In short I'm doing SEO for a site and it has a large gallery (3000+ images) where each specific image has it's own page and each category (there's 200+) also has its own page. Now, what I'm thinking is that this should be reduced and asking Google to index/rank each page is wrong (I also think this because the quality of the pages are relatively low i.e little text & content etc) Therefore, what should be suggested/done to the gallery? Should just the main gallery categories get indexed (i.e http://www.tecsew.com/3d-cad-showcase)? Or should I continue to allow Google to trawl through all of it? Or should canonical URL's be used? Any help would be greatly appreciated. Best Wishes, Charlie S
Technical SEO | | media.street0 -
What is the best approach to specifying a page's language?
I have read about a number of different tags that can accomplish this so it is very confusing. For example, should I be using: OR
Technical SEO | | BlueLinkERP0 -
Do you get credit for an external link that points to a page that's being blocked by robots.txt
Hi folks, No one, including me seems to actually know what happens!? To repeat: If site A links to /home.html on site B and site B blocks /home.html in Robots.txt, does site B get credit for that link? Does the link pass PageRank? Will Google still crawl through it? Does the domain get some juice, but not the page? I know there's other ways of doing this properly, but it is interesting no?
Technical SEO | | DaveSottimano0 -
Does removing product listings help raise SERP's on other pages?
Does removing content ever make sense? We have out of stock products that are left on the site (in an out of stock section) specifically for SEO value, but I am not sure how to approach the problem from a bottom line conversion stand point. Do we leave out of stock products and hope that they turn into a conversion rate via cross selling, or do out of stock products lower the value of other pages by "stealing" link juice and pagerank from the rest of the site? (and effectively driving interest away) What is your perspective? Do you believe that any content that is related or semi-related to your main focus is beneficial, or does it only make sense to have strong content that has a higher rate of conversion and overall site engagement?
Technical SEO | | 13375auc30