Will using http ping, lastmod increase our indexation with Google?
-
If Google knows about our sitemaps and they're being crawled on a daily basis, why should we use the HTTP ping and/or list the sitemap index files in our robots.txt?
- Is there a benefit (e.g., improved indexability) to both pinging and listing the index files in robots.txt?
- Is there any benefit to listing the index sitemaps in robots.txt if we're pinging?
- If we provide a decent <lastmod> date, is there going to be any difference in indexing rates between the ping and the normal crawl they do today?
- Do we need to do all of these to cover our bases?
Thanks
Marika
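For reference, the <lastmod> value mentioned in the question sits inside each <sitemap> entry of a sitemap index file. A minimal sketch of generating one with Python's standard library, assuming hypothetical sitemap URLs under www.example.com and timezone-aware timestamps; build_sitemap_index is an illustrative name, not part of any library:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(entries):
    """Return a sitemap index document as a string.

    entries: iterable of (loc, lastmod) pairs; lastmod must be a
    timezone-aware datetime so it can be written in W3C Datetime format.
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        if lastmod.tzinfo is None:
            raise ValueError("lastmod must be timezone-aware")
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        # isoformat() on an aware datetime gives e.g. 2012-03-01T08:30:00+00:00
        ET.SubElement(sitemap, "lastmod").text = lastmod.isoformat()
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap_index([
        ("https://www.example.com/sitemap-products.xml",
         datetime(2012, 3, 1, 8, 30, tzinfo=timezone.utc)),
    ]))
```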
-
Will using http ping, lastmod increase our indexation with Google?
No. You can submit a perfect sitemap and ping Google with changes every hour, but that will not increase the number of pages which are indexed.
A few good sources discussing sitemaps and indexing:
http://followmattcutts.com/2010/03/23/matt-cutts-on-sitemap-indexing/
http://faq.bloggertipsandtricks.com/2010/08/html-xml-sitemap-what-difference-matt.html
If you have a site with solid navigation, good architecture, and good links, then there is no need to use a sitemap. Search engines will determine how often your site should be crawled based on your site's authority. They can also determine which pages have been modified by comparing the dates in the HTTP headers (such as Last-Modified) with their database.
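One standard HTTP mechanism for this kind of header-date comparison is the Last-Modified response header, which a crawler can echo back as If-Modified-Since so that an unchanged page costs only a 304 Not Modified response. A minimal sketch of the comparison, assuming the crawler stores a UTC timestamp of its last fetch; the function names are hypothetical, and this does not describe how any particular search engine implements it:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def needs_refetch(last_modified_header, last_crawled):
    """Return True if the page changed after last_crawled.

    last_modified_header: value of the HTTP Last-Modified header, or None.
    last_crawled: timezone-aware datetime of the previous fetch.
    """
    if not last_modified_header:
        # No date to compare against, so the page has to be fetched again.
        return True
    return parsedate_to_datetime(last_modified_header) > last_crawled


def if_modified_since(last_crawled):
    """Format last_crawled as an HTTP date for an If-Modified-Since header."""
    # usegmt=True emits "GMT" and requires a datetime whose tzinfo is UTC.
    return format_datetime(last_crawled.astimezone(timezone.utc), usegmt=True)


if __name__ == "__main__":
    crawled = datetime(2012, 3, 1, 8, 30, tzinfo=timezone.utc)
    print(if_modified_since(crawled))  # Thu, 01 Mar 2012 08:30:00 GMT
    print(needs_refetch("Fri, 02 Mar 2012 10:00:00 GMT", crawled))  # True
```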
I still use a sitemap, but mostly because the process is fully automated. I know of other sites that are well indexed and do not use sitemaps at all.
With the above understood, I'll try to offer a bit more information directly related to your questions. When you ask about pinging, I presume you are referring mainly to Google and Bing. For those cases, the answer to all four of your questions is NO.
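For completeness, the HTTP ping the question refers to is just a GET request carrying the URL-encoded sitemap location. A minimal sketch that builds that request URL without sending it; the www.google.com/ping format is the one Google used to document, and Google announced in 2023 that it was retiring that endpoint, so treat this as illustrative. sitemap_ping_url and the example.com URLs are hypothetical:

```python
from urllib.parse import urlencode

# Ping endpoint format Google used to document. Google announced in 2023
# that it was retiring this endpoint, so this is illustrative only.
GOOGLE_PING_ENDPOINT = "https://www.google.com/ping"


def sitemap_ping_url(sitemap_url, endpoint=GOOGLE_PING_ENDPOINT):
    """Return the GET URL that would notify a search engine of a sitemap.

    The sitemap location is URL-encoded into the "sitemap" query parameter.
    """
    return endpoint + "?" + urlencode({"sitemap": sitemap_url})


if __name__ == "__main__":
    # Builds the URL only; no request is sent.
    print(sitemap_ping_url("https://www.example.com/sitemap_index.xml"))
```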
Listing your sitemap location in robots.txt will help other search engines that you did not ping to locate your sitemap. This can include the SEOmoz crawler, for example.
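Any crawler that reads robots.txt can find sitemaps this way, because the Sitemap: directive is independent of the User-agent groups. A minimal sketch using Python's standard-library robots.txt parser (site_maps() needs Python 3.8+); sitemaps_in_robots is a hypothetical helper and the example.com URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt):
    """Return the Sitemap: URLs declared in the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap: line.
    return parser.site_maps() or []


if __name__ == "__main__":
    sample = "\n".join([
        "User-agent: *",
        "Disallow: /admin/",
        "",
        "# Sitemap lines apply regardless of the User-agent groups above.",
        "Sitemap: https://www.example.com/sitemap_index.xml",
    ])
    print(sitemaps_in_robots(sample))
```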
Related Questions
-
Site indexed by Google, but (almost) never gets impressions
Hi there, I have a question I haven't been able to find a reasonable answer to yet, so I'm going to rely on all of you. Basically, a site has all its pages indexed by Google (I verified with site:sitename.com), and it also has great, unique content. All on-page grades are A, with absolutely no negative factors at all. However, its pages get almost no impressions. Of course I didn't expect it to be on page 1, since it launched on Dec 1st, but it looks like Google is ignoring it (or giving it bad scores) for some reason. The only things I can think of that could contribute are: domain privacy on the domain; the redirect from www to the subdomain we use (we did this because it will be a multi-language site, so we'll assign each country a subdomain); and recency (it went online on Dec 1st and the domain is just a couple of months old). Or maybe it's because we blocked crawlers for a few days before the launch, right before Dec 1st? What do you think could be the reason? Thanks, guys!
Technical SEO | ruggero
-
Unnecessary pages getting indexed in Google for my blog
I have a blog, dapazze.com, and I have been suffering from a problem for a long time. I found out that Google has indexed hundreds of replytocom links and image attachment pages for my blog. I had to remove these pages manually using the URL removal tool. I had used "Disallow: ?replytocom" in my robots.txt, but Google disobeyed it. After that, I removed the parameter from my blog completely using the SEO by Yoast plugin. But now I see that Google has started indexing these links again, even though they are no longer present on my blog (I use #comment). Google has also indexed many of my admin and plugin pages, even though they are disallowed in my robots.txt file. Have a look at my robots.txt file here: http://dapazze.com/robots.txt Please help me solve this problem permanently.
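One detail worth checking in the rule quoted above: Google's documentation says path rules should start with "/", because matching begins at the first character of the URL path. A minimal sketch of that matching, assuming Google's documented "*" and "$" wildcard semantics; rule_matches is a hypothetical helper, the sample paths are made up, and rule precedence is not modeled:

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt Allow/Disallow pattern matches a URL path.

    Sketch of the wildcard semantics Google documents: matching starts at
    the beginning of the path (which always begins with "/"), "*" matches
    any sequence of characters, and a trailing "$" anchors the end.
    Rule precedence and percent-encoding are deliberately ignored.
    """
    if not pattern:
        # An empty Disallow: value disallows nothing.
        return False
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each "*" back into ".*".
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


if __name__ == "__main__":
    path = "/my-post/?replytocom=42"
    print(rule_matches("?replytocom", path))    # False: no leading "/"
    print(rule_matches("/*?replytocom", path))  # True
```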
Technical SEO | rahulchowdhury
-
I was googling "best web hosting" and noticed that the 1st and 3rd results were results with Google Plus. Does Google Plus now play a role in improving a website's ranking?
I was googling "best web hosting" and noticed that the 1st and 3rd results were results with Google Plus. Does Google Plus now play a role in improving a website's ranking? I see a person's name next to the website too.
Technical SEO | mainguy
-
Index page
To the SEO experts this may well seem a silly question, so I apologize in advance. I try not to ask questions I probably already know the answer to, but clarity is my goal.
I have numerous sites, and as standard practice I always use .htaccess to redirect non-www to www and to redirect the index page to www.mysite.com. All straightforward; I have never questioned this practice, and I have always been advised it's the best practice to avoid duplicate content.
Today I was looking at a CMS service for a customer's website. The website is already built and it's a static website, so integrating a CMS was going to mean a full rewrite of the site. A friend on another forum told me about a service called simple CMS; I had a look, and it looks perfect for the customer... I went to set it up on the client's site, and here is the problem. For the CMS software to work, it MUST access the index page, and because my index page is redirected to www.mysite.com, it won't work, as it can't find the index page (obviously).
I raised this with the software company. They told me it must access the index page, and I explained that it won't be able to and why (because I have my index page redirected to avoid duplicate content). To my astonishment, the person there told me that duplicate content is a huge no-no with Google (that's not the astonishing part), but that it's not relevant to the index and non-index versions of a website's home page. This goes against everything I thought I knew... The person also assured me that they have worked in the SEO area for 10 years.
As I am a subscriber to SEOmoz and no one here has anything to gain but offering advice: is this true? Will it not be a duplicate content issue to serve both an index page and a non-index page? Will search engines not view this as duplicate content? Or is this SEO expert talking bull, which I suspect but cannot be sure of?
Any advice would be greatly appreciated. It would make my life a lot easier for the customer to use this CMS software, but I would be doing it at the risk of tarnishing the work they and I have done on their rankings. Many thanks in advance, John
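The redirect rules John describes boil down to mapping every variant of a URL onto one canonical form, which is the duplicate-content rationale he gives for them. A minimal sketch of that mapping in Python (not of the .htaccess rules themselves), using the post's placeholder mysite.com; canonical_url is a hypothetical helper and the list of index file names is an assumption:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set of directory-index file names; adjust for the server.
INDEX_FILES = ("index.html", "index.htm", "index.php")


def canonical_url(url):
    """Return the URL that non-www -> www and index -> root 301s would target.

    Assumes plain hostnames (no port or credentials in the URL).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is lowercased
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            # Keep the trailing slash: /about/index.html -> /about/
            path = path[: -len(name)]
            break
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for u in ("http://mysite.com/index.html", "http://www.mysite.com/"):
        # A server would answer with a 301 whenever the two values differ.
        print(u, "->", canonical_url(u))
```

Under rules like these, a tool that insists on requesting /index.html directly will always be redirected to the root, which matches the conflict described in the post.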
Technical SEO | Johnny4B
-
Will I still get Duplicate Meta Data Errors with the correct use of the rel="next" and rel="prev" tags?
Hi guys, one of our sites has an extensive number of category page listings, so we implemented the rel="next" and rel="prev" tags for these pages (as suggested by Google below). However, we still see duplicate meta data errors in SEOmoz crawl reports and also in Google Webmaster Tools. Does the SEOmoz crawl tool check for the correct use of the rel="next" and rel="prev" tags and skip the meta data errors if the tags are correctly implemented? Or is it still necessary to use unique meta titles and meta descriptions on every page, even though we are using the rel="next" and rel="prev" tags as recommended by Google? Thanks, George
Implementing rel="next" and rel="prev"
If you prefer option 3 (above) for your site, let's get started! Let's say you have content paginated into the URLs:
http://www.example.com/article?story=abc&page=1
http://www.example.com/article?story=abc&page=2
http://www.example.com/article?story=abc&page=3
http://www.example.com/article?story=abc&page=4
On the first page, http://www.example.com/article?story=abc&page=1, you'd include in the <head> section:
<link rel="next" href="http://www.example.com/article?story=abc&page=2" />
On the second page, http://www.example.com/article?story=abc&page=2:
<link rel="prev" href="http://www.example.com/article?story=abc&page=1" />
<link rel="next" href="http://www.example.com/article?story=abc&page=3" />
On the third page, http://www.example.com/article?story=abc&page=3:
<link rel="prev" href="http://www.example.com/article?story=abc&page=2" />
<link rel="next" href="http://www.example.com/article?story=abc&page=4" />
And on the last page, http://www.example.com/article?story=abc&page=4:
<link rel="prev" href="http://www.example.com/article?story=abc&page=3" />
A few points to mention:
- The first page only contains rel="next" and no rel="prev" markup.
- Pages two to the second-to-last page should be doubly-linked with both rel="next" and rel="prev" markup.
- The last page only contains markup for rel="prev", not rel="next".
- rel="next" and rel="prev" values can be either relative or absolute URLs (as allowed by the <link> tag). And, if you include a <base> link in your document, relative paths will resolve according to the base URL.
- rel="next" and rel="prev" only need to be declared within the <head> section, not within the document <body>.
- We allow rel="previous" as a syntactic variant of rel="prev" links.
- rel="next" and rel="previous" on the one hand and rel="canonical" on the other constitute independent concepts. Both declarations can be included in the same page. For example, http://www.example.com/article?story=abc&page=2&sessionid=123 may contain:
<link rel="canonical" href="http://www.example.com/article?story=abc&page=2" />
<link rel="prev" href="http://www.example.com/article?story=abc&page=1&sessionid=123" />
<link rel="next" href="http://www.example.com/article?story=abc&page=3&sessionid=123" />
- rel="prev" and rel="next" act as hints to Google, not absolute directives. When implemented incorrectly, such as omitting an expected rel="prev" or rel="next" designation in the series, we'll continue to index the page(s), and rely on our own heuristics to understand your content.
Technical SEO | gkgrant
-
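The pagination rules in the Google guidance George quotes are mechanical enough to generate from the page number. A minimal sketch, assuming the article?story=...&page=... URL scheme from the quote; pagination_links is a hypothetical helper. Google has since (in 2019) said it no longer uses these tags as an indexing signal, so treat this as an illustration of the markup rather than a ranking fix:

```python
from html import escape
from urllib.parse import urlencode


def pagination_links(base, story, page, last_page):
    """Return the <head> link tags for `page` of a series 1..last_page.

    Follows the quoted rules: the first page gets rel="next" only, the
    last page gets rel="prev" only, and pages in between get both.
    """
    if not 1 <= page <= last_page:
        raise ValueError("page must be between 1 and last_page")

    def href(n):
        # Escape "&" as "&amp;" so the URL is safe inside an HTML attribute.
        return escape(f"{base}?{urlencode({'story': story, 'page': n})}")

    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{href(page - 1)}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{href(page + 1)}">')
    return tags


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, pagination_links("http://www.example.com/article", "abc", n, 4))
```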
How do you know which version of your site Google has in its index?
This is going to sound like a strange question, but I am trying to understand which version of our site is in the index. You might think this is an obvious question, but here is why I am asking:
1. Today I searched for a specific keyword and found the listing.
2. I clicked on the right arrow next to the listing and checked the cache date. It says 6/28 and shows the site as of 6/28.
3. I expected to see that we had just been indexed, as we jumped several pages since yesterday, and when I checked two days ago we hadn't moved at all.
It seems like Google may have picked up the changes we made on 7/2, but since it is showing 6/28, I am not sure. Since this is confusing, here is the chronology:
1. Made changes on 6/20.
2. Site appeared to be indexed on 6/28.
3. Made changes on 7/2.
4. Checked the site on 7/2 and we were in position 60. Checked the site on 7/4 and we were in position 61.
5. Checked the site today (7/6) and saw we are in position 8. The cache date shows as 6/28.
I suspect that Google just indexed us yesterday and is reflecting the changes I made on 7/2, but the fact that it says it was cached on 6/28 seems to suggest otherwise. I want to be sure I know which version got us the good rankings. Is there any way to be sure? Thanks!!
Technical SEO | trophycentraltrophiesandawards
-
Homepage/Root domain de-indexed by Google
This morning I discovered that the homepage/root domain of our company site, http://www.collegeplus.org/, has been de-indexed by Google and Bing. Our IT dept. is claiming it's our fault because we changed the meta title on our homepage, but they will not give me access to GWT to see if there are any issues. I believe the issue lies within our robots.txt file (http://www.collegeplus.org/robots.txt). I also don't believe we're suffering a penalty, because all of our tier 2 pages are still indexed when any type of branded search is performed. We don't do things that can get a site de-indexed like this. Any ideas on what the issue may be? Or at least something to convince our IT dept. that simply changing a meta title won't get your homepage totally de-indexed? Thanks.
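One way to test the robots.txt theory without access to GWT is to run the file's text through a robots.txt parser and ask whether the homepage is allowed for Googlebot. A minimal sketch with Python's standard-library parser; blocked_for is a hypothetical helper, the sample file is made up rather than the site's actual robots.txt, and you would paste in the real file's text yourself:

```python
from urllib.robotparser import RobotFileParser


def blocked_for(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows `url` for `user_agent`.

    Uses the standard-library parser, which does plain prefix matching
    and does not implement Google's "*" and "$" wildcard extensions.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Made-up sample file: a bare "Disallow: /" blocks the homepage too.
    sample = "User-agent: *\nDisallow: /\n"
    print(blocked_for(sample, "http://www.example.com/"))  # True
```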
Technical SEO | explorionary
-
Google crawl index issue with our website...
Hey there. We've run into a mystifying issue with Google's crawl index of one of our sites. When we do a "site:www.burlingtonmortgage.biz" search in Google, we see lots of 404 errors on pages that don't exist on our site or, seemingly, on the remote server. In the search results, Google shows nonsensical folders off the root domain, with the actual page inside that non-existent folder. An example: Google shows this in its index of the site (as a 404 error page): www.burlingtonmortgage.biz/MQnjO/idaho-mortgage-rates.asp The actual page on the site is: www.burlingtonmortgage.biz/idaho-mortgage-rates.asp Google is showing the folder MQnjO, which doesn't exist anywhere on the remote server. Other pages it shows have different folder names that are just as wacky. We called our hosting company, who said the problem isn't coming from them... Has anyone had something like this happen to them? Thanks so much for your insight!
Megan
Technical SEO | ILM_Marketing