Indexed Pages in Google: How Do I Find Out?
-
Is there a way to get a list of the pages that Google has indexed?
Is there some software that can do this?
I do not have access to Webmaster Tools, so I am hoping there is another way to do this.
It would be great if I could also see whether an indexed page returns a 404 or another status.
Thanks for your help, and sorry if it's a basic question.
-
If you want to find all your indexed pages in Google, just search for site:yourdomain.com (or .co.uk, or whatever your extension is), without the www.
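For anyone who wants to build that search as a link, here is a minimal Python sketch. The function name `site_query_url` is made up for illustration, and the `start` offset parameter is an assumption about how Google pages its results; it only builds the URL and does not fetch anything.

```python
from urllib.parse import urlencode, urlsplit


def site_query_url(domain: str, start: int = 0) -> str:
    """Build a Google search URL for a site: query (hypothetical helper)."""
    # Accept either a bare host or a full URL, then keep only the host name.
    host = urlsplit(domain if "//" in domain else "//" + domain).hostname or ""
    # Drop a leading "www." so the query covers the whole domain.
    if host.startswith("www."):
        host = host[4:]
    params = {"q": f"site:{host}"}
    # Assumption: "start" is the result offset used for paging.
    if start:
        params["start"] = start
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(site_query_url("www.yourdomain.com"))
```

Paste the printed link into a browser to see the results for your own domain.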
-
Hi John,
Hope I'm not too late to the party! When checking URLs for their cache status, I suggest using Scrapebox (with proxies).
Be warned: it was created as a black-hat tool and, as such, is frowned upon, but there are a number of excellent white-hat uses for it! It costs $57 as a one-off payment.
-
Sorry to keep sending you messages, but I wanted to make sure you know that SEOmoz has a fantastic tool for what you are requesting. Please look at this link, then click at the bottom where it says "show more"; I believe you will agree it does everything you've asked and more.
http://pro.seomoz.org/tools/crawl-test
Sincerely,
Thomas
Does this answer your question?
-
What is giving you a 100 limit?
Try using Raven Tools or spider mate; they both have excellent free trials and give you quite a bit of information.
-
Neil, you are correct. I agree that Screaming Frog is excellent; it will definitely show you your site. Here is a link from an SEOmoz associate that I believe will benefit you:
http://www.seomoz.org/q/404-error-but-i-can-t-find-any-broken-links-on-the-referrer-pages
Sincerely,
Thomas
-
This is what I am looking for. Thanks!
Strange that there is no tool I can buy to do this in full, without the 100 limit.
Anyway, I will give that a go.
-
Can I get your site's URL? By the way, a better way might be to get into Google Webmaster Tools:
if you have a Gmail account, use that; if you don't, just sign up using your regular e-mail.
Of course, using SEOmoz via http://pro.seomoz.org/tools/crawl-test will give you a full rundown of all of your links and how they're running. Are you not seeing all of them?
Another tool I have found very useful is website analysis, as well as their midsize product, from Alexia.
I hope I have helped,
Tom
-
If you don't have access to Webmaster Tools, the most basic way to see which pages Google has indexed is obviously to do a site: search on Google itself - like "site:google.com" - to return pages of SERPs containing the pages from your site which Google has indexed.
Problem is, how do you get the data from those SERPs in a useful format to run through Screaming Frog or similar?
Enter Chris Le's Google Scraper for Google Docs.
It will let you scrape the first 100 results, then offset your search by 100 to get the next 100, and so on. It's slightly cumbersome, but it will achieve what you want to do.
Then you can crawl the URLs using Screaming Frog or another crawler.
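If you would rather script the last step than run a crawler, the status check can be sketched in a few lines of Python using only the standard library. This is a rough sketch and not how Screaming Frog works: `fetch_status` and `label` are names made up here, it assumes the server answers HEAD requests (some do not), and because redirects are followed it reports the status of the final page.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url after redirects, or 0 if it cannot be fetched."""
    try:
        # HEAD asks for headers only, so the page body is not downloaded.
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "index-check-sketch"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is what we want.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return 0


def label(status: int) -> str:
    """Turn a status code into a short label for a report."""
    if status == 0:
        return "unreachable"
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if status == 404:
        return "not found"
    if 400 <= status < 500:
        return "client error"
    return "server error"


if __name__ == "__main__":
    # Replace with the URLs scraped from the site: results.
    for page in ["http://www.yourdomain.com/"]:
        code = fetch_status(page)
        print(page, code, label(code))
```

Anything labelled "not found" is a page Google has indexed that now returns a 404.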
-
Just thought I might add these links; they might help explain it better than I did.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1352276
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=2409443&topic=2446029&ctx=topic
http://pro.seomoz.org/tools/crawl-test
You should definitely sign up for Google Webmaster Tools. It is free; here is a link. All you need to do is add an e-mail address and password:
http://support.google.com/webmasters/bin/topic.py?hl=en&topic=1724121
I hope I have been of help to you. Sincerely,
Thomas
-
Thanks for the reply.
I do not have access to Webmaster Tools, and the SEOmoz tools do not show a great many of the pages on my site for some reason.
Majestic shows up to 100 pages. Ahrefs shows some as well.
I need to compare what Google has indexed with the status of each page.
Does Screaming Frog do this?
-
Google Webmaster Tools should supply you with this information. In addition, the SEOmoz tools will tell you that and more. Run your website through the campaign section of SEOmoz and you will then see any issues with your website.
You may also want to use Google Webmaster Tools to run a test as Googlebot; it should show you any issues you are having, such as 404s or other fun things that websites do.
If you're running WordPress, there are plenty of plug-ins; I recommend 404 returned.
Sincerely,
Thomas