How did the Google crawler cache orphan pages and a directory?
-
I have a website, www.test.com.
I made some changes to the live website and uploaded them to a recently created "demo" directory for client approval.
So my demo link is www.test.com/demo/
I am not doing any link building or any other activity that would pass a referral link to www.test.com/demo/
So how did the Google crawler find it and cache some pages, or even the entire directory?
Thanks
-
Try putting the URL into Google and see if you find any pages linking to it.
I knew a company that created a test site that was a copy of a live site (made with a specific hosted CMS). They didn't exclude the test site in robots.txt because "we all know we won't link to it so it'll be ok". The site got indexed, and it was because a person at the company was having problems with the implementation of the test site, went to the help forum (which that person didn't think would be indexed) and posted the URL of the test site.
I found the above by just putting in the URL of the test site into Google, and I saw the post in the help desk. You might try the same to see if somehow there is a rogue link.
-
Is Google crawling our emails?
Is that possible?
-
Yup, correct.
I was certain I'd replied to this.
Anyway, have you ever noticed how the ads in Gmail are always relevant to the content of your emails? Google are totally reading them.
-
The "conspiracy hat" side of things was him commenting that Google is sometimes accused of processing everything in Gmail and could possibly have pulled your link to the demo directory from that.
-
Hi Barry,
Yes, we were using Gmail for reporting.
Does that make any sense?
-
[conspiracy hat on]
Did either you or your client use Gmail when you sent him the demo link?
Regardless, Dan's advice to noindex the directory and block it from spiders is the way to go in future when doing development work.
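To make the "noindex" part concrete, here is a minimal Python sketch (not from anyone in this thread) that checks whether a page actually carries a noindex directive, either in a robots meta tag or in an X-Robots-Tag response header. The function name has_noindex and the sample HTML are made up for illustration, and the header check is deliberately simple: it does not handle user-agent-prefixed values such as "googlebot: noindex".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html, headers=None):
    """Return True if the HTML or the response headers ask crawlers not to index the page.

    has_noindex is a hypothetical helper name used only for this sketch.
    """
    # Check the X-Robots-Tag response header (simple comma-separated form only).
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            if "noindex" in [part.strip().lower() for part in value.split(",")]:
                return True
    # Check robots meta tags in the HTML itself.
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Made-up demo page that opts out of indexing.
demo_page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(demo_page))
```

One thing to keep in mind: a crawler can only see a noindex meta tag if it is allowed to fetch the page, so a robots.txt block and a noindex tag on the same URL work against each other.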
-
Hi JoelHit,
No, there is not a single referral link to the "demo" directory from anywhere on the website or from third-party websites.
I am aware of how Google's crawling and indexing systems work.
Thanks.
-
Hi Thetjo,
I know about it.
My question is: how did Google crawl it without any referral link?
Thanks.
-
Hi Dan,
No, I have not excluded the "demo" directory in robots.txt for any search engine.
I am not using WordPress; it's a simple static HTML website (not using any type of CMS).
-
Did this actually happen, or are we talking about a hypothetical situation here? Could there be a link to the demo directory that you've overlooked? Has the /demo folder perhaps been used in the past, with old links to it still around?
As a meta-solution to this problem: prevent crawlers and nosy people from accessing the content by adding a .htpasswd login to the area used for client approval.
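For anyone wondering why a login stops crawlers: with HTTP Basic authentication, any request without valid credentials gets a 401 instead of the page. The Python sketch below only illustrates that behaviour; it is not how Apache implements it. The USERS dictionary and the status_for_request name are invented for the example, and a real .htpasswd file stores password hashes, not plaintext.

```python
import base64
import hmac

# Hypothetical credentials for illustration only; a real .htpasswd file stores hashes.
USERS = {"client": "s3cret"}


def status_for_request(authorization, users=USERS):
    """Return the HTTP status a password-protected /demo/ area would answer with."""
    # No Authorization header, or a non-Basic scheme: the server asks for a login.
    if not authorization or not authorization.startswith("Basic "):
        return 401
    try:
        decoded = base64.b64decode(authorization[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return 401
    username, separator, password = decoded.partition(":")
    expected = users.get(username)
    if not separator or expected is None:
        return 401
    # Constant-time comparison of the supplied and expected passwords.
    if hmac.compare_digest(password.encode("utf-8"), expected.encode("utf-8")):
        return 200
    return 401


# A crawler sends no credentials, so it never sees the demo pages.
print(status_for_request(None))
```

Since a crawler never sends credentials, it has nothing to fetch or cache, whether or not a stray link to the directory exists somewhere.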
-
Did you block the /demo/ directory in your robots.txt file? This is step number one to try to ensure it doesn't get crawled. Also, are you using WordPress? If so, WordPress automatically pings search engines when you add a post, and if you use the common sitemap plugin, it submits the sitemap to Google automatically when it creates it, so that's another way Google could have found it.
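If it helps, a robots.txt rule for the /demo/ directory can be sanity-checked with Python's standard urllib.robotparser before uploading it. This is only a sketch: the robots.txt content below is an assumed example for www.test.com, and is_crawl_allowed is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for www.test.com; only the /demo/ path comes from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /demo/
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in ("http://www.test.com/demo/index.html", "http://www.test.com/index.html"):
    print(url, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

Note that Disallow only asks well-behaved crawlers not to fetch those URLs. A disallowed URL can still end up listed in the index if something links to it, which is why a noindex directive or a login is often suggested alongside it for development work.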