How Google Carwler Cached Orphan pages and directory?
-
I have website www.test.com
I have made some changes in live website and upload it to "demo" directory (which is recently created) for client approval.
Now, my demo link will be www.test.com/demo/
I am not doing any type of link building or any activity which pass referral link to www.test.com/demo/
Then how Google crawler find it and cached some pages or entire directory?
Thanks
-
Try putting the URL into Google and see if you find any pages linking to it.
I knew a company that created a test site that was a copy of a live site (made with a specific hosted CMS). Didn't exclude the test site in robots because "we all know we won't link to it so it'll be ok". Site got indexed, and it was because a person at the company was having problems with the implementation of the test site, went to the help forum (which person didn't think would be indexed) and posted the URL to the test site.
I found the above by just putting in the URL of the test site into Google, and I saw the post in the help desk. You might try the same to see if somehow there is a rogue link.
-
Is google crawling our mails?
Is it possible?
-
Yup, correct.
I was certain I'd replied to this
Anyway, you ever notice how the ads in gmail are always relevant to the content of your emails? Google are totally reading them
-
The <conspiracy hat="">side of things was him commenting that Google is sometimes accused of processing everything in Gmail and could have possibly pulled your link to the demo directory from that.</conspiracy>
-
Hi Barry,
Yes, We were used Gmail for reporting.
Is it make any sense??
-
<conspiracy-hat></conspiracy-hat>
Did either you or your client use gmail when you sent him the demo link?
Regardless, Dan's advice to noindex and block the directory from spiders is the future when doing development work.
-
Hi JoelHit,
NO, There is not any single refferal link to "Demo" directory from entire website and also from third party websites.
I am aware about Google Crawling and Indexing Systems.
Thanks.
-
Hi Thetjo,
I know about it.
My question is that how Google Crawl it without any referral link?
Thanks.
-
Hi Dan,
No, i am not exclude "demo" directory from robots.txt for any search engine.
I am not using wordpress its simple stattic HTML website (Not using any type of CMS).
-
Did this actually happen or are we talking about a hypothetical situation here? It could be that there is a link to the demo directory you've overlooked? Has the /demo folder perhaps been used in the past and there were still old links to it?
As a meta-solution to this problem: prevent crawlers and nosy people from accessing the content by adding a .htpasswd login to the area used for client approval.
-
Did you block the /demo/ directory in your robots.txt file? This is step number one to try and ensure they don't get crawled. Also, are you using wordpress? If so, wordpress automatically pings search engines when you add a post and if you use the common sitemap plugin, when it creates the sitemap it submits it automatically to Google, so that's another way Google could have found it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should I apply Canonical Links from my Landing Pages to Core Website Pages?
I am working on an SEO project for the website: https://wave.com.au/ There are some core website pages, which we want to target for organic traffic, like this one: https://wave.com.au/doctors/medical-specialties/anaesthetist-jobs/ Then we have basically have another version that is set up as a landing page and used for CPC campaigns. https://wave.com.au/anaesthetists/ Essentially, my question is should I apply canonical links from the landing page versions to the core website pages (especially if I know they are only utilising them for CPC campaigns) so as to push link equity/juice across? Here is the GA data from January 1 - April 30, 2019 (Behavior > Site Content > All Pages😞
Intermediate & Advanced SEO | | Wavelength_International0 -
Blog posts and 3rd hierarchy pages weigh same as per Google?
Our blogs.website.com has been a sub-directory since 2 years as website.com/blog. Now we have blog posts with URLs website.com/blog/blog-post-1. This blog is located at different place technically, away from website. Now my doubt is whether the blog-posts have equal weightage at Google just like other 3rd hierarchy level pages of website like website.com/page1/topic2. Thanks
Intermediate & Advanced SEO | | vtmoz1 -
Google crawling 200 page site thousands of times/day. Why?
Hello all, I'm looking at something a bit wonky for one of the websites I manage. It's similar enough to other websites I manage (built on a template) that I'm surprised to see this issue occurring. The xml sitemap submitted shows Google there are 229 pages on the site. Starting in the beginning of December Google really ramped up their intensity in crawling the site. At its high point Google crawled 13,359 pages in a single day. I mentioned I manage other similar sites - this is a very unusual spike. There are no resources like infinite scroll that auto generates content and would cause Google some grief. So follow up questions to my "why?" is "how is this affecting my SEO efforts?" and "what do I do about it?". I've never encountered this before, but I think limiting my crawl budget would be treating the symptom instead of finding the cure. Any advice is appreciated. Thanks! *edited for grammar.
Intermediate & Advanced SEO | | brettmandoes0 -
Google indexed wrong pages of my website.
When I google site:www.ayurjeewan.com, after 8 pages, google shows Slider and shop pages. Which I don't want to be indexed. How can I get rid of these pages?
Intermediate & Advanced SEO | | bondhoward0 -
Google + pages and SEO results...
Hi, Can anyone give me insight into how people are getting away with naming their business by the SEO search term, creating a BS Google + page, then having that page rank high in the search results. I am speaking specifically about the results you get when you Google: "Los Angeles DUI Lawyer". As you can see from my attached screenshot (I'm doing the search in Los Angeles), the FIRST listing is a Google + business. Strangely, the phone number listed doesn't actually take you to a DUI attorney, but rather to some marketing group that never answers the phone. Can anyone give me insight into why Google even allows this? I just find it odd that Google cares so much about the user experience, but have the first result be something completely misleading. I know it sounds like I'm just jealous (which I am, a little), but I find it disheartening that we work so hard on SEO, and someone takes the top spot with an obvious BS page. UupqBU9
Intermediate & Advanced SEO | | mrodriguez14400 -
Is Sitemap Issue Causing Duplicate Content & Unindexed Pages on Google?
On July 10th my site was migrated from Drupal to Google. The site contains approximately 400 pages. 301 permanent redirects were used. The site contains maybe 50 pages of new content. Many of the new pages have not been indexed and many pages show as duplicate content. Is it possible that there is a site map issue that is causing this problem? My developer believes the map is formatted correctly, but I am not convinced. The sitemap address is http://www.nyc-officespace-leader.com/page-sitemap.xml [^] I am completely non technical so if anyone could take a brief look I would appreciate it immensely. Thanks,
Intermediate & Advanced SEO | | Kingalan1
Alan | |0 -
Merging your google places page with google plus page.
I have a map listing showing for the keyword junk cars for cash nj. I recently created a new g+ page and requested a merge between the places and the + page. now when you do a search you see the following. Junk Cars For Cash NJ LLC
Intermediate & Advanced SEO | | junkcars
junkcarforcashnj.com/
Google+ page - Google+ page the first hyperlink takes me to the about page of the G+ and the second link takes me to the posts section within g+. Is this normal? should i delete the places account where the listing was originally created? Or do i leave it as is? Thanks0 -
Google is ranking the wrong page for the targeted keyword
I have two examples below where we want it to rank for the targeted page but google picked another page to rank instead. This is happening a lot on this site I just recently started to work on. Example 1 Googles Choice for key word Motorcycle Tires: http://www.rockymountainatvmc.com/cl/50/Tires-and-Wheels What we want Google to choice for Motorcycle Tires: http://www.rockymountainatvmc.com/c/49/-/181/Motorcycle-Tires Other pages about Motorcycle tires: http://www.rockymountainatvmc.com/d/12/Motorcycle-Tires We even used the rel="canonical" for this url to point to our target page. http://www.rockymountainatvmc.com/c/50/-/181/Motorcycle-Tires Example 2 ATV Tires We want this page to rank http://www.rockymountainatvmc.com/c/43/81/165/ATV-Tires however google has decided to rank http://www.rockymountainatvmc.com/t/43/81/165/723/ATV-Tires-All that is acutally one folder under where we want it.
Intermediate & Advanced SEO | | DoRM0