Kill your htaccess file, take the risk to learn a little
-
Last week I was browsing Google's index with a "site:www.mydomain.com" search, scanning to see what Google had indexed for my site. I came across a URL that had been indexed by mistake. It went something like this:
www.mydomain.com/link1/link2/link1/link4/link3
I didn't understand why Google had indexed a page like that, since the "link" pages were site-wide links in my main navigation bar. The URL seemed to be looping infinitely. So I started checking how many of these Google had indexed and came across about 20 pages. I went through the process of removing those URLs in Webmaster Tools, but then I wanted to know why it was happening. I discovered that I had mistakenly placed some links in my site's header in such a manner.
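The links were plain relative paths, something along these lines (placeholder names, just like the rest of this post):
<a href="link1/">Link 1</a>
<a href="link2/">Link 2</a>
<a href="link3/">Link 3</a>
Notice there is no leading "/" in front of any of the href values.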
If you know HTML you will realize that by not placing the "/" at the front of the link, I was telling the browser (and Googlebot) to append that link to the URL of whatever page it was currently on. What this did was create an infinite loop of links, which is not good.
Basically, when Google went to www.mydomain.com/link1/ it found the other links, which told it to append each one to the existing URL and then crawl that new, deeper URL.
Something like: www.mydomain.com/link1/link2/...
When you do not add the "/" in front of the directory you are linking to, the link is resolved relative to the current page, which is what causes this. The leading "/" refers to the site root, so if you place it in front of the directory you are linking to, the path will always be resolved from the root rather than tacked onto the current URL.
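In other words, the header links needed to be root-relative (same placeholder names as above):
<a href="/link1/">Link 1</a>
<a href="/link2/">Link 2</a>
<a href="/link3/">Link 3</a>
With the leading "/", each link resolves to www.mydomain.com/link1/ (and so on) no matter which page it sits on.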
So what did I do?
Even though I was able to find about 20 URLs using the "site:" search, there had to be more out there. I kept searching but could not find any more, yet I was not convinced that was all of them.
The light bulb went on at this point.
My .htaccess file contained a lot of 301 redirects from my attempts to point those looping pages at a real page, even though there were no truly relevant pages to redirect them to. So how could I find out everything Google had indexed for my site, when Webmaster Tools only reports the top 1,000 links?
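They were one-off rules, roughly of this sort (a sketch with placeholder paths, not the exact directives from my file):
Redirect 301 /link1/link2/link1/link4/link3 http://www.mydomain.com/link1/
Redirect 301 /link1/link3/link1/ http://www.mydomain.com/link1/
Every new looping URL I stumbled across needed another line like that, which clearly doesn't scale.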
I decided to kill my .htaccess file. Knowing that Google is fairly "forgiving" when major changes happen to a site, I knew it would not simply kill my site the moment the .htaccess file disappeared.
I waited 3 days and then BOOM! Webmaster Tools reported that it had found a ton of 404s on my site. I looked at the Crawl Errors report and there they were: all of those infinite-loop links that I knew had to be out there, finally visible.
How many were there?
In the first crawl Google found over 5,000 of them. OMG! Can you imagine the "low quality" score I was getting on those pages? By looking at all those links I was able to pick out about 4 patterns, the same four starting paths I ended up blocking in robots.txt below. For example:
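www.mydomain.com/link1/link2/...
www.mydomain.com/link1/link3/...
www.mydomain.com/link1/link4/...
www.mydomain.com/link1/link5/...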
Now my issue was that I wanted to keep all the URLs pointing to www.mydomain.com/link1, but anything after that needed to go. I went into my robots.txt file and added this:
Disallow: /link1/link2/
Disallow: /link1/link3/
Disallow: /link1/link4/
Disallow: /link1/link5/
There were many more pages indexed that went deeper into those paths, but since the second level was where the loop started, blocking everything from that level down covered all of them. With those four rules I was able to block, as far as I can tell, at least 5k links, if not more.
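Side note: Googlebot also understands wildcards in robots.txt, so if every subdirectory under /link1/ had been part of the loop, a single rule could have covered all four patterns at once:
Disallow: /link1/*/
Listing the paths explicitly, as above, is the safer choice when some /link1/ subdirectories are legitimate and should stay crawlable.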
What did I learn from this?
Kill your .htaccess file for a few days and see what comes back in your reports. You might learn something.
After doing this I simply put my .htaccess file back in place, and now I am on my way to removing a ton of "low quality" links I didn't even know I had.
-
Interesting post. Yeah, the .htaccess file is one of the most important files on a site and it is easy to mess up (as I am sure most everyone has at one time or another).
Related Questions
-
Getting Google to index a large PDF file
Hello! We have a 100+ MB PDF with multiple pages that we want Google to fully index on our server/website. First of all, is it even possible for Google to index a PDF file of this size? It's been up on our server for a few days, and my colleague did a Googlebot fetch via Webmaster Tools, but it still hasn't happened yet. My theories as to why this may not work: A) We have no actual link(s) to the pdf anywhere on our website. B) This PDF is approx 130 MB and very slow to load. I added some compression to it, but that only got it down to 105 MB. Any tips or suggestions on getting this thing indexed in Google would be appreciated. Thanks!
Technical SEO | BBEXNinja
-
Take a good amount of existing landing pages offline because of low traffic, cannibalization and thin content
Hello Guys, I decided to take about 20% of my existing landing pages offline (about 50 of 250, which were launched about 8 months ago). Reasons are:
1. These pages sent no organic traffic at all in these 8 months.
2. Often really similar landing pages exist (just minor keyword-targeting differences, and I would call the content "thin").
3. Moreover, I had some Panda issues in October. I used to rank with multiple landing pages for the same keyword in the top ten, and in October many of these pages dropped out of the top 50. I also noticed that for some keywords one landing page dropped out of the top 50 while another climbed from 50 into the top 10 in the same week; the next week the new landing page dropped to 30, the week after that out of the top 50, and the old landing page came back to the top 20, but not the top ten. This all happened in October. Did anyone observe such things as well?
Those are the reasons why I came to the conclusion to take these pages offline and integrate some of the good content into the other, similar pages, so that one page targets more broadly instead of two. I hope my remaining landing pages benefit from this. I hope all agree? Now to the real question: Should I redirect all the pages I take offline? They send basically no traffic at all, and none of them should have external links, so I will not give away any link juice. Or should I just remove the URLs in Google Webmaster Tools and then take them offline? Like I said, the pages are basically dead, and personally I see no reason for these 50 redirects. Cheers, Heiko
Technical SEO | _Heiko_
-
Submitting a new sitemap index file. Only one file is getting read. What is the error?
Hi community, I am working to submit a new sitemap index file, where about 5 files of 50,000 SKUs each will be uploaded. Webmaster Tools is reporting that only 50k SKUs have been submitted. Google Webmaster Tools is accepting the index, however only the first file is getting read. I have 2 warnings and need to know if they are the reason the additional files are not getting read:
1. Invalid XML: too many tags - "Too many tags describing this tag. Please fix it and resubmit."
2. Incorrect namespace - "Your Sitemap or Sitemap index file doesn't properly declare the namespace."
Here is the URL I am submitting: http://www.westmarine.com/sitemap/wm-sitemap-index.xml
Technical SEO | mm916157
-
How long does it take to reindex the website
Generally speaking, how long does it take for Google to recrawl/reindex an (ecommerce) website? After changing a number of product subcategories from 'noindex' back to 'index', I regenerated the sitemap and used Fetch as Google in WMT. That was a couple of weeks ago and there has been no action yet. Second question: Does Google treat these pages as if they're brand new? I 'noindexed' them back in April, and they were ranking OK then. (I had noindexed them on the back of advice from my SEO, due to concerns about these pages being seen as duplicate content.) Help!
Technical SEO | Coraltoes77
-
Risk of hosting website on internal application server
This isn't really related to SEO, but it came up with a client... We have a client that is hosting their website internally. That's not a problem, except that the server it is hosted on is the same server where they host many secure applications with sensitive customer information. We recommended to the marketing VP that they move the website to a different server that doesn't host secure apps, so we don't have to be concerned about any conflicts. Their IT folks pushed back. So, I'm looking for some ammunition to support our recommendation. Any ideas? Thanks. Bernie Borges, Find and Convert, [email protected]
Technical SEO | BernieBorges
-
Does Bing ignore robots.txt files?
Bonjour from "It's a miracle it's not raining" Wetherby, UK 🙂 Ok here goes... Why, despite a robots.txt file excluding indexing for the site http://lewispr.netconstruct-preview.co.uk/, is the site URL being indexed in Bing but not Google? Does Bing ignore robots.txt files, or is there something missing from http://lewispr.netconstruct-preview.co.uk/robots.txt that I need to add to stop Bing indexing a preview site, as illustrated below? http://i216.photobucket.com/albums/cc53/zymurgy_bucket/preview-bing-indexed.jpg Any insights welcome 🙂
Technical SEO | Nightwing
-
Problems with my Site? (If you have time take a look :P thanks)
Hey! If anyone has a moment I would really appreciate any tips, or a note of any problems you see with our website. I don't expect much but would appreciate any suggestions and such. We are working on backlinking and content generation, but I know that may not be much use if the website itself is not built well enough! Thank you to anyone who takes a moment of their time to take a look! Site: http://earthsaverequipment.com I am more interested in SEO issues or suggestions, not comments that you dislike my artwork 😛 haha Cheers Charles
Technical SEO | WebNooby
-
Htaccess rewrites
We’re using WordPress, and we have the following in the .htaccess:
# BEGIN WordPress
RewriteEngine On
RewriteBase /
RewriteRule ^index.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
# END WordPress
We have a URL like http://subdomain.example.com/select-a-card/?q=fuel, and I want to rewrite it so that it becomes http://subdomain.example.com/fuel-card/ What do I need to add to the .htaccess to do this? Thanks,
Technical SEO | AndrewAkesson