How to Hide Directories in Search?
-
I noticed bad 404 error links in Google Webmaster Tools and they were pointing to directories that do not have an actual page, but hold information.
Ex: there are links pointing to our PDF folder which holds all of our pdf documents. If i type in , example.com/pdf/ it brings up a unformated webpage that displays all of our PDF links.
How do I prevent this from happening. Right now I am blocking these in my robots.txt file, but if i type them in, they still appear.
Or should I not worry about this?
-
Yes, a visit to example.com/dir should now return a 404 error (if you haven't done any redirecting/canonicalizing). This will increase your 404 count in Web Master tools but it's far preferable to the alternative. If you're not redirecting the robots.txt will eventually work and hopefully the links will just fall out of WMT.
-
My hosting company turned off directory browsing and now everything is how it should be. So to my understanding, if the server sees a file that does not have a index file, it should not be view able and should be forbidden. This shoujld not affect us from an SEO standpoint should it? My hosting company said they disabled all directories in our site, however everything still works, except for the forbidden file directories.
-
Basically it shouldn't really have an affect; those unformatted file listings are literally the web server automatically saying 'here's the files that are in this folder', there's no meta tags, description, on page elements, etc.
If you have these pages and they're ranking well, you generally don't want them to be. The automatic file browsing pages don't have your name, your company, etc. in them, and they're generally pretty ugly. They also theoretically could be 'stealing' juice from your 'real' pages, if your internal structure isn't flowing relevance properly.
Basically what I'm saying is that if these pages are having some kind of SEO effect, you probably don't want them to be since they're so basic.
Also I can't overstate the security concerns that directory browsing might be introducing. If someone can directory browse to where your code lives (.php, .aspx.vb, whatever) they may be able to read it. Code sometimes has important things like logins, passwords, merchant account ids, etc. in it that you definitely don't want people reading.
-
Agreed with Valerie that step 1 is to turn off those directory listing pages - that can be a security issue and you don't necessarily want people to see/access the whole list. Also, make doubly sure you don't have any internal links to that directory (Google crawled it somehow).
Generally, Robots.txt should prevent crawling, but it's not foolproof, and it's pretty bad about removing pages once they're indexed. If you can block the page from browsing and return a 404 for the root page, that should be fine. The other option would be to have the page removed in Google Webmaster Tools. You could request removal for the entire folder, but I'm guessing that you may want the actual PDFs indexed.
-
Will turning of directory browsing affect Search for all directories?
-
I really don't want to 301 redirect them as they are just holding files. This is happening with my includes file too. that holds our header, footer, navigation etc. I can check with our hosting company to find out.
-
I'd create an index.html for the directory, and then redirect it somewhere. This way, you're capturing the inbound links and then rescuing some of the inbound juice.
Otherwise, you can also check out this post for more info on other solutions and modifying your htaccess file to prevent the directory view - http://perishablepress.com/better-default-directory-views-with-htaccess/
-
Blocking it in robots.txt will work to hide it from search engines.
If you want to hide it from users or people to who type in the url, you can simply drop a blank "index.html" in the /pdf folder.
-
I would suggest 301'ing them to their /index.htm or /pdf.htm equivalents. If you don't know, a 301 is a signal to a web browser (or search crawler) saying "this page has permanently moved, please go to (otherpage.htm) instead".
Here's a good SEOMoz article explaining it a bit more:
http://www.seomoz.org/learn-seo/redirection
What might be more of a concern, is it sounds like your web server has directory browsing enabled. This could be a security issue (depending on your web server setup). Generally you don't want to expose directories if you don't have to because it gives a potential attacker insight into your system setup. Here's an example how to do it in Apache:
www.camelrichard.org/topics/Apache/Turn_OffDirectoryBrowsing
And IIS:
technet.microsoft.com/en-us/library/cc731109(v=ws.10).aspx
If you like I can confirm if you have open directories if you give me the link, either here or through private message.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Product search URLs with parameters and pagination issues - how should I deal with them?
Hello Mozzers - I am looking at a site that deals with URLs that generate parameters (sadly unavoidable in the case of this website, with the resource they have available - none for redevelopment) - they deal with the URLs that include parameters with *robots.txt - e.g. Disallow: /red-wines/? ** Beyond that, they userel=canonical on every PAGINATED parameter page[such as https://wine****.com/red-wines/?region=rhone&minprice=10&pIndex=2] in search results.** I have never used this method on paginated "product results" pages - Surely this is the incorrect use of canonical because these parameter pages are not simply duplicates of the main /red-wines/ page? - perhaps they are using it in case the robots.txt directive isn't followed, as sometimes it isn't - to guard against the indexing of some of the parameter pages??? I note that Rand Fishkin has commented: "“a rel=canonical directive on paginated results pointing back to the top page in an attempt to flow link juice to that URL, because “you'll either misdirect the engines into thinking you have only a single page of results or convince them that your directives aren't worth following (as they find clearly unique content on those pages).” **- yet I see this time again on ecommerce sites, on paginated result - any idea why? ** Now the way I'd deal with this is: Meta robots tags on the parameter pages I don't want indexing (nofollow, noindex - this is not duplicate content so I would nofollow but perhaps I should follow?)
Intermediate & Advanced SEO | | McTaggart
Use rel="next" and rel="prev" links on paginated pages - that should be enough. Look forward to feedback and thanks in advance, Luke0 -
How to Improve Search Ranking from 10 to Top 5
Hi, I was just optimizing a page and have improved it from Page 2 -> Page 1 ; Position 15->10 for now by basic onpage seo edits. I want to understand how I would take it to next level by getting it to Top 5 or Top 3 results. My keyword and page are as follows (all checked in Google India (.in) ) Page - https://nirogam.com/ayurvedic-treatment-home-remedies-chikungunya/
Intermediate & Advanced SEO | | pks333
Keyword - ayurvedic treatment for chikungunya What steps can I take to go from 10 - Top 3 or Top 5 results here? I was checking the Rank tracker & moz Grader for this - got all ticks except adding keyword in URL. Would this be recommended of changing the URL after it's ranking so well to just add this keyword in the same?0 -
Javascript search results & Pagination for SEO
Hi On this page http://www.key.co.uk/en/key/workbenches we have javascript on the paginated pages to sort the results, the URL displayed and the URL linked to are different. e.g. The paginated pages link to for example: page2 http://www.key.co.uk/en/key/workbenches#productBeginIndex:30&orderBy:5&pageView:list& The list is then sorted by javascript. Then the arrows either side of pagination link to e.g. http://www.key.co.uk/en/key/workbenches?page=3 - this is where the rel/prev details are - done for SEO But when clicking on this arrow, the URL loaded is different again - http://www.key.co.uk/en/key/workbenches#productBeginIndex:60&orderBy:5&pageView:list& I did not set this up, but I am concerned that the URL http://www.key.co.uk/en/key/workbenches?page=3 never actually loads, but it's linked to Google can crawl it. Is this a problem? I am looking to implement a view all option. Thank you
Intermediate & Advanced SEO | | BeckyKey0 -
Search box within search results question
I work for a Theater news website. We have two sister sites, theatermania.com in the US and whatsonstage.com in London. Both sites have largely the same codebase and page layouts. We've implemented markup that allows google to show a search box for our site in its results page. For some reason, the search box is showing for one site but not the other: http://screencast.com/t/CSA62NT8 We're scratching our heads. Does anyone have any ideas?
Intermediate & Advanced SEO | | TheaterMania0 -
When you can't see the cache in search, is it about to be deindexed?
Here is my issue and I've asked a related question on this one. Here is the back story. Site owner had a web designer build a duplicate copy of their site on their own domain in a sub folder without noindexing. The original site tanked, the webdesigner site started outranking for the branded keywords. Then the site owner moved to a new designer who rebuilt the site. That web designer decided to build a dev site using the dotted quad version of the site. It was isolated but then he accidentally requested one image file from the dotted quad to the official site. So Google again indexed a mirror duplicate site (the second time in 7 months). Between that and the site having a number of low word count pages it has suffered and looked like it got hit again with Panda. So the developer 301 the version to the correct version. I was rechecking it this morning and the dotted quad version is still indexed, but it no longer lets me look at the cache version. Out of experience, is this just Google getting ready to drop it from the index?
Intermediate & Advanced SEO | | BCutrer0 -
Hiding answers
I would like to create a page with a few questions and when people click the answer comes up. Is google going to index and rank for the hidden text that is comming up when clicking?
Intermediate & Advanced SEO | | Joseph-Green-SEO0 -
List of Search Engines subscribing to the ajax crawling scheme?
Hi, Does anyone have a list of (major) Search Engines that subscribe to the Ajax Crawling Scheme? (https://developers.google.com/webmasters/ajax-crawling/) Specifically interested in major international Search Engines such as Bing/Yahoo, Baidu & Yandex - if anyone knows, please let me know! Thanks in advance
Intermediate & Advanced SEO | | FashionLux0 -
Hiding tag and category root in wordrpess = plunge rank
I deleted the "tag" portion of my tag urls's that were ranking pretty high....so: www.businessinteriors.co.uk/tag/office-fit-out-bristol became www.businessinteriors.co.uk/office fit-out-bristol The old tag page ranked 7 before the change and even 3rd at one stage. The new name page without the tag has re-appeared at 23.... So quite a plunge in ranking from the change and this is across the board for all my tags (200) that were ranking high and I wanted to improve. Have I made a major error? Or will they naturally start coming back to where they were before? Weirdly some of the changes have had a positive impact - so ranking has gone up slightly in some areas..but completely out done by the plungers :-s
Intermediate & Advanced SEO | | bizint0