/$1 URL Showing Up
-
Whenever I crawl my site with any kind of bot or sitemap generator, it comes up with /$1 versions of my URLs. For example:
It gives me both hdiconference.com and hdiconference.com/$1, and both hdiconference.com/purchases and hdiconference.com/purchases/$1.
Then I get warnings saying that it's duplicate content. Here's the problem: I can't find these /$1 URLs anywhere. Even when I type them in, I get a 404 error. I don't know what they are, where they came from, and I can't find them when I scour my code.
So, I'm trying to figure out where the crawlers are picking this up. Where are these things? If sitemap generators and other site crawlers are seeing them, I have to assume that Googlebot is seeing them as well.
Any help? My developers are at a loss as well.
-
Perfect. Thanks for the help, guys!
-
If you can't find them, you could put a Disallow rule in your robots.txt file to keep them from being crawled.
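Before adding a rule, it helps to check which paths a wildcard pattern would actually block. This is a minimal sketch, assuming Google-style matching: `*` matches any run of characters, a `$` as the very last character anchors the end of the path, and a `$` anywhere else is matched literally (as far as I can tell, that is how Google's open-source parser behaves; other crawlers may differ). Under those assumptions, a hypothetical rule such as `Disallow: /*$1` would cover both `/$1` and `/purchases/$1`. The helper name `is_blocked` is made up for illustration, and it ignores `Allow` rules and longest-match precedence. Python's built-in `urllib.robotparser` only does plain prefix matching, so it won't evaluate `*` this way.

```python
import re


def is_blocked(path: str, pattern: str) -> bool:
    """Return True if a robots.txt Disallow pattern matches the given URL path.

    Assumed Google-style semantics (check your crawler's docs):
      * '*' matches any run of characters, including none
      * a '$' as the LAST character anchors the match to the end of the path
      * any other '$' is matched literally
    Allow rules and longest-match precedence are deliberately ignored.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape every literal chunk, then rejoin the chunks with '.*' for each '*'
    regex = ".*".join(re.escape(chunk) for chunk in body.split("*"))
    # Unanchored rules are prefix matches; anchored rules must consume the whole path
    match = re.fullmatch(regex, path) if anchored else re.match(regex, path)
    return match is not None


if __name__ == "__main__":
    rule = "/*$1"  # hypothetical rule aimed at the stray /$1 URLs
    for p in ["/$1", "/purchases/$1", "/purchases", "/"]:
        print(p, "blocked" if is_blocked(p, rule) else "allowed")
```

Run it against the paths your crawler reported: if every /$1 path comes back blocked and the real pages come back allowed, the rule is scoped the way you intended.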
-
I had a similar issue and found it was caused (in the case of a Moz Pro crawl, at least) by the bot crawling a JS command in the head. One of the commands included an anchor tag that was being read as a link rather than in the context of the JavaScript command. Check your JS files/scripts; it might be in there somewhere.
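To hunt for the kind of script string described above, a quick scan of the site's templates and JS files for quoted paths containing `$1` can narrow things down. This is a minimal sketch, assuming the source files are in a local directory and that the culprit looks like a quoted path with a `$` followed by a digit (for example, an anchor tag such as `'<a href="/$1">'` built inside a JavaScript `.replace()` call, where `$1` is a regex backreference). The function names, the pattern, and the list of file extensions are hypothetical choices, and real crawlers may extract links with different heuristics.

```python
import re
import sys
from pathlib import Path

# A quoted string that contains "/$<digit>", e.g. "/$1" or '/purchases/$1'.
# Assumption: this roughly mirrors how a naive crawler mistakes script text for a link.
SUSPECT = re.compile(r"""["']([^"'\s]*/\$\d[^"'\s]*)["']""")


def find_suspect_links(text: str) -> list[str]:
    """Return every quoted, path-like string in text that contains /$<digit>."""
    return SUSPECT.findall(text)


def scan_tree(root: str, exts=(".js", ".html", ".htm", ".php")) -> dict[str, list[str]]:
    """Walk root recursively and map each matching file to its suspect strings."""
    hits: dict[str, list[str]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in exts:
            found = find_suspect_links(path.read_text(errors="ignore"))
            if found:
                hits[str(path)] = found
    return hits


if __name__ == "__main__":
    # Usage: python find_dollar_links.py /path/to/site/source
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for file_name, links in scan_tree(root).items():
        print(f"{file_name}: {links}")
```

Any file this flags is a good place to start looking; if the string is inside a script in the head, moving it to an external file or building the markup differently may stop crawlers from treating it as a link.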
Related Questions
-
Orphaned unwanted URLs from the CMS
Hi, I am working on quite an old CMS, and there are a bunch of URLs that don't make any sense:
https://www.trentfurniture.co.uk/products/all-outdoor-furniture/all-outdoor-furniture/1
https://www.trentfurniture.co.uk/products/all-chairs/all-chairs/1
https://www.trentfurniture.co.uk/products/all-industries/all-chairs/1
https://www.trentfurniture.co.uk/products/all-chairs/all-industries/1
https://www.trentfurniture.co.uk/products/all-chairs/banqueting-furniture/1
https://www.trentfurniture.co.uk/products/all-chairs/bar-furniture/1
https://www.trentfurniture.co.uk/products/all-chairs/bentwood-furniture/1
There are no internal links to them and, fortunately, hardly any traffic. But I can't see anything in the CMS that explains why they are being generated. I've checked the HTML to find the reason, but all I can think of is the URL structure. Is it something odd the CMS writes?
Does anyone have any ideas? And should I redirect all of these? I suspect there could be a better fix than redirects, since there are no links or traffic, such as the devs working out why the URLs are being generated. Unfortunately, the devs are a third-party company and respond very slowly, hence my posting here. (Some of these URLs are indexed, too.) Thanks in advance!
Technical SEO | MattHopkins
Folders in URL structure?
Hello, I'm revamping an out-of-date website and wondering whether I need to include the folders (categories) in the URL structure. The proposed structure has 8 main folders. I've read that Google is OK with URLs that don't include the folder, but is that really true? My hesitation is that the URLs are getting long, and the main folder has only a subfolder beneath it, e.g. /folder-name/facility-name/treatment-overview. This looks too long, doesn't it? Thanks!
Technical SEO | | lfrazer1230 -
Vanity URLs are being indexed in Google
We are currently using vanity URLs to track offline marketing. Each vanity URL is structured as www.clientdomain.com/publication and is 302-redirected to the actual URL on the website, not to a custom landing page. The resulting redirected URL looks like: www.clientdomain.com/xyzpage?utm_source=print&utm_medium=print&utm_campaign=printcampaign. We have started to notice that some of the vanity URLs are being indexed in Google search. To prevent this, should we be using a 301 redirect instead of a 302? And will Google's index ignore the UTM parameters in the URL that the 301 redirects to? If not, any suggestions on how to handle this? Thanks,
Technical SEO | | seogirl221 -
If a URL canonically points to another link, is that URL indexed?
Hi, I have two URLs that are both about the keyword phrase 'counting aggregated cells'. The first URL has a canonical link pointing to the second URL, but if you search for 'counting aggregated cells', both URLs are shown in the results. The first URL is the PDF, and I need only the second URL (the landing page) to be shown in the search results. The canonical link should tell Google which URL to index, so I don't understand why both URLs appear in the search results. Is 'noindex' on the first URL the only solution? I am using Yoast SEO for my website. Thank you for the answers.
Technical SEO | Chemometec
/home-2 showing in SERPs but not the homepage
I'm in the process of having a site built using WP as the CMS, and I'm keeping SEO in mind while it's being produced. Because I'm experimenting with the title/meta description, I'm checking rankings each day on whatsmyserp dot com. During development, I noticed one day that the ranking for websitename.com had disappeared and websitename.com/home-2 was ranking instead. I went into Pages in the WP account and deleted the second homepage that had been created for some reason; that was over half a week ago now. /home-2 is still ranking even though it no longer exists, and the actual homepage URL isn't ranking at all. Any suggestions on what I should do, or why this is happening? Thanks for any help.
Technical SEO | xcyte
Google not showing my website?
The website is medicare.md. If you search for the term "medicare doctors PG county maryland", it is #1 in Bing and Yahoo but doesn't appear anywhere in the first ten pages on google.com, even though it is not banned. Interestingly, if you run the same search on google.co.pk, it is #4. Quite puzzling! I would appreciate any help or advice. Sherif Hassan
Technical SEO | sherohass
OK to block /js/ folder using robots.txt?
I know Matt Cutts suggests that we allow bots to crawl CSS and JavaScript folders (http://www.youtube.com/watch?v=PNEipHjsEPU). But what if you have lots and lots of JS and you don't want to waste precious crawl resources? Also, as we update and improve the JavaScript on our site, we increment the version number (?v=1.1, 1.2, 1.3, etc.), and the legacy versions show up in Google Webmaster Tools as 404s. For example:
http://www.discoverafrica.com/js/global_functions.js?v=1.1
http://www.discoverafrica.com/js/jquery.cookie.js?v=1.1
http://www.discoverafrica.com/js/global.js?v=1.2
http://www.discoverafrica.com/js/jquery.validate.min.js?v=1.1
http://www.discoverafrica.com/js/json2.js?v=1.1
Wouldn't it just be easier to prevent Googlebot from crawling the js folder altogether? Isn't that what robots.txt was made for? Just to be clear: we are NOT doing any sneaky redirects or other dodgy JavaScript hacks. We're just trying to power our content and UX elegantly with JavaScript. What do you guys say: obey Matt, or run the JavaScript gauntlet?
Technical SEO | AndreVanKets
Redirecting blog.<mydomain>.com to www.<mydomain>.com/blog
This is more of a technical question than pure SEO per se, but I am guessing that some folks here may have covered this, so I would appreciate any suggestions. I am moving from a WordPress.com-based blog (hosted on WordPress) to a WordPress installation on my own server (as suggested by folks in another thread here). As part of this, I want to move from the format blog.<mydomain>.com to www.<mydomain>.com/blog. I have installed WordPress on my server and have imported posts from the hosted site to my own server. How should I manage the transition from the first format to the second? I have a bunch of links on Facebook, etc. that refer to URLs in the blog.<mydomain>.com format, so it's important that I redirect. I am running DotNetNuke/WordPress on my own IIS/ASP.NET servers. Thanks. Mark
Technical SEO | MarkWill