Blocked by meta-robots but there is no robots file
-
OK, I'm a little frustrated here. I waited a week for the next weekly index to run after changing the privacy setting on a WordPress website so Google can index it, but I still have the same problem: blocked by meta-robots, noindex, nofollow. But I do not see a robots file anywhere, and the privacy setting in this WordPress site is set to allow search engines to index the site. The website is www.marketalert.ca
What am I missing here? Why can't I index the rest of the website and is there a faster way to test this rather than wait another week just to find out it didn't work again?
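One possible quick check, rather than waiting for a scheduled crawl, is to look at the page source for a robots meta tag directly. The sketch below is only an illustration: the function name `meta_robots_directives` and the sample HTML are assumptions, with the sample modeled on the kind of tag WordPress typically emits when search engines are discouraged.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content looks like "noindex,nofollow"; split it into single directives
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.append(directive.strip().lower())


def meta_robots_directives(html):
    """Return the robots directives found in the given HTML text (hypothetical helper)."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page source, not fetched from the real site.
sample = "<html><head><meta name='robots' content='noindex,nofollow' /></head><body></body></html>"
print(meta_robots_directives(sample))  # ['noindex', 'nofollow']
```

To run this against a live page, the HTML could be fetched first with `urllib.request.urlopen(url).read().decode()` and passed to the same function. An empty list would suggest the page itself carries no robots meta tag.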
-
The .htaccess file is in place, directing www to non-www, so I don't see what else I could do with that. I forgot to mention that the website was recently overhauled by someone else, and they are having me help with SEO. Not sure if that has anything to do with it. It looks like the .htaccess should be reversed so the non-www version points to the www version, which has more value.
-
The issue might be the forwarding from www.yourdomain.ca to yourdomain.ca.
Look at http://www.opensiteexplorer.org/pages?site=marketalert.ca%2F
and here http://www.opensiteexplorer.org/pages?site=www.marketalert.ca%2F
Some pages are indexed with www and others without www. This is your main issue.
Recommendation:
- Revisit the .htaccess file, or wherever the redirect has been set (DNS, for example).
- Choose one version, with www or without, and stick to it.
- Revisit your external links and update them to the version you chose.
- Create a new sitemap and resubmit it to the search engines.
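
As a rough illustration of the "choose one and stick to it" step, the sketch below flags links whose host differs from the chosen canonical host. The helper name `off_canonical` and the `/about` path are made up for the example, and choosing the non-www host here is only an assumption.

```python
from urllib.parse import urlsplit


def off_canonical(urls, canonical_host):
    """Return the URLs whose host differs from the chosen canonical host."""
    return [url for url in urls if urlsplit(url).hostname != canonical_host]


# Example links; the /about path is hypothetical.
links = [
    "http://marketalert.ca/",
    "http://www.marketalert.ca/about",
]
print(off_canonical(links, "marketalert.ca"))  # ['http://www.marketalert.ca/about']
```

Any link the function returns would be a candidate for updating, or for a 301 redirect to the chosen version.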
-
I ran the SEO web crawler and it finished already. Successfully crawled all pages. I still have to wait for another week to get the main campaign updated and see results there, but I believe it may work too now.
I guess I solved my own problem after being directed to robots.txt by Jim. I found that the WordPress plugin for creating SEO XML sitemaps was the problem: it created a virtual robots.txt file, which sent me on a wild goose chase looking for a robots.txt file that didn't exist. Creating a real robots.txt file that allows everything seems to be the solution, in case anyone else has this same problem.
-
If you can, follow up either way - happy to help you get it debugged!
-
I was able to update my sitemap.xml with Google Webmaster Tools, no problem. I'm not 100% confident, though, that this means the entire site is searchable by the spiders. I guess I'll know for sure in a few days, tops.
-
I agree with Jim. Update your sitemap.xml files with Google Webmaster Tools. That will also help you identify problems you might be missing.
-
I've done some more looking into it, and it seems to be a problem when WordPress uses the XML sitemap generator plugin. It creates a virtual robots.txt file, which is why I couldn't find the robots.txt file. Apparently the only fix is to replace it with an actual robots.txt file that allows everything.
I just replaced the robots.txt file with a real one allowing all. SEOmoz estimates a few days to test site crawl and it's another 7 days before the next scheduled crawl. I'd kinda like to find out sooner if it's not going to work. There must be a faster test. I don't need a detailed test, just a basic test that says, YEP, we can see this many pages or something like that.
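A basic local test along these lines is possible with Python's standard `urllib.robotparser`, which evaluates robots.txt rules without waiting for any crawler. This is a minimal sketch: the helper `is_allowed` and the two sample rule sets are assumptions for illustration, not the actual contents of the site's file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Check whether the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# An empty Disallow value allows everything; "Disallow: /" blocks everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

print(is_allowed(ALLOW_ALL, "http://marketalert.ca/"))  # True
print(is_allowed(BLOCK_ALL, "http://marketalert.ca/"))  # False
```

To test the live file instead of a string, `RobotFileParser` also offers `set_url(...)` followed by `read()`, which downloads the robots.txt before `can_fetch` is called.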
-
Hi,
Your robots.txt file is located here: http://marketalert.ca/robots.txt, which is the root of your website directory.
This is the actual location of your sitemap file: http://marketalert.ca/sitemap.xml. Does Google Webmaster Tools show any issue saying the sitemap file could not be found?
You might need to resubmit the sitemap file if there are any changes, with the updated version of your site, of course.
Hope this helps.
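
For a rough "this many pages are listed" check on the sitemap file, the URLs in it can be counted locally. The sketch below assumes a standard sitemaps.org `urlset` document; the helper `sitemap_urls` and the sample entries are hypothetical, not taken from the real sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_data):
    """Return every <loc> URL listed in a sitemap given as str or bytes."""
    root = ET.fromstring(xml_data)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


# Hypothetical sitemap content for illustration.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://marketalert.ca/</loc></url>
  <url><loc>http://marketalert.ca/about</loc></url>
</urlset>"""

urls = sitemap_urls(sample)
print(len(urls))  # 2
```

The count only shows what the sitemap lists, not what a search engine has indexed, so it works best as a sanity check before resubmitting.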