Disallow: /jobs/? Is this stopping search engines from indexing job posts?
-
Hi,
I was wondering what this would be used for, as it's in the robots.txt of a recruitment agency website that posts jobs. Should it be removed?
Disallow: /jobs/?
Disallow: /jobs/page/*/
Thanks in advance.
James
-
Hi James,
So far as I can see you have the following architecture:
- job posting: https://www.pkeducation.co.uk/job/post-name/
- jobs listing page: https://www.pkeducation.co.uk/jobs/
Since the robots.txt blocks the listing page's pagination, only the first 15 job postings are available to the crawler via a normal crawl.
I would say you should remove the blocking from the robots.txt and focus on implementing correct pagination. Which method you choose is your decision, but allow the crawler to access all of your job posts. Check https://yoast.com/pagination-seo-best-practices/
Another thing I would change is to make the job post title the anchor text for the job posting (currently every single job is linked with "Find out more").
Also, if possible, create a separate sitemap.xml for your job posts and submit it in Search Console. This way you can keep track of any anomaly with indexation.
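As a rough illustration of the separate job sitemap idea, here is a minimal Python sketch that builds a sitemap.xml body from a list of job post URLs. The function name `build_job_sitemap` and the two example slugs are made up for illustration; only the /job/post-name/ URL pattern comes from this thread, and a real site would pull the URL list from its CMS or database.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_job_sitemap(job_urls):
    """Return sitemap XML (as a string) with one <url> entry per job post."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in job_urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical job URLs following the /job/post-name/ pattern.
example_urls = [
    "https://www.pkeducation.co.uk/job/example-post-one/",
    "https://www.pkeducation.co.uk/job/example-post-two/",
]
sitemap_xml = build_job_sitemap(example_urls)
print(sitemap_xml)
```

Keeping job posts in their own file means the indexed/submitted counts reported for that sitemap in Search Console refer to job posts only.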
Last but not least, focus on the quality of your content (just as Matt proposed in the first answer).
Good luck!
-
Hi Istvan,
Sorry I've been away for a while. Thanks for all of your advice guys.
Here is the url if that helps?
https://www.pkeducation.co.uk/jobs/
Cheers,
James
-
The idea (which we both highlighted) is that blocking your listing page via robots.txt is wrong. For pagination there are several methods to choose from; which one you use really depends on the technical possibilities you have on the project.
Regarding James' original question, my feeling is that he is somehow blocking the job posting pages. Cutting off access to these pages makes it really hard for Google, or any other search engine, to index them. But without a URL in front of us, we cannot really answer his question; we can only offer theories that he can test.
-
Ah yes, when it's pointed out like that, it's a conflicting signal, isn't it? It makes sense in theory, but if you're setting a page to noindex and then passing that on via a canonical, it's probably not the best, is it?
There was a link in that thread to a discussion of people who still do that with success, but after reading it I would just use noindex only, as you said. (I still prefer noindex over the robots.txt block, though.)
-
Sorry Richard, but using noindex together with a canonical link is not good practice.
It's an old entry, but still true: https://www.seroundtable.com/noindex-canonical-google-18274.html
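If you want to check whether a page sends this mixed signal, something like the following Python sketch could help. It is only an illustration using the standard library's html.parser; the class and function names and the example.com URLs are made up, and it only looks at the robots meta tag and the canonical link element (not HTTP headers).

```python
from html.parser import HTMLParser


class IndexSignalParser(HTMLParser):
    """Collects the robots meta directive and the canonical link of a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            self.noindex = self.noindex or "noindex" in content
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def has_conflicting_signals(html, page_url):
    """True if the page is noindex AND its canonical points to another URL."""
    parser = IndexSignalParser()
    parser.feed(html)
    points_elsewhere = parser.canonical is not None and parser.canonical != page_url
    return parser.noindex and points_elsewhere


# Hypothetical paginated listing page that is noindexed and canonicalised
# to the main listing page.
sample_html = (
    '<html><head>'
    '<meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://example.com/jobs/">'
    '</head><body></body></html>'
)
print(has_conflicting_signals(sample_html, "https://example.com/jobs/page/2/"))
```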
-
I don't think it should be blocked by robots.txt at all. It's stopping Google from crawling the site fully, and they may even treat it negatively, as they've been really clamping down on blocking folders with robots.txt lately. I've seen sites with warnings in Search Console for: Disallow: /wp-admin
You may want to consider just using a noindex tag on those pages instead. And then also use a canonical tag that points back to the main job category page. That way Google can crawl the pages and perhaps pass all the juice back to the main job category page via the canonical. Then just make sure those junk job pages aren't in the sitemap either.
-
Hi James,
Regarding the robots.txt syntax:
Disallow: /jobs/? blocks every URL whose path begins with /jobs/? (robots.txt rules are matched from the start of the path).
For example: domain.com/jobs/?sort-by=... will be blocked.
If you want to disallow query parameters anywhere under /jobs/, the implementation would be Disallow: /jobs/*? or you can even specify which query parameter you want to block, for example Disallow: /jobs/*?page=
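To make the difference between these rules concrete, here is a small Python sketch of the matching logic. It is my own simplified matcher, not an official parser (Python's built-in urllib.robotparser does not handle the * wildcard), and it assumes the wildcard conventions Google documents: rules match from the start of the path, * matches any run of characters, and a trailing $ anchors the end. The test paths are made up.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path + query."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string (prefix matching).
    return re.match(regex, path) is not None


# Hypothetical paths checked against the rules discussed in this thread.
checks = [
    ("/jobs/?", "/jobs/?sort-by=date"),
    ("/jobs/?", "/jobs/"),
    ("/jobs/?", "/jobs/page/2/?sort-by=date"),
    ("/jobs/*?", "/jobs/page/2/?sort-by=date"),
    ("/jobs/page/*/", "/jobs/page/2/"),
    ("/jobs/page/*/", "/job/post-name/"),
]
for rule, path in checks:
    print(f"Disallow: {rule:15} {path:30} blocked={rule_matches(rule, path)}")
```

Under these assumptions, the existing rules block the sorted/filtered listing URLs and the paginated listing pages, while the individual /job/post-name/ pages stay crawlable.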
My question to you is whether these jobs are linked from any other page and/or sitemap, or only from the listing page, whose pagination, sorting, etc. are blocked by robots.txt. If they are not linked, it could be a simple case of orphan pages, where the crawler cannot access the job posting pages because there is no actual link to them. I know it is an old rule, but it is still true: Crawl > Index > Rank.
BTW, I don't know why you would block your pagination. There are better implementations.
And there is always the scenario that was already described by Matt. But I believe in that case you would have at least some of the pages indexed, even if they are not going to rank well.
Also, make sure other technical implementations are not stopping your job posting pages from being indexed.
-
I'd guess that the jobs get pulled from a job board. If this is the case, then the content (job description, title, etc.) will just be a duplicate of content that can be found in many other locations. If a plugin is used, it will sometimes automatically add a disallow to the robots.txt file so as not to hurt the parent version of the job page by creating thousands of duplicate content issues.
I'd recommend creating some really high-quality hub pages based on job type, or location and pulling the relevant jobs into that page, instead of trying to index and rank the actual job pages.