Regular Expressions for Filtering Bot Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I used the regular expression posted in an article, and it eliminates what appears to be most of them.
However, there are other bots I would like to filter, but I'm having a hard time working out the regular expressions for them.
How do I determine the regular expression for additional bots so I can add them to the filter?
I read an Analytics "how to," but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the Rackspace bots. If you want to learn more about regular expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a Dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
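Not unless there's a . after the word "servers" in the name. The . is there to match the . at the end of "stumbleupon inc." (strictly speaking, an unescaped . matches any single character, so \. is the form that matches only a literal period).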
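To see the difference between a bare . and an escaped \., here is a small sketch using Python's re module as a stand-in for the Analytics filter engine (an assumption: the two agree on these basic rules, but they are different engines). The strings tested are made up for illustration.

```python
import re

# Fragment as posted: the final "." is unescaped, so it matches ANY character.
loose = re.compile(r"^stumbleupon inc.$")
# Same fragment with the dot escaped: it only matches a literal period.
strict = re.compile(r"^stumbleupon inc\.$")

for name in ["stumbleupon inc.", "stumbleupon incX", "stumbleupon inc"]:
    print(f"{name!r:22} loose={bool(loose.search(name))} strict={bool(strict.search(name))}")
```

Both versions catch the real name "stumbleupon inc."; the loose one also catches "stumbleupon incX", which is usually harmless for a bot filter but worth knowing about.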
-
Does it need the . before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
I just added rackspace as another match; it should work if the name is exactly right.
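One way to check "if the name is exactly right" before saving the filter is to test the pattern offline against names copied from the Service Provider report. This is a sketch in Python, assuming the filter behaves like re.search (a match anywhere in the field unless ^ and $ anchor it); the sample names other than "rackspace cloud servers" are hypothetical.

```python
import re

# The filter pattern from this post, copied verbatim.
BOT_ISP_PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc.|google inc."
    r"|stumbleupon inc.|rackspace cloud servers)$|gomez"
)

# Hypothetical service-provider names, as they might appear in the report.
samples = [
    "rackspace cloud servers",       # exact name: matches
    "rackspace cloud servers inc.",  # extra words: the $ anchor rejects it
    "rackspace",                     # partial name: no match
    "comcast cable",                 # ordinary ISP: no match
]

for isp in samples:
    print(f"{isp:30} excluded={bool(BOT_ISP_PATTERN.search(isp))}")
```

Because of the ^ and $ anchors, a name that is off by even one word or character slips through, which is why copying the name exactly from the report matters.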
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network, and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, viewed 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
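Time on page can be a tricky one, because real visits can also record 00:00:00 due to the way it is measured: time on page is the gap between one pageview and the next, so a real visitor who views a single page and leaves gets no time at all. I'd recommend using other factors like the ones I mentioned above.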
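Here is a simplified model of why genuine one-page visits show 00:00:00. It assumes time on page is the gap between consecutive pageviews in a session and ignores event hits and other details Analytics may take into account; the timestamps are hypothetical.

```python
from datetime import datetime

def time_on_page(hit_times):
    """Seconds credited to each pageview in one session (simplified model).

    Each page gets the gap until the next pageview; the last (or only)
    page has no later hit to measure against, so it is credited 0.
    """
    if not hit_times:
        return []
    gaps = [
        (later - earlier).total_seconds()
        for earlier, later in zip(hit_times, hit_times[1:])
    ]
    return gaps + [0.0]

# Hypothetical sessions.
reader = [datetime(2015, 3, 2, 9, 0, 0), datetime(2015, 3, 2, 9, 2, 30)]
one_page_visit = [datetime(2015, 3, 2, 9, 5, 0)]

print(time_on_page(reader))          # [150.0, 0.0]
print(time_on_page(one_page_visit))  # [0.0]
```

A person who reads a single article for five minutes and leaves looks exactly like the one-page visit here, which is why 0 seconds alone is not proof of a bot.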
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
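As a rough illustration of combining signals, the sketch below flags a report row only when zero engagement and a suspicious environment line up. Everything here is hypothetical: the ReportRow fields stand in for columns of an exported report (not a real Analytics API), and the "suspect" operating systems and cities are placeholders you would choose from your own data.

```python
from dataclasses import dataclass

@dataclass
class ReportRow:
    # Hypothetical fields for one row of an exported report (not a real API).
    service_provider: str
    operating_system: str
    city: str
    pages_per_session: float
    avg_time_on_page: float  # seconds
    bounce_rate: float       # percent, 0 to 100

def looks_like_bot(row, suspect_os, suspect_cities):
    """True only when zero engagement AND a suspicious environment line up."""
    zero_engagement = (
        row.pages_per_session == 1
        and row.avg_time_on_page == 0
        and row.bounce_rate == 100
    )
    suspicious_env = row.operating_system in suspect_os or row.city in suspect_cities
    return zero_engagement and suspicious_env

# Hypothetical values chosen after eyeballing your own reports.
SUSPECT_OS = {"Linux"}
SUSPECT_CITIES = {"Example Datacenter City"}

bot_like = ReportRow("example hosting co", "Linux", "Example City", 1, 0, 100)
real_reader = ReportRow("example cable isp", "Windows", "Example City", 1, 0, 100)

print(looks_like_bot(bot_like, SUSPECT_OS, SUSPECT_CITIES))     # True
print(looks_like_bot(real_reader, SUSPECT_OS, SUSPECT_CITIES))  # False
```

Both rows have identical engagement numbers; only the environment signal separates them, which is the point of not relying on "Avg. Time on Page = 00:00:00" by itself.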
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them; an unescaped . matches any single character, so without the \ your RegEx may match more than you intend.
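If the list of ISP names keeps growing, it can help to generate the pattern instead of editing it by hand. This sketch uses two hypothetical helpers, escape_literal and build_bot_filter, and assumes the Analytics filter accepts backslash escapes the same way Python's re does. It escapes a fixed set of regex special characters (plus /, as suggested above) rather than using re.escape, because re.escape also escapes spaces, which is not needed here.

```python
import re

# Regex special characters that need a "\" in front; "/" is included as well.
SPECIAL = set(r"\.^$|?*+()[]{}/")

def escape_literal(name: str) -> str:
    """Put a backslash before every special character in an ISP name."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)

def build_bot_filter(isp_names, also_anywhere=("gomez",)):
    """Exact-match alternation of ISP names, plus terms matched anywhere."""
    exact = "|".join(escape_literal(n) for n in isp_names)
    anywhere = "|".join(escape_literal(t) for t in also_anywhere)
    return f"^({exact})$" + (f"|{anywhere}" if anywhere else "")

isps = ["microsoft corp", "inktomi corporation", "yahoo! inc.", "google inc.",
        "stumbleupon inc.", "rackspace cloud servers"]
pattern = build_bot_filter(isps)
print(pattern)
```

The printed pattern is the same list as in the thread, but with every . escaped, and it can be pasted into the filter field once you have checked it against a few names with re.search.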
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
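It helps to read this pattern as two alternatives, because | has the lowest precedence: either the whole field is exactly one of the names inside ^(...)$, or the field contains "gomez" anywhere, since that part sits outside the anchors. A quick check in Python, assuming the filter matches like re.search; the test values are made up.

```python
import re

# The pattern exactly as posted in the article.
ARTICLE_PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez"
)

checks = {
    "google inc.": True,          # whole value equals one listed name
    "google inc. europe": False,  # ^...$ means the whole value must match
    "gomez networks": True,       # "gomez" is outside the anchors: matches anywhere
    "not gomez at all": True,
    "microsoft": False,           # partial name is not enough
}
for value, expected in checks.items():
    assert bool(ARTICLE_PATTERN.search(value)) == expected, value
print("all checks passed")
```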
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris