Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article, which eliminate what appear to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to," but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
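As a concrete starting point (this example is illustrative and not from the linked site; the ISP name is made up): for a bot identified by its Service Provider name, the Regular Expression is usually just that name with its special characters escaped, which you can check in any regex-capable tool, for example Python:

```python
import re

# A made-up Service Provider value, as if copied verbatim from a report.
name = "examplebot hosting inc."

# re.escape() puts a backslash before regex metacharacters such as "."
pattern = re.escape(name)

# The escaped pattern matches the exact name and nothing else.
print(bool(re.fullmatch(pattern, "examplebot hosting inc.")))
print(bool(re.fullmatch(pattern, "examplebot hosting incx")))
```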
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies' guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word "servers" in the name. The \ is escaping the . at the end of "stumbleupon inc." so that it matches a literal period.
-
Does it need the . before the )
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
I just added rackspace as another match; it should work if the name is exactly right.
Hope this helps,
Chris
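Since Analytics filter fields take standard regular expressions, the pattern can be sanity-checked outside Analytics as well. A quick sketch in Python with a few sample Service Provider values (the real values come from your ISP report; the dots here are escaped per the escaping advice elsewhere in the thread):

```python
import re

# The filter pattern from the post, with the dots escaped.
pattern = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\."
    r"|stumbleupon inc\.|rackspace cloud servers)$|gomez"
)

# Sample Service Provider values; only the last is a normal visitor ISP.
samples = [
    "rackspace cloud servers",  # newly added, so now filtered
    "google inc.",              # filtered
    "gomez networks",           # the "gomez" alternative is unanchored
    "comcast cable",            # should NOT be filtered
]
for name in samples:
    print(f"{name!r} filtered: {bool(pattern.search(name))}")
```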
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers has visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
OK, can you provide some information on the bots that are getting through and that you want to filter out? If they can be filtered by ISP organization like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them, or else your RegEx will fail.
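That escaping advice is easy to verify outside of Analytics. A quick sketch in Python (the ISP names are the ones from the thread's pattern; `re.escape` is Python's escaping helper, not something Analytics provides):

```python
import re

# Service Provider names exactly as they appear in the report.
isps = [
    "microsoft corp",
    "inktomi corporation",
    "yahoo! inc.",
    "google inc.",
    "stumbleupon inc.",
    "rackspace cloud servers",
]

# re.escape() puts a \ before metacharacters like "." for us.
escaped = "^(" + "|".join(re.escape(n) for n in isps) + ")$|gomez"

# With the dot escaped, only a literal period matches...
print(bool(re.search(escaped, "stumbleupon inc.")))
print(bool(re.search(escaped, "stumbleupon incx")))

# ...whereas an unescaped dot is a wildcard for any character:
print(bool(re.search(r"^stumbleupon inc.$", "stumbleupon incx")))
```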
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris
Related Questions
-
50% Organic Traffic Drop in the Last 48 Hours
Intermediate & Advanced SEO | chuck-layton
Hello, my site had a 50% traffic decrease in the last 48 hours (9/26/18), and I'm looking for ideas/reasons that could cause such a dramatic drop. Year over year, organic traffic has been up 40%, and September was up 30%. The site has a domain authority of 39 according to Moz, and keyword positions have been flat for a few months. I made a change to the code and robots.txt file on Monday, pre-drop. The category pagination pages had a "NoIndex" with a rel=canonical, and I removed the "NoIndex" per: https://www.seroundtable.com/google-noindex-rel-canonical-confusion-26079.html. I also removed "Disallow" in the robots.txt for entries like "/?dir" because the pages have the rel=canonical. Could this be the reason for the drop? Other possible reasons:
1. Google update: I don't think this is it, but it looks like the last one was August 1st: "Medic" Core Update, August 1, 2018
2. The site was hacked
3. All keyword positions dropped overnight: I don't think this is it, because Bing has also dropped by the same percentage.
Any help, thoughts, or suggestions would be awesome.
-
Changing URLs: from a short, well-optimised URL to a longer one – what's the traffic risk?
Intermediate & Advanced SEO | moge
I'm working with a client who has a website that is relatively well optimised, though it has a pretty flat structure and a lot of top-level pages. They've invested in their content over the years and managed to rank well for key search terms. They're currently changing CMS, and as a result of new folder structuring in the CMS, the URLs for some pages look to have significantly changed. E.g. the existing URL is website.com/grampians-luxury-accommodation, which ranked quite well for "luxury accommodation grampians". The new URL when the site is launched on the new CMS would be website.com/destinations/victoria/grampians. My feeling is that the client is going to lose a bit of traffic as a result of this. I'm looking for information, methods, or case studies to demonstrate the degree of risk, and to help make a recommendation to mitigate it.
-
HELP!!! Steep Drop in Organic Traffic Starting 11/1/16
Intermediate & Advanced SEO | Kingalan1
Starting November 1st, organic web traffic from Google dropped from an average of about 60 visits a day to about 5 per day, so we are more than 90% off. At the end of September, we modified the header of the site to simplify it. We also added a snippet of code to each page to enable Zoho "Sales IQ" to work. Sales IQ enables us to track visitors and engage in chat sessions with them. Apart from that, no changes have been made to the site. Any ideas as to what could have caused this drop in traffic? Was there a Google update at that time that could have caused the drop? Or could the recent site changes have caused this? I have attached a Google Webmaster Tools report showing the drop in traffic. I would very much appreciate some insight into this, as all organic traffic to our site has ceased. Thanks,
Alan
-
Website architecture - levels vs filters and authority loss - Enterprise SEO
Intermediate & Advanced SEO | Carla_Dawson
Hi everyone, I am participating in the development of a marketplace website where the main channel will be traffic via SEO. We have encountered the directories (levels) vs filters situation.
1. Does everyone still agree that if we have too many levels, authority is lost as you go down through the levels? Does everyone agree that there should be a maximum of 3 levels and never 4? Example 1: www.domain.com/level1/level2/level3 vs www.domain.com/level1. In theory, the content on "level3" will have a lower DA than the content on "level1".
2. Does everyone agree that for enterprise SEO (huge marketplace websites) filters are a better idea than levels? Example 2: www.domain.com/level1/level2/level3 vs www.domain.com/filter-option1. In theory, the content on "level3" will have a lower DA than the content on "filter-option1".
Thanks so much in advance
-
How stupid is it to launch a new URL structure when our traffic is climbing?
Intermediate & Advanced SEO | CFSSEO
We decided to redesign our site to make it responsive, as Google is ranking sites based on mobile-friendliness. Along with this, we have changed our URL structure, meta tags, page content, site navigation, and internal linking. How stupid is it to launch this site right in the middle of record traffic? Our traffic is climbing by 10,000 more visitors every day with the current site. Visitors have increased 34% over the last 30 days compared to the previous 30 days.
-
Should I noindex the site search page? It is generating 4% of my organic traffic.
Intermediate & Advanced SEO | lcourse
I read about some recommendations to noindex the URL of the site search. I checked in Analytics that the site search URL generated about 4% of my total organic search traffic (<2% of sales). My reasoning is that site search may generate duplicate-content issues and may prevent the more relevant product or category pages from showing up instead. Would you noindex this page or not? Any thoughts?
-
Traffic down 60% - about to cry, please help
Intermediate & Advanced SEO | julianhearn
Hiya guys and girls, I've just spent 6 months, a lot of blood, sweat and tears, and money developing www.happier.co.uk. In the last few weeks the site started to make a trickle of money, still loss-making but showing green shoots. But then on Friday the traffic dropped due to my rankings on google.co.uk dropping. Visits: Thu 25th April = 1950; Fri 26th April = 1284; Sat 27th April = 906. So it looks like I've been hit with some sort of penalty. I did get a warning on the 20th of April about an increase in the number of 404 errors, currently showing 77. I've now removed the links to those 404 pages and left the 404 pages as is, as was suggested here: http://www.seomoz.org/blog/how-to-fix-crawl-errors-in-google-webmaster-tools. Could that be the reason? We have spent a lot of time on site design and content. We think the site is good, but I agree it has a long way to go, and without income that is hard, so we have been struggling through. Any ideas on the reason(s) for the penalty? Big thanks, Julian.
-
How to prevent Google from crawling our product filter?
Intermediate & Advanced SEO | footsteps
Hi all, we have a crawler problem on one of our sites, www.sneakerskoopjeonline.nl. On this site, visitors can specify criteria to filter available products. These filters are passed as HTTP GET arguments, and the number of possible filter URLs is virtually limitless. In order to prevent duplicate content, or an insane number of pages in the search indices, our software automatically adds noindex, nofollow and noarchive directives to these filter result pages. However, we're unable to get crawlers (Google in particular) to ignore these URLs. We've already changed the on-page filter HTML to JavaScript, hoping this would cause the crawler to ignore it. However, it seems that Googlebot executes the JavaScript and crawls the generated URLs anyway. What can we do to prevent Google from crawling all the filter options? Thanks in advance for the help. Kind regards, Gerwin
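A standard approach for the situation described above is to disallow the filter parameters in robots.txt so that Googlebot never requests those URLs at all (a sketch; the parameter name "filter" is an illustrative placeholder, since the question doesn't name its actual parameters). One caveat worth knowing: a URL blocked by robots.txt is never fetched, so any noindex/nofollow directives on that page will never be seen by Google; you generally have to pick one mechanism or the other.

```
# robots.txt (illustrative; "filter" is a placeholder parameter name)
User-agent: *
Disallow: /*?filter=
Disallow: /*&filter=
```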