Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but its over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The . is escaping the . at the end of stumbleupon inc.
-
Does it need the . before the )
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, , visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
-
Sure. Here's the post for filtering the bots.
Here's the reg x posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Website architecture - levels vs filters and authority loss - Enterprise SEO
Hi Everyone, I am participating in the development of a marketplace website where the main channel will be traffic via SEO. We have encountered the directories (levels) vs filters situation. 1. Does everyone still agree that if we have too many levels, authority is loss as you do down through the levels? Does everyone agree that there should be a max of 3 levels and never 4. Example 1 www.domain.com/level1/level2/level3 vs www.domain.com/level1 In theory, the content on "level 3" will have a lower DA than the content on "level1". 2. Does everyone agree that for enterprise SEO (huge marketplace websites) filters are a better idea than levels? Example 2 www.domain.com/level1/level2/level3 vs www.domain.com/filter-option1 In theory, the content on "level 3" will have a lower DA than the content on "filter-option1". Thanks so much in advance
Intermediate & Advanced SEO | | Carla_Dawson0 -
95% of organic traffic lands in my homepage, despite having a 250 page website with a "seo optimized" hierarchical structure. Any suggestion as to what might be happening?
Challenging issue All the "usual suspects" have been discarded: all pages included in google index, no google penalties, metas optimized, kw's segregated by pages/cluster of pages to avoid cannibalization... BUT, we know we are missing something website is www.e-florex.com and is an e-commerce site based on magento Any ideas you might think are worth exploring? Thanks in advance for your help Juan
Intermediate & Advanced SEO | | juanmarn0 -
Whats the best way to implement rel = “next/prev” if we have filters?
Hi everyone, The filtered view results in paginated content and has different urls: example: https://modli.co/dresses.html?category=45&price=13%2C71&size=25 Look at what it says in search engine land: http://searchengineland.com/implementing-pagination-attributes-correctly-for-google-114970 Look at Advanced Techniques paragraph. do you agree? it seem like google will index the page multiple times for every filter variant. Thanks, Yehoshua
Intermediate & Advanced SEO | | Yehoshua0 -
What exactly is an impression in Google Webmaster Tools search queries with the image filter turned on?
Is it when someone does an image search? Or does it count a regular search that has images in it? On an image search does the picture actually have to be viewed on the screen or can it be below in the infinite scroll?
Intermediate & Advanced SEO | | EcommerceSite0 -
Best way to implement canonical tags on an ecommerce site with many filter options?
What would be the best way to add canonical tags to an ecommerce site with many filter options, for example, http://teacherexpress.scholastic.com? Should I include a canonical tag for all filter options under a category even though the pages don't have the same content? Thanks for reading!
Intermediate & Advanced SEO | | DA20130 -
Having Content be the First thing the bots see
If you have all of your homepage content in a tab set at the bottom of the page, but really would want that to be the first thing Google reads when it crawls your site, is there something you can implement where Google reads your content first before it reads the rest of your site? Does this cause any violations or are there any red flags that get raised from doing this? The goal here would just be to get Google to read the content first, not hide any content
Intermediate & Advanced SEO | | imageworks-2612900 -
Traffic has dropped off a cliff
I just spent thousands of $$ redesigning my site. Have a Wordpress site now with Yoast SEO. However, my organic traffic has dropped of a cliff. Can any of you help me figure out why? www.UltimateBasicTraining.com
Intermediate & Advanced SEO | | StreetwiseReports1 -
If we add noindex to a subdomain, will the traffic to that subdomain still generate domain authority for the primary domain?
We are trying to decide whether a password protected site, that we will noindex, should be set up as a subdomain or if it should be its own domain. The determining factor here is whether or not having that noindexed subdomain will increase domain authority since its noindexed. Any ideas???
Intermediate & Advanced SEO | | grayloon0