Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it appears to eliminate most of them.
However, there are other bots I would like to filter, and I'm having a hard time determining the regular expressions for them.
How do I work out the regular expression for additional bots so I can add them to the filter?
I read an Analytics "how to," but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
I just added Rackspace as another match; it should work if the name is exactly right.
Hope this helps,
Chris
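
To check a pattern like this before pasting it into the filter, a small test outside Analytics can help. The sketch below uses Python's re module, which is not the same regex engine Analytics uses, so treat it as a rough check only. The function name is_filtered and the sample names are made up for illustration, and the match is assumed to be case-insensitive.

```python
import re

# The filter pattern, with each "." escaped so it matches a literal period.
# Case-insensitive matching is an assumption made for this sketch.
ISP_PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,
)

def is_filtered(service_provider: str) -> bool:
    """Return True if the service provider name matches the filter pattern."""
    return ISP_PATTERN.search(service_provider) is not None

# Hypothetical sample names: the first two are exact matches, the third has
# extra text after the name, and the fourth only contains "gomez".
for name in [
    "rackspace cloud servers",
    "google inc.",
    "rackspace cloud servers inc",
    "some gomez network",
]:
    print(f"{name!r}: {is_filtered(name)}")
```

The part between ^( and )$ only matches when the whole name is exactly one of the listed options, which is why the name has to be exactly right. The gomez part sits outside that group, so it matches anywhere in the name.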
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example, since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the regular expression for Rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
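
As a rough illustration of combining signals instead of relying on time on page alone, here is a minimal Python sketch. The ProviderStats fields, the looks_like_bot name, and the thresholds are all assumptions made for the example, not anything Analytics provides. The first row uses the Rackspace figures quoted in this thread; the second row is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ProviderStats:
    name: str
    visits: int
    pages_per_visit: float
    avg_time_on_page_sec: float
    bounce_rate: float  # fraction between 0.0 and 1.0

def looks_like_bot(s: ProviderStats, min_visits: int = 100) -> bool:
    """Flag a provider only when every signal agrees and volume is high.

    Thresholds are illustrative assumptions; tune them to your own data.
    """
    signals = [
        s.pages_per_visit <= 1.0,
        s.avg_time_on_page_sec == 0,
        s.bounce_rate >= 0.99,
    ]
    return s.visits >= min_visits and all(signals)

rows = [
    # Figures quoted in this thread for the last four days.
    ProviderStats("rackspace cloud servers", 848, 1.0, 0.0, 1.0),
    # Hypothetical provider: zero time on page, but only a handful of visits.
    ProviderStats("example isp", 3, 1.0, 0.0, 1.0),
]

for row in rows:
    print(row.name, looks_like_bot(row))
```

The second row shows the point about time on page: a few real visitors can record 00:00:00 too, so that signal only counts here alongside volume and the other metrics.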
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them, or else your RegEx may not match what you intend (an unescaped . matches any character).
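
If escaping by hand feels error-prone, the list can be built from plain names with a short script. This is only a sketch in Python: escape_name and build_isp_pattern are hypothetical helpers, and the set of characters treated as special is an assumption ("/" is included only because it is mentioned above).

```python
import re

# Characters that get a backslash in this sketch. "/" is included only to
# follow the advice above; most regex engines treat it as an ordinary character.
_SPECIAL = re.compile(r"([.\\/+*?\[\]^$(){}|])")

def escape_name(name: str) -> str:
    """Put a backslash before each special character in an ISP name."""
    return _SPECIAL.sub(r"\\\1", name)

def build_isp_pattern(exact_names, partial_names=()):
    """Build ^(name1|name2|...)$|partial1|... from plain-text names."""
    exact = "|".join(escape_name(n) for n in exact_names)
    pattern = f"^({exact})$"
    for partial in partial_names:
        pattern += "|" + escape_name(partial)
    return pattern

pattern = build_isp_pattern(
    ["microsoft corp", "google inc.", "stumbleupon inc.", "rackspace cloud servers"],
    partial_names=["gomez"],
)
print(pattern)
# prints: ^(microsoft corp|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
```

Names go in exactly as they appear in the service provider column; the script adds the backslashes and the | separators.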
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris