Regular Expressions for Filtering Bot Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article, and they eliminate what appears to be most of the bots.
However, there are other bots I would like to filter, but I'm having a hard time working out the regular expressions for them.
How do I determine the regular expression for additional bots so I can add them to the filter?
I read an Analytics "how to," but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem. Feel free to reach out if you have any other RegEx-related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a guide for dummies somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
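Not unless the name itself has a period after the word servers. The . after stumbleupon inc isn't escaping anything: an unescaped . matches any single character, so it is there to match the period at the end of "stumbleupon inc."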
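If you want to see the difference for yourself, here is a minimal sketch in Python. It assumes Python's re module as a stand-in for the Analytics regex engine; for simple patterns like these, the unescaped ".", the escaped "\.", and the anchors "^" and "$" behave the same way. The sample names are made up for illustration.

```python
import re

# An unescaped "." matches any single character, including a literal period.
LOOSE = re.compile(r"^stumbleupon inc.$")
# An escaped "\." matches only a literal period.
STRICT = re.compile(r"^stumbleupon inc\.$")

for name in ["stumbleupon inc.", "stumbleupon incx", "stumbleupon inc"]:
    print(f"{name!r}: loose={bool(LOOSE.search(name))}, strict={bool(STRICT.search(name))}")

# A "." after "servers" demands one more character before the end of the
# name, so the exact name without a trailing period no longer matches.
TRAILING_DOT = re.compile(r"^rackspace cloud servers.$")
print(bool(TRAILING_DOT.search("rackspace cloud servers")))  # False
```

The loose pattern accepts both "stumbleupon inc." and "stumbleupon incx", while the strict one accepts only the first; neither matches "stumbleupon inc" with nothing after it.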
-
Does it need a . before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
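One way to confirm that "exactly right" matters is to run the pattern against a few sample names before pasting it into the filter. This is a minimal sketch assuming Python's re module behaves like the Analytics filter for this simple pattern; the sample "Service Provider" values are invented, not real report data.

```python
import re

# The pattern from this post, split across two raw strings for readability;
# Python joins adjacent string literals into one pattern.
BOT_ISP_PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|"
    r"stumbleupon inc.|rackspace cloud servers)$|gomez"
)

# Made-up "Service Provider" values, not real report data.
samples = [
    "rackspace cloud servers",      # exact name
    "rackspace cloud servers inc",  # extra text after the name
    "rackspace",                    # only part of the name
    "comcast cable",                # unrelated provider
]

for isp in samples:
    verdict = "filtered" if BOT_ISP_PATTERN.search(isp) else "kept"
    print(f"{isp!r}: {verdict}")
```

Only the first sample is filtered: because the names sit between ^ and $, anything extra or missing in the provider name makes the match fail.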
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network, and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, viewed 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for Rackspace?
-
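Time on page can be a tricky one, because genuine visits can also record 00:00:00 due to the way it is measured: Analytics takes time on page as the gap between one pageview and the next, so the last page of a visit, and therefore a single-page visit, usually shows no time at all. I'd recommend using other factors like the ones I mentioned above.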
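To make "use a combination of factors" concrete, here is a minimal sketch of the reasoning, applied to rows you might export from your reports and check by hand. It is not something you paste into Analytics. The ReportRow fields, the looks_like_bot function, and the suspect values ("linux", "united states") are all hypothetical, chosen only to illustrate requiring several signals to agree instead of trusting zero time on page alone.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class ReportRow:
    # Hypothetical fields, loosely modelled on report columns.
    service_provider: str
    operating_system: str
    country: str
    pages_per_session: float
    bounce_rate: float  # 0.0 to 1.0


def looks_like_bot(
    row: ReportRow,
    suspect_isps: Set[str],
    suspect_os: Set[str],
    suspect_countries: Set[str],
) -> bool:
    """Flag a row only when several independent signals agree."""
    # "Who" signals: where the traffic comes from.
    identity_hits = sum([
        row.service_provider in suspect_isps,
        row.operating_system in suspect_os,
        row.country in suspect_countries,
    ])
    # "What" signals: the visit did nothing. Time on page is left out on
    # purpose, because a real single-page visit can also show 00:00:00.
    no_engagement = row.pages_per_session <= 1 and row.bounce_rate >= 1.0
    return identity_hits >= 2 and no_engagement


isps, oses, countries = {"rackspace cloud servers"}, {"linux"}, {"united states"}
print(looks_like_bot(ReportRow("rackspace cloud servers", "linux", "united states", 1, 1.0), isps, oses, countries))  # True
print(looks_like_bot(ReportRow("comcast cable", "windows", "united states", 1, 1.0), isps, oses, countries))  # False
```

The second row bounced after one page too, but only one "who" signal matches, so it is kept; that is the kind of real visitor a time-on-page rule alone would wrongly remove.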
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
OK, can you provide some information on the bots getting through this filter that you want to sort out? If they can be identified by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them. An unescaped . matches any single character, and other special characters such as ( or | change how the expression is read, so a name containing them can match the wrong thing or break the RegEx.
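If the list keeps growing, escaping by hand gets error-prone. Here is a minimal Python sketch that builds the same shape of filter from a plain list of names. The escape_name and build_filter helpers are hypothetical, and the set of characters treated as special is an assumed, conservative choice rather than an official Analytics list; the term "gomez" is kept outside the anchors as in the original RegEx.

```python
import re
from typing import List

# Characters treated as special here (an assumed, conservative set).
SPECIAL_CHARS = set(".^$*+?()[]{}|\\/")


def escape_name(name: str) -> str:
    """Put a backslash in front of every special character in a name."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)


def build_filter(exact_names: List[str], anywhere: List[str]) -> str:
    """Exact ISP names inside ^( ... )$, plus terms that may appear anywhere."""
    pattern = "^(" + "|".join(escape_name(n) for n in exact_names) + ")$"
    if anywhere:
        pattern += "|" + "|".join(escape_name(t) for t in anywhere)
    return pattern


pattern = build_filter(
    ["microsoft corp", "inktomi corporation", "yahoo! inc.", "google inc.",
     "stumbleupon inc.", "rackspace cloud servers"],
    ["gomez"],
)
print(pattern)
```

This prints ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez, where each period is now escaped so it matches only a literal period.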
-
Sure. Here's the post for filtering the bots.
Here's the RegEx it posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
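It helps to know how this pattern is read. A bare | has the lowest precedence, so it splits the whole expression into two alternatives: a provider name that matches one of the names in parentheses exactly (because of ^ and $), or any name that contains "gomez" anywhere. The sketch below checks this with Python's re module, assumed here to behave like the Analytics filter for this pattern; the explain helper and the sample names are made up for illustration.

```python
import re

POSTED = r"^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez"


def explain(isp: str) -> str:
    """Report which half of the pattern, if any, matched a provider name."""
    m = re.search(POSTED, isp)
    if m is None:
        return "kept"
    # Group 1 only takes part when the anchored exact-name branch matched.
    return "exact name" if m.group(1) is not None else "contains gomez"


for isp in ["google inc.", "google inc. fiber", "gomez networks", "via gomez"]:
    print(f"{isp!r}: {explain(isp)}")
```

So new names go inside the parentheses when you want an exact match, and after the final | only when any name containing that text should be filtered.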
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris