Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article, and they eliminate what appears to be most of it.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to," but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word "servers" in the name. The . at the end of "stumbleupon inc." is there because that ISP's name itself ends with a period.
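One subtlety with that period: in a regular expression an unescaped . matches any single character, not just a literal period, so "stumbleupon inc." still matches but is a little looser than it looks. The Python sketch below illustrates the difference. Python's re module is only standing in for Analytics' own regex engine here, on the assumption that . and \ behave the same way for a simple pattern like this; "stumbleupon incX" is a made-up name used only for the comparison.

```python
import re

# Unescaped ".": matches ANY single character in that position.
loose = re.compile(r"^stumbleupon inc.$")
# Escaped "\.": matches only a literal period.
strict = re.compile(r"^stumbleupon inc\.$")

for name in ["stumbleupon inc.", "stumbleupon incX"]:
    print(f"{name!r}: loose={bool(loose.match(name))}, strict={bool(strict.match(name))}")
```

Both versions catch the real name; only the escaped one rejects the made-up look-alike.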
-
Does it need the . before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
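A quick way to check a pattern like this before pasting it into the filter is to run it against a few sample ISP names. This Python sketch uses the pattern exactly as posted above. The sample names other than "rackspace cloud servers" are hypothetical, and Python's re (which is case-sensitive by default, hence the lowercase names) is assumed to behave like Analytics' matcher for this simple pattern.

```python
import re

# The pattern from the post above, with rackspace added.
# The ^...$ anchors apply only to the parenthesized group;
# "gomez" is unanchored, so it matches anywhere in the name.
BOT_ISP = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc.|google inc."
    r"|stumbleupon inc.|rackspace cloud servers)$|gomez"
)

def is_bot_isp(isp_name: str) -> bool:
    """Return True if the ISP name would be caught by the pattern."""
    return BOT_ISP.search(isp_name) is not None

# Hypothetical sample names, lowercase like the Service Provider report.
samples = [
    "rackspace cloud servers",      # exact name: matches
    "rackspace cloud servers llc",  # extra text: rejected by the $ anchor
    "gomez networks",               # contains "gomez": matches
    "comcast cable",                # not in the list
]
for s in samples:
    print(f"{s!r}: {is_bot_isp(s)}")
```

The second sample shows why "it should work if the name is exactly right": the anchors mean any extra text in the name stops the match.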
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network, and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers has visited my site 848 times, viewed 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the RegEx for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
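To make the "combination" idea concrete, here is a rough Python sketch that flags a visit only when several signals line up at once, since a 0-second time on page by itself can also come from real visits. It illustrates the logic only; it is not how Analytics filters are configured. The Visit fields and the suspect operating system and country values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Visit:
    # Hypothetical record; field names are illustrative, not Analytics names.
    isp: str
    operating_system: str
    country: str
    pages_viewed: int
    seconds_on_page: int

def looks_like_bot(visit: Visit, suspect_os: str, suspect_country: str) -> bool:
    """Flag a visit only when every signal lines up.

    A 00:00:00 time on page alone is not enough, because real
    one-page visits can also record zero seconds.
    """
    return (
        visit.seconds_on_page == 0
        and visit.pages_viewed == 1
        and visit.operating_system == suspect_os
        and visit.country == suspect_country
    )

# Hypothetical sample visits.
human = Visit("comcast cable", "Windows", "United States", 1, 0)
suspect = Visit("rackspace cloud servers", "Linux", "United States", 1, 0)
print(looks_like_bot(human, "Linux", "United States"))    # False
print(looks_like_bot(suspect, "Linux", "United States"))  # True
```

Requiring all signals is the cautious choice: it may let a few bots through, but it is less likely to throw away real visitors.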
-
Ok, can you provide some information on the bots getting through that you want to sort out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them; otherwise your RegEx may match the wrong names or not work at all.
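If the list of names keeps growing, the escaping can be automated. This Python sketch builds the ^(...)$|gomez pattern from a plain list of ISP names and puts a backslash before each special character. The helper names and the exact set of characters treated as special are assumptions for illustration; "/" is included only because it is mentioned above, and escaping it is harmless.

```python
import re

# Characters treated as special here; "/" is included per the advice above.
SPECIAL_CHARS = set(".^$*+?()[]{}|\\/")

def escape_name(name: str) -> str:
    """Put a backslash in front of every special character in an ISP name."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)

def build_isp_filter(isp_names: list[str], unanchored: str = "gomez") -> str:
    """Build ^(name1|name2|...)$|unanchored from a list of exact ISP names."""
    group = "|".join(escape_name(n) for n in isp_names)
    return f"^({group})$|{unanchored}"

names = [
    "microsoft corp", "inktomi corporation", "yahoo! inc.",
    "google inc.", "stumbleupon inc.", "rackspace cloud servers",
]
pattern = build_isp_filter(names)
print(pattern)
# Sanity check: the generated pattern compiles and matches an exact name.
assert re.search(pattern, "stumbleupon inc.")
```

With the names from this thread it prints ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez, so each period now matches only a literal period.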
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris