Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I used the regular expression posted in an article, and it eliminates what appears to be most of them.
However, there are other bots I would like to filter, but I'm having a hard time working out the regular expressions for them.
How do I work out the regular expression for additional bots so I can add them to the filter?
I read an Analytics "how to", but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about regular expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a Dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word "servers" in the name. The \ is escaping the . at the end of "stumbleupon inc." so that it matches a literal period.
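To see why the backslash matters, here is a minimal sketch in Python's re module (an assumption: Analytics has its own regex engine, but anchors, alternation and escaped dots behave the same way for patterns this simple). An unescaped . matches any single character, so the loose pattern also accepts a hypothetical near-miss like "stumbleupon incx"; the escaped version only accepts a real period.

```python
import re

# Loose: the unescaped "." matches ANY single character.
loose = r"^stumbleupon inc.$"
# Strict: "\." matches only a literal period.
strict = r"^stumbleupon inc\.$"

# The real ISP name, plus a hypothetical near-miss with "x" instead of ".".
for candidate in ["stumbleupon inc.", "stumbleupon incx"]:
    print(candidate,
          bool(re.search(loose, candidate)),
          bool(re.search(strict, candidate)))
# stumbleupon inc. True True
# stumbleupon incx True False
```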
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
I just added rackspace as another match; it should work as long as the name matches exactly.
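A quick way to check a pattern before saving the filter is to try it against a few ISP names. This Python sketch assumes the updated pattern with the periods escaped, and uses re.search because a filter pattern matches anywhere in the field unless it is pinned with ^ and $. The names other than "rackspace cloud servers" are hypothetical test strings.

```python
import re

# Updated pattern with rackspace added and the literal periods escaped.
BOT_ISP = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\."
    r"|stumbleupon inc\.|rackspace cloud servers)$|gomez"
)


def is_excluded(isp_organization: str) -> bool:
    """True if the pattern would match this ISP Organization value.

    re.search finds a match anywhere in the string, so only the part
    wrapped in ^( ... )$ has to match the whole name exactly.
    """
    return BOT_ISP.search(isp_organization) is not None


for isp in ["rackspace cloud servers",      # exact name: excluded
            "rackspace cloud servers inc",  # extra text after the name: kept
            "gomez inc."]:                  # contains "gomez": excluded
    print(f"{isp!r}: {is_excluded(isp)}")
```

The second name shows what "exactly right" means here: any extra text stops the anchored group from matching, while anything containing "gomez" is excluded no matter what surrounds it.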
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the other variables you mentioned.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network, and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers has visited my site 848 times, viewing 1 page each time, spending 0 seconds on the page and bouncing 100% of the time.
What is the regular expression for rackspace?
-
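Time on page can be a tricky one. Analytics works out time on page from the gap between one pageview and the next, so the last page of a visit (and therefore any one-page visit) records 00:00:00 even when a real person read it. I'd recommend combining it with the other factors I mentioned.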
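As a rough illustration of combining signals, here is a Python sketch that only flags a provider when volume, pages per session, bounce rate and time on page all point the same way. The row layout, field names and the 100-session threshold are assumptions, not an actual Analytics export format; the rackspace figures are the ones quoted in this thread, and the second row is hypothetical. Anything it flags is a candidate to check against operating system and location before adding it to the filter.

```python
from dataclasses import dataclass


@dataclass
class IspRow:
    """One row of a hypothetical export of the Service Provider report."""
    service_provider: str
    sessions: int
    pages_per_session: float
    avg_time_on_page_sec: float
    bounce_rate_pct: float


def looks_like_bot(row: IspRow, min_sessions: int = 100) -> bool:
    """Flag a provider only when every signal agrees.

    Zero time on page on its own is not enough, because a real one-page
    visit is also recorded as 00:00:00. min_sessions is an arbitrary
    assumption that keeps a handful of genuine visits from being flagged.
    """
    return (
        row.sessions >= min_sessions
        and row.pages_per_session <= 1
        and row.bounce_rate_pct >= 100
        and row.avg_time_on_page_sec == 0
    )


rows = [
    # Figures quoted in this thread for rackspace cloud servers (four days).
    IspRow("rackspace cloud servers", 848, 1.0, 0.0, 100.0),
    # Hypothetical provider whose visitors browse several pages.
    IspRow("example broadband", 1200, 2.7, 45.0, 52.0),
]

for row in rows:
    print(row.service_provider, looks_like_bot(row))
# rackspace cloud servers True
# example broadband False
```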
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots getting through that you want to filter out? If they can be filtered by ISP Organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them; otherwise your RegEx may not match the way you intend.
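If the list keeps growing, building the pattern from plain names can save some escaping mistakes. This is a minimal Python sketch; escape_name and build_isp_filter are hypothetical helpers, and the escaped character set is the usual list of regex metacharacters. With rackspace added, it reproduces the pattern above with the periods escaped.

```python
import re

# Regex metacharacters that need a backslash to be read literally.
SPECIAL = set("\\.^$|?*+()[]{}")


def escape_name(name: str) -> str:
    """Backslash-escape special characters in one ISP Organization name."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)


def build_isp_filter(exact_names, contains=("gomez",)):
    """Build ^(name1|name2|...)$ plus unanchored |term alternatives."""
    pattern = "^(" + "|".join(escape_name(n) for n in exact_names) + ")$"
    for term in contains:
        pattern += "|" + escape_name(term)
    return pattern


names = ["microsoft corp", "inktomi corporation", "yahoo! inc.",
         "google inc.", "stumbleupon inc.", "rackspace cloud servers"]
pattern = build_isp_filter(names)
print(pattern)
# ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
```

New exact ISP names go in the names list; terms that should match anywhere in the name, like gomez, go in contains.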
-
Sure. Here's the post for filtering the bots.
Here's the regex posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots, I might be able to help come up with a RegEx for you. What RegEx do you have in place to filter out the other bots?
Regards,
Chris