Regular Expressions for Filtering Bot Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I used a regular expression posted in an article, and it appears to eliminate most of the bots.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how-to," but it's over my head, and I'm hoping for some "dumbed-down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies' guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word "servers" in the name. The \ is escaping the . at the end of "stumbleupon inc."
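To see why the backslash matters: in a regular expression, an unescaped . matches any single character, while \. matches only a literal dot. Here is a minimal sketch using Python's re module as a stand-in for the engine Analytics uses; the second sample name is made up purely for illustration.

```python
import re

# Unescaped "." is a wildcard: it matches any single character.
loose = re.compile(r"^stumbleupon inc.$")
# Escaped "\." matches only a literal dot.
strict = re.compile(r"^stumbleupon inc\.$")

for name in ["stumbleupon inc.", "stumbleupon incx"]:
    print(f"{name!r}: loose={bool(loose.search(name))}, strict={bool(strict.search(name))}")
```

Both patterns accept the real name, but only the unescaped one also accepts "stumbleupon incx". That is why a name like "rackspace cloud servers", which has no dot, needs no \. at the end.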
-
Does it need the \. before the )?
-
OK, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
I just added rackspace cloud servers as another match. It should work as long as the name is exactly right.
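Because the names are wrapped in ^( ... )$, each one has to match the entire ISP Organization value, which is why the exact spelling matters. The sketch below lets you check candidate values before pasting the pattern into the filter. It uses Python's re module as a stand-in, re.IGNORECASE is an assumption about how the filter treats upper and lower case, and the test values (including "gomez networks") are hypothetical.

```python
import re

# The updated pattern, split across two raw strings that Python joins together.
BOT_FILTER = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,
)

def is_filtered(isp_organization: str) -> bool:
    """Return True if this ISP Organization value would be caught by the pattern."""
    return BOT_FILTER.search(isp_organization) is not None

for isp in ["rackspace cloud servers", "Rackspace Cloud Servers",
            "rackspace cloud servers inc.", "gomez networks"]:
    print(f"{isp!r} -> {is_filtered(isp)}")
```

An extra word after the name ("rackspace cloud servers inc.") breaks the match because of the $ anchor, while "gomez" has no anchors and is caught anywhere in the value.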
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network, and the column shows "Service Provider."
-
What exactly is it called in the ISP report?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers has visited my site 848 times, viewed 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for Rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
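One way to apply "use several signals, not time on page alone" is to flag a provider only when every bot-like signal lines up. This is a hedged sketch over hypothetical rows copied from an exported report: the ReportRow fields and the min_sessions threshold are made up for illustration, and the other thresholds (1 page, 0 seconds, 100% bounce) come from the Rackspace example in this thread, not from any Analytics API.

```python
from dataclasses import dataclass

@dataclass
class ReportRow:
    # Hypothetical fields, e.g. copied by hand from an exported Network report.
    service_provider: str
    sessions: int
    pages_per_session: float
    avg_time_on_page_seconds: float
    bounce_rate_percent: float

def looks_like_bot(row: ReportRow, min_sessions: int = 50) -> bool:
    """Flag a provider only when several bot-like signals agree.

    Zero time on page by itself is not enough, because real visits
    can also record 00:00:00.
    """
    signals = [
        row.sessions >= min_sessions,        # enough traffic to be a pattern
        row.pages_per_session <= 1.0,        # never goes past the first page
        row.avg_time_on_page_seconds == 0,   # no measurable time on page
        row.bounce_rate_percent >= 100.0,    # every session bounces
    ]
    return all(signals)

print(looks_like_bot(ReportRow("rackspace cloud servers", 848, 1.0, 0, 100.0)))
```

A provider that fails any one signal (say, a 60% bounce rate) is left alone, which keeps real visitors with 00:00:00 time on page from being flagged.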
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
OK, can you provide some information on the bots getting through that you want to filter out? If they can be filtered by ISP Organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them, or else your RegEx may not match what you expect.
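The "add them to the list and escape special characters" advice can be sketched in Python: take a list of ISP Organization names, put a backslash before any punctuation that has special meaning, and join them into the same ^( ... )$|gomez shape. The helper names escape_isp_name and build_bot_filter are hypothetical, and Python's re module only stands in for the engine Analytics uses. Python's built-in re.escape would also escape the spaces, which the filter doesn't need, so this sketch escapes only the special punctuation.

```python
import re

# Punctuation with special meaning in a regex; "/" is included because the
# advice is to escape it too, and "\/" is harmless.
_SPECIAL = re.compile(r"([.\\+*?^$()\[\]{}|/])")

def escape_isp_name(name: str) -> str:
    """Put a backslash before each special character, leaving spaces alone."""
    return _SPECIAL.sub(r"\\\1", name)

def build_bot_filter(isp_names, extra_pattern="gomez"):
    """Anchor the whole list of names, then OR in an unanchored extra pattern."""
    alternatives = "|".join(escape_isp_name(n) for n in isp_names)
    return "^(" + alternatives + ")$|" + extra_pattern

names = ["microsoft corp", "inktomi corporation", "yahoo! inc.",
         "google inc.", "stumbleupon inc.", "rackspace cloud servers"]
print(build_bot_filter(names))
```

Keeping the names in a plain list means adding a new bot is one more entry, and the escaping can't be forgotten.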
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
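For a plain-language reading of that pattern, here it is assembled piece by piece in Python, with a comment on what each piece does and each dot escaped as \. (as recommended elsewhere in this thread). This is only an explanation aid; the single-line form is what goes into the Analytics filter.

```python
import re

# The filter pattern, assembled piece by piece with a comment on each part.
PIECES = [
    "^",                      # anchor: start of the ISP Organization value
    "(",                      # open a group of alternatives
    "microsoft corp",
    "|inktomi corporation",   # "|" means "or"
    r"|yahoo! inc\.",         # "\." is a literal dot; a bare "." matches any character
    r"|google inc\.",
    r"|stumbleupon inc\.",
    ")",                      # close the group
    "$",                      # anchor: end of the value, so the name must match in full
    "|gomez",                 # ...or "gomez" appearing anywhere, with no anchors
]
PATTERN = "".join(PIECES)
print(PATTERN)
print(bool(re.search(PATTERN, "google inc.")))
```

Because "|" has the lowest precedence, the pattern reads as "one of the exact names from start to end" OR "gomez anywhere."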
-
If you give me an idea of how you are isolating the bots, I might be able to help come up with a RegEx for you. What is the RegEx you have in place to filter out the other bots?
Regards,
Chris