Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics, relying on regular expressions posted in an article that eliminate what appear to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how-to," but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is what escapes the . at the end of stumbleupon inc.
-
Does it need the . before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
I just added rackspace as another match; it should work if the name is exactly right.
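If you want to sanity-check the pattern before saving the filter, here's a quick sketch using Python's re module (its behavior matches the Analytics RegEx flavor for a simple pattern like this; the sample provider names besides the ones in the pattern are made up):

```python
import re

# The filter pattern from above: exact-match any of the listed ISP
# names, or match "gomez" anywhere in the name.
pattern = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc.|google inc."
    r"|stumbleupon inc.|rackspace cloud servers)$|gomez"
)

# Sample service-provider values as they might appear in the report.
providers = [
    "rackspace cloud servers",   # exact match           -> filtered
    "rackspace cloud servers.",  # stray trailing period -> kept!
    "gomez networks",            # contains "gomez"      -> filtered
    "comcast cable",             # no match              -> kept
]

for name in providers:
    print(name, "->", "filtered" if pattern.search(name) else "kept")
```

If rackspace still shows up in the report after you apply the filter, testing the exact reported name this way can reveal an invisible difference, like a trailing space or period.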
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network, and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
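To avoid escaping mistakes by hand, the alternation can also be built by a script. This is just a sketch in Python (the ISP names other than the real ones above are made up for illustration); re.escape adds the backslashes for you:

```python
import re

# Hypothetical ISP names copied from the service-provider report.
isps = [
    "microsoft corp",
    "stumbleupon inc.",       # contains a literal "."
    "some isp/hosting co.",   # contains "/" and "."
]

# re.escape backslash-escapes special characters like "." so they
# match literally instead of acting as RegEx operators.
escaped_names = "|".join(re.escape(name) for name in isps)
pattern = re.compile(rf"^({escaped_names})$|gomez")

print(pattern.pattern)
```

With the dot escaped, "stumbleupon inc." still matches but "stumbleupon incx" no longer does, which is what you want: an unescaped "." happily matches any character.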
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
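One detail worth spelling out about this pattern: alternation binds loosely, so it reads as (exactly one of the listed names) OR ("gomez" anywhere in the name); the ^ and $ anchors do not apply to the gomez part. A short Python sketch (Python's re behaves the same as the Analytics RegEx for a pattern like this) shows the difference:

```python
import re

# The pattern from the article, unchanged.
pattern = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc.|google inc."
    r"|stumbleupon inc.)$|gomez"
)

print(bool(pattern.search("google inc.")))         # exact name
print(bool(pattern.search("google inc. usa")))     # extra text after name
print(bool(pattern.search("gomez networks llc")))  # "gomez" anywhere
```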
-
If you give me an idea of how you are isolating the bots, I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris