Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it appears to eliminate most of them.
However, there are other bots I would like to filter, but I'm having a hard time determining the regular expressions for them.
How do I determine the regular expression for each additional bot so I can add it to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx-related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
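To make the escaping point concrete, here is a minimal sketch. It uses Python's `re` module as a stand-in for the Analytics filter engine, which is an assumption, and "stumbleupon incx" is a made-up name used only to show the difference.

```python
import re

# Without the backslash, "." is a wildcard that matches any single character.
unescaped = re.compile(r"^stumbleupon inc.$")
# With the backslash, "\." matches only a literal dot.
escaped = re.compile(r"^stumbleupon inc\.$")

# "stumbleupon incx" is a hypothetical name, used only for illustration.
for name in ["stumbleupon inc.", "stumbleupon incx"]:
    print(name, bool(unescaped.search(name)), bool(escaped.search(name)))
# The unescaped pattern matches both names; the escaped one matches only
# the name that really ends in a dot.
```

A name with no dot in it, such as rackspace cloud servers, has nothing to escape, so it can be added to the list as plain text.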
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
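A quick way to check an expression like this before saving the filter is to run it against a few service provider names. The sketch below is only illustrative: it uses Python's `re` module in place of the Analytics engine, assumes case-insensitive matching, and every sample name except rackspace cloud servers is made up.

```python
import re

# The pattern from this post, with the dots escaped. Python's `re` module is
# only a stand-in for the Analytics filter engine (an assumption), and
# case-insensitive matching is assumed as well.
PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,
)


def is_filtered(service_provider: str) -> bool:
    """Return True if the service provider name matches the filter pattern."""
    return PATTERN.search(service_provider) is not None


# Hypothetical service provider values for illustration.
samples = [
    "rackspace cloud servers",      # exact name: matches
    "rackspace cloud servers inc",  # extra text: rejected by the $ anchor
    "google inc.",                  # exact name: matches
    "gomez networks",               # "gomez" is unanchored: matches anywhere
    "comcast cable",                # not in the list: no match
]
for name in samples:
    print(f"{name!r}: {is_filtered(name)}")
```

The ^ and $ anchors are why the name has to be exactly right: anything inside the parentheses must be the whole service provider value, while gomez sits outside the anchors and matches any name that contains it.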
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
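One way to act on this caution is to require several signals to agree before treating a service provider as a bot. The sketch below is a rough illustration, not an Analytics feature: the field names, the `min_visits` threshold, and the second row are hypothetical, and only the rackspace numbers come from this thread.

```python
from dataclasses import dataclass


@dataclass
class ProviderStats:
    """One row of a service provider report (field names are hypothetical)."""
    service_provider: str
    visits: int
    pages_per_visit: float
    avg_time_on_page_s: float
    bounce_rate: float  # 0.0 to 1.0


def looks_like_bot(row: ProviderStats, min_visits: int = 100) -> bool:
    """Flag a provider only when every signal agrees.

    Zero time on page alone is not enough, because real visits can also
    record 00:00:00. The min_visits threshold is an arbitrary assumption.
    """
    signals = [
        row.pages_per_visit <= 1.0,
        row.avg_time_on_page_s == 0,
        row.bounce_rate >= 1.0,
    ]
    return row.visits >= min_visits and all(signals)


rows = [
    # Numbers reported in this thread for rackspace cloud servers.
    ProviderStats("rackspace cloud servers", 848, 1.0, 0.0, 1.0),
    # Hypothetical small ISP: zero time on page, but too few visits to judge.
    ProviderStats("example small isp", 3, 1.0, 0.0, 1.0),
]
for row in rows:
    print(row.service_provider, looks_like_bot(row))
```

Operating system and location, the factors suggested elsewhere in the thread, could be added as further signals in the same list.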
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through and that you want to sort out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them; otherwise your RegEx may not match the way you intend.
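For anyone unsure where the backslashes go, the list can also be assembled from plain names by a small script. This is a sketch under assumptions: `escape_name` and `build_isp_pattern` are hypothetical helpers written for this example, and the set of characters treated as special is a common-sense guess rather than the documented Analytics list.

```python
# Characters treated as special here: an assumed set for illustration,
# not the documented Analytics list.
SPECIAL = set(r".^$*+?()[]{}|\/")


def escape_name(name: str) -> str:
    """Put a backslash before each special character in a plain-text name."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)


def build_isp_pattern(exact_names, substrings=("gomez",)):
    """Build ^(name1|name2|...)$|substring from plain-text names."""
    exact = "|".join(escape_name(n) for n in exact_names)
    pattern = f"^({exact})$"
    for s in substrings:
        pattern += "|" + escape_name(s)
    return pattern


names = [
    "microsoft corp",
    "inktomi corporation",
    "yahoo! inc.",
    "google inc.",
    "stumbleupon inc.",
    "rackspace cloud servers",
]
print(build_isp_pattern(names))
```

Each name is typed exactly as it appears in the service provider column, and the script adds the escapes, so a new bot is one more entry in `names`.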
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris