Regular Expressions for Filtering Bot Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article, and they eliminate what appears to be most of the bots.
However, there are other bots I would like to filter, but I'm having a hard time working out the regular expressions for them.
How do I determine the regular expression for additional bots so I can add them to the filter?
I read an Analytics "how to," but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem. Feel free to reach out if you have any other RegEx-related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the Rackspace bots. If you want to learn more about regular expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a Dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless the name itself has a period after the word servers. The \ in \. escapes the period at the end of stumbleupon inc., so it matches a literal period rather than any character.
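To see what escaping the period actually changes, here is a small sketch using Python's re module as a stand-in for the Analytics regex engine (an assumption; the two engines differ in some details, but both treat an unescaped . as "any single character"). The value "stumbleupon incx" is made up for illustration.

```python
import re

# Unescaped dot: "." matches ANY single character, so "incx" also matches.
loose = re.compile(r"^stumbleupon inc.$")
# Escaped dot: "\." matches only a literal period.
strict = re.compile(r"^stumbleupon inc\.$")

for name in ["stumbleupon inc.", "stumbleupon incx"]:
    print(f"{name!r}: loose={bool(loose.search(name))}, strict={bool(strict.search(name))}")
```

Both patterns catch the real name; only the escaped one refuses the look-alike, which is why the \ matters when a name ends in a period.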
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
I just added rackspace cloud servers as another match; it should work as long as the name matches the report exactly.
Hope this helps,
Chris
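If you want to check whether a given Service Provider value would be caught before saving the filter, here is a minimal sketch in Python. It assumes the filter does a case-insensitive "contains" match (approximated here with re.search and re.IGNORECASE) and that the dots are escaped as \.; the function name is_bot_isp and the sample values are hypothetical.

```python
import re

# The pattern from this thread, with dots escaped.
# Assumption: Analytics applies it as a case-insensitive "contains" match,
# which re.search with re.IGNORECASE approximates.
BOT_ISP = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\."
    r"|stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,
)

def is_bot_isp(service_provider: str) -> bool:
    """Return True if the Service Provider value matches the bot pattern."""
    return BOT_ISP.search(service_provider) is not None

# Hypothetical Service Provider values, for illustration only.
for name in ["rackspace cloud servers", "Rackspace Cloud Servers",
             "rackspace cloud servers inc.", "comcast cable"]:
    print(f"{name!r}: {is_bot_isp(name)}")
```

The third value shows why the name has to be exactly right: because of the ^ and $ anchors, any extra text after servers stops the match.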
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network, and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, viewed 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for Rackspace?
-
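Time on page can be a tricky one, because genuine visits sometimes record 00:00:00 due to the way it is measured: Analytics typically measures time on a page as the gap until the next hit, so a single-page visit usually records no time at all. I'd recommend using other factors like the ones I mentioned above.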
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
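The "combine several signals" idea can be sketched in Python. Everything here is an assumption for illustration: the TrafficRow fields, the thresholds, the expected-country and suspect-OS sets, and the sample rows. It works on report rows you have exported yourself, not through any Analytics API, and it deliberately never flags a row on zero time on page alone.

```python
from dataclasses import dataclass

@dataclass
class TrafficRow:
    """One hypothetical exported report row; field names are illustrative only."""
    service_provider: str
    operating_system: str
    country: str
    pages_per_session: float
    avg_time_on_page_s: int
    bounce_rate: float  # 0.0 to 1.0

def bot_signals(row, expected_countries, suspect_os):
    """Return the weak bot signals this row shows (thresholds are assumptions)."""
    signals = []
    if row.avg_time_on_page_s == 0:
        signals.append("zero time on page")  # weak alone: real one-page visits record 0 too
    if row.pages_per_session <= 1:
        signals.append("single page")
    if row.bounce_rate >= 0.99:
        signals.append("always bounces")
    if row.country not in expected_countries:
        signals.append("unexpected location")
    if row.operating_system in suspect_os:
        signals.append("suspect operating system")
    return signals

def looks_like_bot(row, expected_countries, suspect_os, min_signals=4):
    """Flag a row only when several signals coincide."""
    return len(bot_signals(row, expected_countries, suspect_os)) >= min_signals

# Hypothetical rows: the second behaves like an ordinary one-page visitor.
rows = [
    TrafficRow("example hosting co", "(not set)", "Germany", 1.0, 0, 1.0),
    TrafficRow("example cable isp", "Windows", "United States", 1.0, 0, 1.0),
]
for row in rows:
    print(row.service_provider, looks_like_bot(row, {"United States"}, {"(not set)"}))
```

The second row has zero time on page, one page, and a full bounce, yet is not flagged because its location and operating system look normal; that is the point of combining factors rather than relying on time on page.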
-
OK, can you share some information about the bots that are getting through this filter that you want to remove? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them, or your RegEx may not match what you expect.
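Escaping by hand is easy to get wrong as the list grows, so here is a minimal Python sketch that builds the same style of pattern from a plain list of ISP names. The helper names escape_for_filter and build_isp_pattern are hypothetical. It escapes only regex metacharacters (plus /, as suggested above, which is harmless) and deliberately leaves spaces alone, because Python's own re.escape also escapes spaces, which is unnecessary here.

```python
import re

# Characters with special meaning in a regex, plus "/" (escaping it is harmless).
SPECIAL = set(r"\.^$|?*+()[]{}/")

def escape_for_filter(text: str) -> str:
    """Put a backslash before each special character; spaces are left as-is."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

def build_isp_pattern(isp_names, unanchored_terms=("gomez",)):
    """Exact-match alternation of ISP names, plus terms matched anywhere."""
    exact = "|".join(escape_for_filter(name) for name in isp_names)
    tail = "".join("|" + escape_for_filter(term) for term in unanchored_terms)
    return f"^({exact})${tail}"

names = ["microsoft corp", "inktomi corporation", "yahoo! inc.", "google inc.",
         "stumbleupon inc.", "rackspace cloud servers"]
pattern = build_isp_pattern(names)
print(pattern)
```

To add another bot, append its exact Service Provider name to the list and paste the printed pattern into the filter.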
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
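It helps to read this pattern as two alternatives split by the | outside the parentheses. The sketch below uses Python's re module as a stand-in for the Analytics engine (an assumption) with a case-insensitive search; the sample values other than google inc. are hypothetical.

```python
import re

ORIGINAL = (r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\."
            r"|stumbleupon inc\.)$|gomez")

def matches(value: str) -> bool:
    # Alternative 1: ^(...)$ means the WHOLE value must be one listed name.
    # Alternative 2: gomez has no anchors, so it matches anywhere in the value.
    return re.search(ORIGINAL, value, re.IGNORECASE) is not None

for value in ["google inc.", "google inc. europe", "gomez networks"]:
    print(f"{value!r}: {matches(value)}")
```

So the listed ISPs must match exactly, while anything containing gomez is caught regardless of what surrounds it.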
-
If you give me an idea of how you are isolating the bots, I might be able to help come up with a RegEx for you. What RegEx do you have in place to filter out the other bots?
Regards,
Chris