Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminate what appear to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match; it should work if the name is exactly right.
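If you want to sanity-check the pattern outside of Analytics first, here's a quick sketch using Python's re module (same pattern, with the dots escaped so they match literal periods):

```python
import re

# Anchored alternation: ^...$ means the ISP name must match exactly,
# while the trailing |gomez matches "gomez" anywhere in the name.
pattern = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez"
)

print(bool(pattern.search("rackspace cloud servers")))      # True - exact name
print(bool(pattern.search("rackspace cloud servers llc")))  # False - extra text
print(bool(pattern.search("gomez networks")))               # True - contains "gomez"
```

If the second line prints True for a name you test, the service provider string in your report isn't an exact match and the filter won't catch it.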
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers has visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
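As a rough sketch of what combining those signals looks like (the row layout and field names here are made up for illustration, not anything exported from Analytics itself):

```python
# Hypothetical report rows; field names are invented for this example.
sessions = [
    {"provider": "rackspace cloud servers", "avg_time": 0, "bounce_rate": 1.0},
    {"provider": "comcast cable", "avg_time": 95, "bounce_rate": 0.4},
    {"provider": "comcast cable", "avg_time": 0, "bounce_rate": 1.0},
]

def looks_like_bot(row):
    # No single signal is reliable on its own (real visits can record
    # 0 seconds), so require several of them together.
    return (row["avg_time"] == 0
            and row["bounce_rate"] == 1.0
            and "servers" in row["provider"])

flagged = [r for r in sessions if looks_like_bot(r)]
print(flagged)  # only the rackspace row is flagged
```

The third row shows why the combination matters: it has 0 seconds and a 100% bounce, but the provider check keeps a real visitor from being flagged.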
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
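To see why the escaping matters, here's a small sketch in Python (the ISP name is just the one from your list):

```python
import re

# Unescaped: "." matches ANY single character, so near-misses slip through.
loose = re.compile(r"^stumbleupon inc.$")
# Escaped: "\." matches only a literal period.
strict = re.compile(r"^stumbleupon inc\.$")

print(bool(loose.search("stumbleupon inch")))   # True - unintended match
print(bool(strict.search("stumbleupon inch")))  # False
print(bool(strict.search("stumbleupon inc.")))  # True - exact name only
```

The unescaped version doesn't fail outright; it just quietly matches more than you meant, which is harder to spot in a filter.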
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris