Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article, and they eliminate what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match; it should work if the name is exactly right.
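If it helps, here's a quick way to sanity-check the pattern outside of Analytics, sketched in Python (the non-bot ISP name is made up for illustration; Analytics uses its own regex engine, but these basics behave the same):

```python
import re

# The filter pattern from above: dots are escaped (inc\.) so they match
# a literal "." -- the ISP names must match your ISP report exactly.
pattern = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez"
)

samples = [
    "rackspace cloud servers",  # filtered: exact match in the list
    "stumbleupon inc.",         # filtered: exact match, literal dot
    "gomez networks",           # filtered: contains "gomez"
    "comcast cable",            # kept: an ordinary visitor ISP
]

for name in samples:
    print(name, "->", "filtered" if pattern.search(name) else "kept")
```

Note that the `^( ... )$` part only matches a service provider name exactly, while the `|gomez` alternative matches "gomez" anywhere in the name.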
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
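To make the combination concrete, here's a rough sketch in Python of flagging candidate ISPs from an exported report (the field names and every row except the rackspace numbers from this thread are made up for illustration):

```python
# Flag service providers whose sessions look bot-like: 0 seconds on
# page AND a 100% bounce rate. Review candidates by hand before adding
# them to the filter, since real visits can also record 00:00:00.
rows = [
    {"isp": "rackspace cloud servers", "avg_time_s": 0, "bounce_rate": 1.0, "visits": 848},
    {"isp": "comcast cable", "avg_time_s": 95, "bounce_rate": 0.42, "visits": 3120},
    {"isp": "unknown hosting llc", "avg_time_s": 0, "bounce_rate": 1.0, "visits": 212},
]

suspects = [
    r["isp"] for r in rows
    if r["avg_time_s"] == 0 and r["bounce_rate"] == 1.0
]
print(suspects)
```

That would surface the rackspace row (and any similar hosting providers) as candidates for the ISP filter without touching legitimate traffic.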
-
Ok, can you provide some information on the bots that are getting through that you want to sort out? If they can be filtered by ISP organization like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
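For example, a small Python sketch of building the list safely — `re.escape` adds the backslashes for you (the ISP names here are placeholders, not a definitive list):

```python
import re

# Placeholder ISP names -- substitute the exact names from your report.
isps = [
    "microsoft corp",
    "stumbleupon inc.",        # "." is a regex wildcard until escaped
    "rackspace cloud servers",
]

# re.escape backslash-escapes special characters like . and / for us,
# so each name is matched literally inside the pattern.
pattern = re.compile(r"^(" + "|".join(re.escape(n) for n in isps) + r")$|gomez")

assert pattern.search("stumbleupon inc.")      # exact name: filtered
assert not pattern.search("stumbleupon incx")  # an unescaped "." would have matched this
```

Without the escaping, `stumbleupon inc.` would also match names like "stumbleupon incx", because `.` matches any character.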
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris
snow plows 25 18 For the life of me, I can't see what would have caused such drastic changes. Our site is almost completely unique content. Some things, like Warranty & Install instructions, are from the manufacturer to protect us from liabilities. We come up with our own feature text, and we have custom written articles, blog posts, research guides, etc. We also appear to be the only one of our competitors being affected in this fashion. Any thoughts would be helpful. Domain is realtruck.com.0