Cloud WAF Advanced Bot Protection (ABP) Transcript and Video from Webinar

By Christopher Detzel posted 13 days ago

  

In this community webinar, Brooks Cunningham, Manager, Service Operations, gave us an inside view of:
1. Overview of Cloud WAF reporting
2. Overview of configuring the policies
3. Overview of debugging



Chris Detzel: (00:16)
Thank you, everyone for attending today's webinar. I'm going to go ahead and share my screen. Like I said before, if you weren't on, I only have two slides, Brooks has no slides, it's all demo and going to... Showing you how to do some cool things. So, my name is Chris Detzel, and I am the community manager here at Imperva. If you can't tell my background, Imperva Community, just got that up and running yesterday. So, I'm pretty excited about the background. And then, Brooks Cunningham, he is the star of today. He is the manager of service operations, and we'll be talking about Cloud WAF Advanced Bot Protection.

Chris Detzel: (00:57)
About a year ago, Brooks, maybe, Imperva acquired a company called Distil, and this webinar's a little bit about the integration of our bot protection, [inaudible 00:01:11] Distil product, into our cloud WAF.

Chris Detzel: (01:13)
All right, so he'll correct me if I'm wrong about anything, I promise you. So today, just to put up some of the ground rules, keep your audio on mute unless you have a question. I do like to get questions on the chat. And so during this conversation, what I'll do is if I see chats in the box, then I'll ask during the webinar. So I'll just interrupt Brooks when it's appropriate and then ask those questions that come up. Additionally, if you didn't know, if you haven't been to community.imperva.com, please go there, it is more than just a discussion, but it is a place where you can ask questions. I would ask you, let me take this down over to your right hand side, go to community today and then take our survey. I am looking to have the best possible webinars, so we would like you to vote.

Chris Detzel: (02:14)
If I didn't put the webinar on there that you liked the most, it could be something different, like product updates and things like that, let me know directly. If you hit go vote, it is confidential on who voted and things like that, so please do that. Also ask your questions. The webinar, again, will be recorded and shared to the community. Give me about a week or so and we'll make sure that that happens. I am also looking for other speakers, so if you are an expert in either cloud WAF, or our database, or on-prem products, I would love to have you speak if at all possible. And if you're interested in doing any of that, please email me directly, christopher.detzel@imperva.com. So I'm going to turn this now over to Brooks, but remember, don't feel bad about asking your questions in the chat and I'll get those, and then I think that's it, Brooks, take it away.

Brooks Cunningham: (03:21)
Awesome. Well, thank you so much for the introduction and thank you everybody for joining and participating today, like Chris said, I'm a member of the service operations team and responsible for more of the bot support. So today I'm going to be talking about a number of things, including an overview of some of the dashboards and the reporting that's available, how you can use some of the reporting that is available to take an action on automation, and then how you can investigate types of automation as well. But before I get into the specifics of the reporting, I just want to give an overview of what the Cloud WAF Advanced Bot Protection platform is.

Chris Detzel: (03:57)
All right, Brooks, take it away, my apologies.

Brooks Cunningham: (03:59)
Sure, no worries. Yeah. If I need to dial it on my phone or something, just let me know and I can make adjustments. So the Cloud WAF Advanced Bot Protection Solution part it's automation, it gives you visibility into the automation that's going into your website, but what is automation at large and what are bots? Well, there's a lot of categories of bots. There's the good bots that maybe a lot of people are familiar with, such as Googlebot, Yahoo! Slurp, and Bingbot, things that will improve SEO and that's important for driving business to your website. There's also neutral types of bots that are useful for things such as health checking, such as Pingdom, and New Relic, and these types of tools that are useful internally to understand the uptime and response times of your website.

Brooks Cunningham: (04:45)
And then there's, the primary focus of this conversation would be about bad bots, which do a wide spectrum of things. There are things such as ATO, which is account takeover, where people will try to brute force their way into a given login to try to extract out financial information or other private information. There's also web crawling where competitors will potentially use bots to scrape inventory and the pricing there and do a few different things, either try to do an inventory type of denial of service, or just scrape it for competitive types of purposes. There's a lot of different types of use cases that good bots, neutral bots, and bad bots do, but that's just going over it a little bit at large. Are there any questions about just what are bots at large before I get into the specifics of how ABP solves for some of the nefarious bot problems? All right, well, I'll get to share my screen and we'll get to it.

Brooks Cunningham: (05:44)
Can everybody see my screen?

Chris Detzel: (05:46)
I sure can, yes.

Speaker 4: (05:47)
Yes.

Brooks Cunningham: (05:49)
So to get to this dashboard, and this is the advanced bot protection dashboard, and so to get to it, after you log into the My Portal, there should be a link in the bottom left, and then you can click that and then click launch advanced bot protection. If you don't have access to this link, please do reach out to your account representative for information on how you can get that link enabled. Once you click the link, this management console will then be available. And I'd like to talk a little bit about the reporting first. So we do have a general traffic overview, as far as on a global scale for the traffic that we have visibility into within the advanced bot protection solution, and then there's regional specific data. So in the traffic overview, it really is based on a site collection, as far as good bots, things that we've taken an action on, things that are white listed, and then items that are a bit suspicious. And we can certainly dive into what is good bots and then what are things that we'd like to take an action on. So the traffic that's being generated for this website, cloudwaf.badbotjail.com, is coming from Tor proxies and using a few different scripts that I have to generate some of this traffic.

Brooks Cunningham: (07:04)
And so the way that this traffic is overall generated and represented is based on a combination of flags, tags, and other information that I'll show you in just a moment. Some of the more interesting information... The way that this is broken down, it's also, it's worthwhile to mention that this is in 15 minute increments. And the way that we represent some of this data is based on tags and flags. And so for example, we'll have this one here where we can see that there's custom tags over time. So for example, if there was a custom tag that you've applied, you can see that there's search engine traffic that's applied.
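
The 15 minute increments mentioned above can be pictured as simple time bucketing. The following is a minimal sketch, not the product's reporting engine: the event tuples, the tag names, and the helper functions are all hypothetical and exist only to show how tag counts fall into 15 minute windows.

```python
from collections import Counter
from datetime import datetime, timezone

BUCKET_SECONDS = 15 * 60  # the dashboards are described as using 15 minute increments


def bucket_start(ts: datetime) -> datetime:
    """Round a timezone-aware timestamp down to the start of its 15 minute bucket."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % BUCKET_SECONDS, tz=timezone.utc)


def tags_over_time(events):
    """Count (bucket start, tag) pairs for an iterable of (timestamp, tag) events."""
    return Counter((bucket_start(ts), tag) for ts, tag in events)


# Hypothetical sample events; the tag names are illustrative only.
events = [
    (datetime(2020, 6, 1, 12, 3, tzinfo=timezone.utc), "search_engine"),
    (datetime(2020, 6, 1, 12, 14, tzinfo=timezone.utc), "search_engine"),
    (datetime(2020, 6, 1, 12, 16, tzinfo=timezone.utc), "bad_user_agent"),
]
counts = tags_over_time(events)
```

Two of the sample events land in the 12:00 bucket and one in the 12:15 bucket, which is the shape of data a "tags over time" chart would plot.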

Brooks Cunningham: (07:42)
One of the more interesting things that I find is some of our machine learning models, we're constantly doing development, especially on our product development and engineering side, to develop additional models and fine tune some of the existing models. There are models that are targeted at specific use cases, and we're continuing to develop more models for specific use cases. We wanted to abstract out just a blank, or sorry, a larger bucket of machine learning and make it very, very narrow focused on specific use cases and behaviors. So that way you can use those specific models and gain insights based on what information is being tagged over time, and then take an action based on those patterns that you observe on your website. So I'd like to get into more of the regional specific information where we'll be able to dive in a little bit more into this information.

Chris Detzel: (08:35)
Hey, Brooks, you're doing so well that people have tons of questions already. So hopefully I can... A couple things, how do I buy a good bot so I can buy the sneakers I want? So it appears that bots are used on popular product releases, are there good bots for buying?

Brooks Cunningham: (08:55)
Well, if you are the website owner of the sneaker provider, I would recommend to go into the internal sales. But if you are wanting to buy sneakers at a website that you don't have an affiliation with, I don't know if I would consider that a good bot because you're using automation to potentially get ahead of other people that are trying to access inventory. Sneaker bots are actually a really interesting use case and we do, there's a couple of customers that I can think of off the top of my head, I won't mention on this call, but that do have an issue with sneaker bots because they're a very novel type of product and there's a lot of demand for it. And bots will target those types of inventories. But I guess to that specific question, it depends on how you're interacting with the website, if you're an official... In an official capacity, as far as an employee of that company.

Chris Detzel: (10:02)
Thank you. There's another question, but I think you're going to answer this question, but if not, let me ask it anyway. So what is the difference between good and bad bots? You'll be answering that in a bit or no?

Brooks Cunningham: (10:14)
I will. So, but I think it's probably, that's a good concept to make sure that we address. So a good bot, it really is something that's beneficial to your business and the website.

Brooks Cunningham: (10:27)
Just full stop. We interpreted good bots in a blanket way, as far as Googlebot, Yahoo! Bot, and some of those SEO bots, as well as some of the social media bots. But at the end of the day, some of these bots are really up to what you determine is useful for your business. And so we can certainly say in a more blanket sense, this is what good bots are, but at the end of the day, it's up to you as an individual business to decide that this is good behavior or not good behavior.

Brooks Cunningham: (10:58)
And this tooling allows you to make some of those decisions and get data on that.

Chris Detzel: (11:06)
Great, yep. And then Taj says, "Good is likely desirable. So good crawlers like SEO tooling." Exactly, we'll talk a little bit about that in a bit. And then bad equals sneaker buying bots.

Brooks Cunningham: (11:22)
Absolutely correct. Yeah, those get in the news quite a bit. There's also, I think it was in the news, the Grinch bots where people would buy up all the popular toys before anybody else could buy them and then sell them on a secondary market. Where there is a delta between what customers are willing to pay and what a reseller market can pay for it, that's where we see a lot of bot activity. And so the ROI and the economics of bot operators is a fascinating subject in and of itself, but really where there is that delta, that's where you're going to see more bot activity.

Chris Detzel: (12:02)
Okay. All right. I do have another question, but let's continue on and then I'm going to ask that question later, Michael. Hopefully we can answer some of these. Love it.

Brooks Cunningham: (12:12)
Sure, yeah. Thank you all for the questions. So like I said, my tooling in particular, I'm not trying to necessarily hide the fact that I am using automation. I'm using a popular Python library, Python Requests, to distribute requests across various Tor proxies. And so I'm targeting a few different URLs. I can see that my requests are being flagged with the known violator data center flag. And so what that means is that when my requests are coming out of data centers, such as the popular ones, AWS, Azure, and we'll see some of the other, the usual suspects, so to speak, within here in just a moment. But when my requests come out of those data centers, that's going to be highly suspicious.
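
Tooling that is "not trying to hide" is easy to spot because Python HTTP clients announce themselves by default: Requests sends a user agent of the form python-requests/<version>, and the standard library sends Python-urllib/<version>. The sketch below uses only the standard library and sends no traffic; the URL and the replacement user agent are placeholders, not values from the demo.

```python
import urllib.request

# build_opener() returns an OpenerDirector whose default headers include a
# User-agent of the form "Python-urllib/<version>". Nothing is sent here.
opener = urllib.request.build_opener()
default_user_agent = dict(opener.addheaders)["User-agent"]

# A script can override the header, which is why the user agent is only one
# signal among several (the talk also mentions data center origin).
# The URL and the user agent string below are placeholders.
request = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": "Mozilla/5.0 (placeholder)"},
)
overridden_user_agent = request.get_header("User-agent")
```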

Brooks Cunningham: (12:58)
It doesn't necessarily mean that we should take an action outright on that, it's certainly going to be dependent on your specific policies, but it is highly suspicious. There are cases where customers will leverage VDI, virtual desktop infrastructure, where those types of request will originate from data center providers. However, for the majority of our customers, that's not necessarily a primary criteria, but we do give you some information on that, as far as the data center traffic that is coming at the website, and then if you would like to take an action on that, you may. We'll also provide, at least within this specific report, the URIs that are being targeted.

Brooks Cunningham: (13:42)
So one of the other reports that I want to talk to is this bad user agent. And there's a couple of reasons that this particular report is useful. One of them is fairly intuitive in the sense that if it actually is a legitimate user agent, or sorry, a bad user agent such as cURL or Python Requests, typically those things are going to be bad. But what can be particularly interesting is the neutral types of bots, just because something's not executing JavaScript does not mean that it's not a useful tool for some of your internal monitoring systems. And so if you saw a path such as /healthcheck.html, well, that's probably not going to have a tremendous amount of value for a bot operator, but it may have a tremendous amount of value for your operations team internally. And so you can white list those types of paths so that the operations team can continue to monitor whatever it is they need to monitor at specific paths, or if you need to white list based on some other criteria, you could certainly do that as well. But here we can see my tooling, Python Requests, it's absolutely bad, and there's a variety of paths that it's targeting.
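
The interplay between a bad user agent and a white listed path can be sketched as follows. This is an illustration of the reasoning only, not the product's rule language: the marker list, the function name, and the three outcome labels are assumptions, and only the /healthcheck.html path comes from the talk.

```python
# Substrings treated as scripted clients; this list is an assumption.
BAD_AGENT_MARKERS = ("curl", "python-requests", "python-urllib")

# Paths the operations team needs to keep monitoring (example path from the talk).
ALLOWED_PATHS = {"/healthcheck.html"}


def classify(user_agent: str, path: str) -> str:
    """Return 'allow', 'block', or 'inspect' for a single request."""
    if path in ALLOWED_PATHS:
        # White listed paths are checked first so monitoring tools keep working.
        return "allow"
    lowered = user_agent.lower()
    if any(marker in lowered for marker in BAD_AGENT_MARKERS):
        return "block"
    # Anything else is left for the other signals (flags, tags, models).
    return "inspect"
```

For example, `classify("python-requests/2.24.0", "/healthcheck.html")` returns `"allow"`, while the same user agent on any other path returns `"block"`.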

Brooks Cunningham: (14:55)
There's a number of other reports, there won't be time in this specific session to go over every single one of them, but I would encourage you to explore them. The last one that I'll talk to is the behavioral machine learning models. And the intention of all of this reporting is to give you data, historical data, so that you can take an action with confidence based on this data and making true data driven decisions. So we give you this information, we try to be transparent with what we're tagging, what we're flagging for various violations. And then you can look to make an assumption about the future, about taking an action on specific types of behaviors, flags, or tags. And so we do provide some descriptions as far as what the model's trying to achieve. It's not in the source code, but it is a description as far as what the overall intention is. You may find some models are more applicable to your use case and targeting the automation that you want to target and other models may be less applicable.

Chris Detzel: (16:03)
Brooks?

Brooks Cunningham: (16:05)
Yep, go ahead.

Chris Detzel: (16:05)
Quickly, so can Imperva stop sneaker bots and make sure people can buy fairly?

Brooks Cunningham: (16:15)
We stop all types of automation. There's tons of frameworks out there, Selenium, PhantomJS, and we can stop that type of web scraping. There are corner cases and it's very much a cat and mouse game where we try to build the defenses stronger and stronger, but we've certainly been extraordinarily successful with Cloud WAF Advanced Bot Protection with stopping some of the most advanced threats that we've come across. When I'm talking about threats, it's not, there can be the assumption that it's one or two individuals that are building some of this tooling, but that's not necessarily the case. There's entire companies that their entire business [inaudible 00:16:58] built [inaudible 00:16:59] scraping, and collecting information or doing nefarious activity. With what we're doing and what we're achieving with this product, it isn't stopping the individual types of traffic, we're really stopping huge swaths of automation. And we do have customers that do battle sneaker bots, and we have a really good relationship with them, and I suspect that they wouldn't be a customer of ours if we weren't meeting their use case.

Chris Detzel: (17:31)
Thanks, Brooks.

Brooks Cunningham: (17:32)
Thank you. Are there any other questions?

Chris Detzel: (17:35)
No other questions at the moment.

Brooks Cunningham: (17:37)
All right. So if I go to settings here, we'll go into how we can use some of this information to take an action. So in some of the previous dashboards, we saw that it was known violator data centers, as well as bad user agents. And so before I get into the specific settings, I'll click into my website groups. I'll talk briefly about what this is. So a website group is really a collection of similar applications. So in this case, I have cloudwaf.badbotjail.com.

Brooks Cunningham: (18:07)
But if I had a lot of similar applications such as shop.food.bar, shop.food.com, shop.food.co.uk, that used a similar URI structure or path structure, and had similar backing applications, I would want to consolidate those under a single website grouping, so that way I could manage all the settings for that application in one place. If you wanted to break it out into individual applications or individual website groups, you could, however that's going to make the management of the overall solution a bit more tedious. So we would generally recommend to consolidate that to make management a bit easier, and the way that you would apply policies or take an action on automation, it's based on some of these policies, there's a default policy, and then there are per path policies. I just want to give an overview of the per path policy to give some context to the default policy first.

Brooks Cunningham: (19:06)
So if I click into the per path policy assignment, I can see that there's a number of them that I've created as well as the few that are default. The static content here is by default, as well as the default path. The order of operations is top down, so whichever path matches first is where it'll match and then apply the rule sets within the policy. The reason that there's a static content path policy here, in that there's a path here and policy bound to it, is because it's typically not a use case for automation to target static content. And static content is things like typically [inaudible 00:19:46], .js, .jpeg, .png, static file extensions. So we wouldn't take an action on that type of traffic by default. If it is a use case where you want to protect those static type of assets, we certainly can, but just by default for the majority of our customers, it's more of the dynamic content that we protect.
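
The top down, first match behavior of per path policies can be sketched like this. It is an illustration under stated assumptions, not the product's configuration format: the regular expressions, the /always-block and /always-captcha paths, and the `policy_for` helper are hypothetical, and the extension list contains only the extensions that are audible in the recording.

```python
import re

# (pattern, policy name) pairs, evaluated top down; the first match wins.
# Patterns and the /always-* paths are hypothetical.
PATH_POLICIES = [
    (re.compile(r"\.(js|jpeg|png)$"), "static content"),
    (re.compile(r"^/always-block"), "always block"),
    (re.compile(r"^/always-captcha"), "always CAPTCHA"),
    (re.compile(r""), "default"),  # the empty pattern matches every path
]


def policy_for(path: str) -> str:
    """Return the name of the first per path policy whose pattern matches."""
    for pattern, policy in PATH_POLICIES:
        if pattern.search(path):
            return policy
    return "default"
```

Because evaluation stops at the first match, `policy_for("/always-block/logo.png")` returns `"static content"` rather than `"always block"`, which is why the ordering of the list matters.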

Brooks Cunningham: (20:12)
I've created a couple of other policies here, like always block and always CAPTCHA so that it makes it easier for me to see the block page and the CAPTCHA page. I'll go into what the policies look like in a bit more detail in just a moment. And so if I go back to my Brooks cloud WAF, go to my default policy. I can see here that here is the policy. And then within here, there's individual directives such as allow, block, CAPTCHA, CAPTCHA cleared, identify, a number of others, monitor, and these function in a similar way as the per path, from the top down. So the topmost rules with the highest priority will be the social media and search engine by default. So what these are is a white list, effectively, of things like Googlebot, Yahoo! Slurp, and Bingbot. Certainly if you wanted to block this type of traffic, you could take an action on that by moving the search engine to block, you could also do things such as updating your robots.txt so that you tell these SEO tools to honor your robots.txt file.
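
For the robots.txt route, Python's standard `urllib.robotparser` module shows how a well behaved crawler would read such a file. The rules below are a made-up example, not a recommendation, and robots.txt is advisory: it only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /checkout/,
# and every other crawler is asked to stay away entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /checkout/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory, no network access

googlebot_inventory = parser.can_fetch("Googlebot", "/inventory")
googlebot_checkout = parser.can_fetch("Googlebot", "/checkout/cart")
other_bot_inventory = parser.can_fetch("SomeOtherBot", "/inventory")
```

Under these rules Googlebot is allowed on /inventory but not on /checkout/cart, and the unnamed crawler is disallowed everywhere.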

Brooks Cunningham: (21:37)
But the overall order procedure goes from policy, then you have your directives, and then there's the conditions within these directives. And so to focus on the automation part, I know that my particular tooling, it's coming from known violator data centers, and it's not sending a very good user agent. So within this specific item, and I can see within here that there's a handful of requests per second, this is the rule that's being triggered because I'm using Python Requests as a user agent. I'm not trying to hide that I'm a bad guy with my specific tooling. And these are some of the flags that could potentially be triggered. So suspicious user agent would definitely be one of them. I can also gain some insights as far as how often this specific condition is being triggered based on this, if you condition [inaudible 00:22:28].

Brooks Cunningham: (22:32)
And again, these are in about 15 minute increments as well, and gives you some useful information.

Chris Detzel: (22:38)
Hey, Brooks?

Chris Detzel: (22:41)
Is it mandatory for website groups that all sites have the same URL structure? And can I have different sites with the same bot protection policies?

Brooks Cunningham: (22:55)
The answer to the first question I would generally recommend yes. It's not a requirement, it's going to be based on how you've designed your application. If you have, for example, www.food.bar/shop/inventory, and then you had shop.food.bar and then it was just /search, you could potentially just incorporate that search part as part of the overall application and have a protection there. You have to be careful with the design of your application and the design of how you're configuring some of the policies. It's fairly flexible as far as how you configure the per path, it doesn't necessarily just have to be starts with path or ends with path, there's a bit of flexibility there. But just to have some awareness, as far as how the two systems work with respect to here's how your application works and here's how the content protection is acting, but you can do that. And then can you repeat the second part of the question?

Chris Detzel: (23:58)
Yep. Can I have different sites with the same bot protection policies?

Brooks Cunningham: (24:03)
Absolutely. And so this particular policy, my... Actually, let me go back here, [inaudible 00:24:10] go back to settings.

Brooks Cunningham: (24:15)
So I could configure many different websites here by just adding additional websites and it would grab some of that information within here, and it would have some other useful information. But you would add additional websites within here, apply that to the website grouping, and then that way the policies would apply to all the websites within that group.
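
The relationship between websites, website groups, and policies described above can be pictured as a simple lookup. This is a hypothetical data model for illustration, not the console's storage format: the group names, policy names, and helper functions are assumptions, while the host names are the examples used in the talk.

```python
# Each website group bundles similar sites under one set of policies.
# Group names and policy names are hypothetical.
WEBSITE_GROUPS = {
    "shop": {
        "default_policy": "shop default policy",
        "sites": {"shop.food.bar", "shop.food.com", "shop.food.co.uk"},
    },
    "demo": {
        "default_policy": "demo default policy",
        "sites": {"cloudwaf.badbotjail.com"},
    },
}


def group_for(host: str):
    """Return the name of the website group containing host, or None."""
    for name, group in WEBSITE_GROUPS.items():
        if host.lower() in group["sites"]:
            return name
    return None


def default_policy_for(host: str):
    """Return the default policy that applies to host through its group."""
    name = group_for(host)
    return WEBSITE_GROUPS[name]["default_policy"] if name else None
```

Adding a site to a group's `sites` set is all it takes for that group's policies to cover it, which mirrors the "add the website, apply it to the grouping" flow.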

Chris Detzel: (24:42)
Great, thank you.

Brooks Cunningham: (24:44)
Yep, thank you. And so if I go back here to edit default policy, I have this specific action here, bad user agent, and there's a couple of different options here for enabling settings, or enabling conditions, there's disable, passive, and active. Disable means that essentially the condition's not doing anything it's like it wasn't even there. Passive is useful for gathering intelligence, especially on new rules prior to implementing them. So for example, if I was creating my own conditions, which by the way, this is really freeform you can write whatever rules you want, and I'll go into that in a moment, but you can do whatever you would like within here. And so that's particularly useful in a passive mode to understand the impact of specific rule sets before taking an action on specific traffic. And then in active mode, since it's in the block directive, if I put this to active and then I click the publish button up here, that's going to start blocking traffic.
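
The three condition settings can be sketched as a small evaluator. This is a model of the behavior as described in the talk, not the product's rule engine: the tuple layout, the `evaluate` function, the predicate, and the fallback of "allow" when nothing matches are all assumptions.

```python
from enum import Enum


class Mode(Enum):
    DISABLED = "disabled"  # the condition is skipped, as if it were not there
    PASSIVE = "passive"    # matches are recorded but no action is taken
    ACTIVE = "active"      # a match applies the directive, for example block


def evaluate(conditions, request):
    """Evaluate conditions top down.

    conditions is a list of (name, directive, mode, predicate) tuples.
    Returns (action, passive_hits); "allow" is an assumed fallback.
    """
    passive_hits = []
    for name, directive, mode, predicate in conditions:
        if mode is Mode.DISABLED or not predicate(request):
            continue
        if mode is Mode.PASSIVE:
            passive_hits.append(name)  # gather intelligence only
            continue
        return directive, passive_hits
    return "allow", passive_hits


def is_bad_user_agent(request):
    # Hypothetical predicate standing in for the bad user agent condition.
    return "python-requests" in request["user_agent"].lower()


def conditions_with(mode):
    return [("bad user agent", "block", mode, is_bad_user_agent)]


request = {"user_agent": "python-requests/2.24.0", "path": "/example"}
```

With the condition in passive mode, `evaluate(conditions_with(Mode.PASSIVE), request)` returns `("allow", ["bad user agent"])`, so the match is visible without affecting traffic; switching to active returns `("block", [])`.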

Brooks Cunningham: (25:46)
So I'll give you a demo of how fast this actually is, because it's pretty amazing. There's been a tremendous amount of work within product and engineering to make sure that this is fairly efficient. So here I have my terminal, so a lot of my automation's just running in a Raspberry Pi in my living room. And so I'm targeting a specific path, it's just Ajax info going through a Tor proxy. Ajax is great, this is what content is returned by my origin server. Now, if I go back into the settings, click this to active, publish configuration, then go back to my terminal, send that same request, now I'm being blocked.

Brooks Cunningham: (26:36)
And this is how a lot of bot operators are going to be interacting. I'm intentionally not showing this demo in the browser because a lot of the tooling that bot operators are going to work with is via terminal. I think that it's important to demonstrate how a bot operator would see that type of behavior. I'll certainly show how a CAPTCHA page or a block page will be represented, but this is, I think, an important demonstration for how fast the propagation time is, but then also what that behavior is. And if I need to revert back this behavior, because let's say it was a false positive, or some type of undesirable behavior, I can go back into the portal here, select passive, it's very important, make sure to click that publish configuration button, I'll go back to my terminal, and if I send the same request, I would expect that I have that same result as previous, and I do. And so the propagation time is very quick, and so you're able to get some of that feedback quickly or revert back settings quickly. It's extremely powerful within that respect.

Brooks Cunningham: (27:52)
Are there any questions? I know that terminals sometimes can be a bit overwhelming for folks, so if there are questions about what this test in particular is doing or demonstrating, please do ask.

Chris Detzel: (28:06)
Nothing in the chat as of yet.

Brooks Cunningham: (28:07)
Okay.

Brooks Cunningham: (28:13)
And so if you wanted to build your own custom conditions, you certainly have the ability to do that. So for an example, I do have a couple of custom conditions. So for example, if I wanted to always show the CAPTCHA page or always show the block page based on some criteria, I can definitely do that. So I've created per path policies, per path settings, along with the always block or always CAPTCHA policies tied to them. And so if I go to this URI, I can see here's the CAPTCHA page, and these are customizable as well based on the cloud WAF settings. So within the cloud WAF documentation on customizing the error page, you can customize these pages as well per that documentation.

Brooks Cunningham: (29:02)
And then here's the block page. And so there is some information along with the IP address, so if someone does say that they were blocked or they received a CAPTCHA, especially the block, they can provide the information to you so that you could investigate and I'll show you what that would look like shortly. So if I go back to the management console, what does this look like? This always block. If I click into... Actually I'll do always CAPTCHA, if I click into here, I can see the only policy. Within this policy, I just have the one condition within the CAPTCHA directive. And so if I click in here and I click edit, the only thing that's here, and this is customer facing, you can write the code that you want per the documentation that's available, but it's just true. So always return that CAPTCHA, there's only a handful of criteria when you would actually want this to happen, but for displaying the CAPTCHA page or the block page, I think that this one meets part of that criteria. But the language reference is within here, along with some examples, there's been a tremendous amount of effort within our product engineering team to make sure that there's the necessary knowledge available for you to be successful with writing your own custom rules.

Brooks Cunningham: (30:22)
It used to be with the legacy Distil system that this was under the hood. We definitely received a lot of the feedback that we want to expose this information to customers. So they have the ability to have visibility into the rule sets that are available on the platform, but also the ability to write their own custom logic if they would like.

Brooks Cunningham: (30:47)
So if I go back here to settings, one of the last items that I'd like to discuss is how to investigate... Sorry, was there a question?

Chris Detzel: (30:59)
No question.

Brooks Cunningham: (31:01)
All right. So now what do you do if you need to investigate some type of behavior? We do have an investigation dashboard that is available as well, so just make sure to select the investigation dashboard. And so if I want to look up specific behavior, I can certainly do that by filtering, the filter option is available within all of the reports, so you can filter on specific site or IP address. And so if I just filter on just in the past day, my cloud WAF site, any IP address, and you could also put in the request ID here as well, and click to run this report and that can pull quite a bit of data. So it may take a moment.

Brooks Cunningham: (31:48)
It will provide some of the actions over time, but then there's also some information as far as the specific policy that was invoked and then the deciding condition tag. So for example, if there's an always true condition, it'll tell me that, it'll tell me the specific URI that was sent. There's quite a bit of information within, and detail, within the logs here. And we're always making feedback, sorry, making adjustments based on the feedback that you provide. But this really concludes the demo and the information that I wanted to demonstrate for today, but certainly if there's additional questions, I'm happy to show [inaudible 00:32:29] from other information and go from there. I would encourage people to read some of the documentation afterwards as well on docs.imperva.com.
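
The filters on the investigation dashboard (site, IP address, request ID, time range) amount to a query over log records. The sketch below models that idea only: the `LogRecord` fields are named after the details mentioned above, but the record layout, the `investigate` helper, and the sample data are hypothetical and do not reflect the actual log schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class LogRecord:
    timestamp: datetime
    site: str
    client_ip: str
    request_id: str
    policy: str
    deciding_condition: str
    uri: str


def investigate(records, site, now, window=timedelta(days=1),
                client_ip=None, request_id=None):
    """Return records for site within the window, optionally narrowed
    by client IP address and/or request ID."""
    earliest = now - window
    return [
        r for r in records
        if r.site == site
        and earliest <= r.timestamp <= now
        and (client_ip is None or r.client_ip == client_ip)
        and (request_id is None or r.request_id == request_id)
    ]


# Hypothetical sample data; IP addresses come from a documentation range.
SITE = "cloudwaf.badbotjail.com"
now = datetime(2020, 6, 2, 12, 0, tzinfo=timezone.utc)
records = [
    LogRecord(now - timedelta(hours=3), SITE, "203.0.113.7", "req-1",
              "always CAPTCHA", "always true", "/always-captcha"),
    LogRecord(now - timedelta(days=3), SITE, "203.0.113.7", "req-2",
              "default", "bad user agent", "/example"),
    LogRecord(now - timedelta(minutes=5), SITE, "203.0.113.9", "req-3",
              "default", "bad user agent", "/example"),
]
past_day = investigate(records, SITE, now)
```

Here `past_day` contains the two records from the last 24 hours; passing `client_ip` or `request_id` narrows the result further, much like the filter fields on the dashboard.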

Chris Detzel: (32:41)
Okay, great. And what I'll do is when I do the followup, I'll send some of this documentation as well. Some people have asked specifically around, will there be other webinars like this more specifically around like DAM, DBF, DRA, etc. We are looking at doing more webinars around the on-prem products. So I've been talking to some of the PMs there and they're very interested.

Question: I do have a question for you, Brooks, is there any dependency on SIEM packages that you must use?

Brooks Cunningham: (33:18)
The SIEM packages haven't changed, so far as I'm aware, I actually asked this question pretty recently. Some of the information, as far as if there was a specific action taken, would be available within the SIEM, but there is a difference between the logging information that you'll see within the dynamic reporting engine and the information that's available within the normal cloud WAF logs. It's certainly some feedback that we can take back internally to see based on what type of information is useful based on the ABP solution. And we're looking forward to the feedback for internal engineering prioritization.

Chris Detzel: (33:55)
Great. A couple of people said, "Great job, really enjoyed the demo." The other thing I would ask, I put this in the notes before, if you haven't had a chance to go to https://community.imperva.com/home, there is a survey, it's just one answer, you click it and then submit, and it's done, asking about what types of webinars you want to listen to specifically, not the topics, we can get down and dirty on the topics later, but I just want to make sure that this is valuable to you. That is the goal at the end of the day. Hopefully these are helpful.

Chris Detzel: (34:48)
So more to come on that, you will be invited, I'm trying to nail down that exact topic. So somebody did say they liked the trivia at the end. So we did a trivia thing one day with some of our customers, and so that was a lot of fun. So maybe I'll implement something like that, Michael, because this was only 30 minutes. So that could be a fun thing. So maybe our next webinar for sure. All right, thank you everyone. Really appreciate it. I will have this information posted on the community very soon. So look out for that. I'll email every one of you as well. So thank you so much and look forward to seeing you in a few weeks. Thanks, Brooks.

Brooks Cunningham: (35:33)
Thank you.