Automated web traffic is a fundamental part of the Internet. The bots that generate this traffic come from a wide variety of sources, from Google’s harmless web crawling to malicious hackers targeting government voter registration pages.
In fact, bots drove almost 40% of all measured Internet traffic in 2018. That means that of every ten visits a website receives, only about six come from actual human beings sitting behind a computer or peering into a smartphone.
The vast proliferation of bots is a concerning development for business leaders in almost every industry. From airlines to e-commerce, there is an ecosystem of bots carrying out a broad range of activities.
Not all of these activities are harmful. Many of them simply occupy network resources. But some of them aggressively scrape data for fraudulent purposes, and others are part of sophisticated criminal networks.
The problem with bad bots is that they cleverly mask their behavior to seem like human users. Sophisticated programming helps them act in ways that advanced firewall technology cannot directly address. A hypervigilant firewall could easily begin restricting legitimate users – bad news for any business.
The existence of good bots also complicates matters. Any marketer with experience in search engine optimization will tell you that blocking Google’s Googlebot will hurt search rankings.
This means that any advanced bot mitigation solution will need to be able to carefully distinguish between harmless bot traffic and malicious bot traffic. This is where Imperva, in partnership with Distil Networks, comes into the picture.
Bad bots are responsible for a wide variety of malicious business practices. Some of the most well-known include content and price scraping, credential stuffing, account takeover, and application-layer denial of service attacks.
Unfortunately, there is no one-size-fits-all solution that can guarantee a total elimination of bad bot behavior. Since the most sophisticated bots act like human users, optimal mitigation demands a multi-faceted approach that goes beyond traditional perimeter defenses.
One of the primary ways that websites, applications, and APIs defend against automated users with malicious intent is through reverse proxies. Where a web proxy accesses web content on a user’s behalf, a reverse proxy accesses server resources after receiving a client’s request.
In a typical denial of service attack, thousands of bots overwhelm the victim’s servers with requests. The reverse proxy is an effective solution for reducing the attack surface, offering a layer of protection from unexpected traffic spikes.
This technology goes hand-in-hand with caching. If a website stores copies of its pages on the reverse proxy server, the proxy can be configured to collapse many identical incoming requests into a single request to the origin server. This way, the origin server does not suffer the impact of the attack.
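The effect of proxy-side caching described above can be illustrated with a minimal, single-threaded sketch. The class name, the `fetch` callback, and the TTL value are all hypothetical; a production proxy would additionally collapse *concurrent* cache misses into one in-flight origin fetch, which this sketch omits for brevity.

```python
import time

class ProxyCache:
    """Sketch of page caching at a reverse proxy: repeated client
    requests for the same path are answered from the proxy's copy,
    so only the first request actually reaches the origin server."""

    def __init__(self, fetch, ttl=60.0):
        self.fetch = fetch        # callback that contacts the origin server
        self.ttl = ttl            # seconds a cached copy stays fresh
        self.store = {}           # path -> (body, expires_at)
        self.origin_hits = 0      # how often the origin was contacted

    def get(self, path):
        entry = self.store.get(path)
        now = time.time()
        if entry and entry[1] > now:
            return entry[0]       # served from cache; origin untouched
        # Cache miss (or expired copy): fetch once and store a fresh copy.
        self.origin_hits += 1
        body = self.fetch(path)
        self.store[path] = (body, now + self.ttl)
        return body
```

Even in this simplified form, a flood of identical requests produces a single origin hit, which is exactly why the origin server is shielded from the brunt of a traffic spike.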
The combination of reverse proxies and content caching allows content delivery networks to mitigate many of the most dangerous bot attacks reliably. In Imperva’s case, the fact that mirror versions of website pages are stored in disparate geographical locations also helps improve content delivery speeds for legitimate users.
Since reverse proxies act as intermediaries between backend servers and anonymous traffic coming in from every corner of the Internet, they offer an ideal position for traffic scrubbing. This is the process of validating incoming requests before sending them onwards to the origin server.
Deploying a perimeter mesh of reverse proxy servers at this stage provides resilient defense against bot traffic. Combining this defense with cloud-based web application firewall (WAF) technology allows the system to weed out malicious bots and cybercriminal requests that make their way in.
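Traffic scrubbing at the proxy edge can be pictured as a gatekeeper function that every request must pass before it is forwarded to the origin. The sketch below is illustrative only – the class name, the specific header checks, and the rate-limit thresholds are assumptions, not any particular vendor's rules – but it shows the two common building blocks: rejecting obviously malformed clients and applying a sliding-window rate limit per source IP.

```python
import time
from collections import defaultdict, deque

class TrafficScrubber:
    """Illustrative edge filter: validate each incoming request
    before it is forwarded to the origin server."""

    def __init__(self, max_per_window=100, window=60.0, now=time.time):
        self.max_per_window = max_per_window  # requests allowed per window
        self.window = window                  # window length in seconds
        self.now = now                        # clock, injectable for testing
        self.history = defaultdict(deque)     # ip -> recent request times

    def allow(self, ip, headers):
        # Reject clearly malformed clients: real browsers send these headers.
        if not headers.get("User-Agent") or not headers.get("Host"):
            return False
        # Sliding-window rate limit per source IP.
        t = self.now()
        q = self.history[ip]
        while q and t - q[0] > self.window:
            q.popleft()                       # drop timestamps outside window
        if len(q) >= self.max_per_window:
            return False                      # this IP is requesting too fast
        q.append(t)
        return True
```

A real scrubbing layer adds many more signals on top of this (TLS fingerprints, behavioral scoring, challenge pages), but the pattern of validate-then-forward is the same.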
Even the most sophisticated bot mitigation solution must let some bots in. A significant percentage of bots perform valuable, desirable functions. Hackers know this, of course, and will not hesitate to impersonate a good bot if it improves their chances.
Bad bots posing as good ones almost always give themselves away somehow. There is only one legitimate Googlebot out there, and its behavior is hard to mimic convincingly.
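One concrete giveaway is DNS. Google publicly documents how to verify Googlebot: the crawler's IP must reverse-resolve to a hostname under googlebot.com or google.com, and that hostname must forward-resolve back to the same IP. The sketch below implements that documented check; the DNS lookups are injected as parameters so it can be exercised without network access (in production they would typically be `socket.gethostbyaddr` and `socket.gethostbyname`), and the IP addresses in the usage example are illustrative.

```python
def is_real_googlebot(ip, reverse_dns, forward_dns):
    """Verify a claimed Googlebot per Google's documented method:
    reverse DNS must yield a *.googlebot.com or *.google.com host,
    and forward DNS on that host must return the original IP."""
    try:
        host = reverse_dns(ip)
    except OSError:
        return False                 # no PTR record: cannot be Googlebot
    if not host.endswith((".googlebot.com", ".google.com")):
        return False                 # hostname outside Google's domains
    try:
        return forward_dns(host) == ip
    except OSError:
        return False                 # forward lookup failed
```

A bot that merely sets its User-Agent string to "Googlebot" fails this round trip, because the attacker does not control Google's DNS zones.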
It takes years of collecting and qualifying data on good bot behavior to accurately identify which bots to whitelist. Imperva and Distil Networks constantly refine this process, leading to better identification and fewer false positives every day.