Threat modeling is vital to any security practice. We may understand the OWASP Top 10 vulnerabilities, for example, yet still misspend effort and dollars defending against exploits of vulnerabilities our web applications don't actually have. Managing the epidemic of bot traffic can be just as daunting once we consider how such traffic affects our threat models.
The terms “bot” and “botnet” are thrown around a lot. Why is bot detection so valuable to your security posture if the typical botnet isn't a significant part of your threat model? Because bots cover large classes of fully automated scans and attacks, and are not limited to the classic notion of a botnet. In fact, many automated scans and attacks don't need the scale of a botnet, because applications are vulnerable to much smaller, more surgical attacks.
First, let's be clear about what we mean by bot traffic. The term “bot” immediately conjures images of botnet hordes: enslaved, malware-infected desktops and servers used to execute massively distributed denial-of-service (DDoS) attacks. But bots come in many shapes and sizes, and share in common only that their actions are automated. There are friendly bots, such as those deployed for search indexing. There are also command-line tools employed for purposes ranging from harvesting data to scanning for possible vulnerabilities. For practitioners supporting retail websites, the data harvesters, also known as aggregators or web scrapers, are a particularly insidious threat, mining sites for publicly available data such as fares or product prices for competitive purposes. The vulnerability scanners, meanwhile, might belong to your penetration tester, or they could be a precursor to more directed attacks.
The command-line tools popular among pen testers and attackers alike are relatively easy to detect, as they share none of the characteristics of an actual browser. There are many tried-and-true methods for detecting and filtering out such illegitimate requests, whether through smarter application code or in-line security technologies such as WAF or IPS solutions. As one might imagine, our adversaries have other tricks in their bags, but they will still use the easiest, least resource-intensive tools available on the chance those tools will work, or will reveal a vulnerability worth probing manually.
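To make this concrete, here is a minimal sketch of the kind of server-side heuristic such filtering relies on: flagging requests whose headers lack the characteristics of a real browser. The tool signatures, header names, and threshold below are illustrative assumptions, not a production rule set.

```python
# Hypothetical heuristic: flag requests that resemble command-line tools.
# Signatures and thresholds are illustrative, not a vetted rule set.

# Default User-Agent prefixes of common command-line tools.
TOOL_SIGNATURES = ("curl/", "wget/", "python-requests/", "sqlmap/", "nikto")

# Headers that virtually every mainstream browser sends alongside a request.
EXPECTED_BROWSER_HEADERS = {"accept", "accept-language", "accept-encoding"}

def looks_like_cli_tool(headers: dict) -> bool:
    """Return True if the request looks like a CLI tool rather than a browser."""
    normalized = {k.lower(): v for k, v in headers.items()}
    ua = normalized.get("user-agent", "").lower()
    # A missing User-Agent, or one matching a known tool, is an easy tell.
    if not ua or any(ua.startswith(sig) for sig in TOOL_SIGNATURES):
        return True
    # Browsers send a cluster of Accept-* headers; most CLI tools do not.
    missing = EXPECTED_BROWSER_HEADERS - normalized.keys()
    return len(missing) >= 2
```

Real browsers can of course be impersonated header-for-header, which is exactly why such checks only catch the lazy end of the spectrum.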
Web scrapers and attackers alike leverage browser-based tools to circumvent easy detection, contributing noise to our logs and sensors that isn't easily dismissed as non-browser traffic. There are browser extensions that enable automated data mining and scraping, as well as debugging tools like TamperData that allow modified requests to be crafted from a seemingly legitimate browser. PhantomJS is an example of a so-called headless browser: it has no GUI, but is fully equipped with a browser's capabilities for interacting with a website. With its easy-to-use API, the behavior of a headless browser can be scripted to crawl websites quickly for whatever purpose you intend, without the performance overhead of a GUI.
Detecting these browser-based or browser-like bots is much more difficult, and requires more aggressive interaction with the browser to detect signals such as keyboard and mouse movement, as well as behavioral tracking of traits like rapid surfing and page loading. Additionally, we can leverage client fingerprinting not only to track these automated browsers, but potentially to differentiate them from legitimate browsers.
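The behavioral-tracking idea can be sketched simply: count page loads per client over a sliding window and flag rates no human plausibly sustains. The window size and threshold below are illustrative assumptions, and a real deployment would combine this with other signals rather than act on rate alone.

```python
import time
from collections import defaultdict, deque
from typing import Optional

# Illustrative values only; tune against real traffic before trusting them.
WINDOW_SECONDS = 10
MAX_REQUESTS_PER_WINDOW = 20

class RateTracker:
    """Sliding-window page-load counter, keyed by some client identifier."""

    def __init__(self):
        self._hits = defaultdict(deque)  # client_id -> recent timestamps

    def record(self, client_id: str, now: Optional[float] = None) -> bool:
        """Record a page load; return True if the client looks automated."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        hits.append(now)
        # Evict timestamps that have aged out of the window.
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        return len(hits) > MAX_REQUESTS_PER_WINDOW
```

A headless browser ripping through pages several times per second trips this immediately, while an ordinary reader never approaches the threshold.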
The security value of detecting these bots takes various forms, depending upon your threat model:
- Reducing traffic load associated with bots, which typically comprises 40-70% of site traffic.
- Eliminating noise in logs, enabling focus on directed, motivated attacks that are manual.
- Concealing vulnerabilities from automated reconnaissance, thereby reducing the risk of subsequent attacks.
- Increasing the difficulty of mapping and cataloging the web application infrastructure.
- Preventing application-layer DoS attacks, aimed at attacking CPU-intensive application functions.
Aiming detection and mitigation resources at the client enables us to better understand our adversaries, and leverage methods and technologies to reduce the threat surface. While the most dangerous attacks are often not automated, attackers frequently leverage the noise of a DoS attack or other automated scan to make detection more difficult.
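One client-focused technique mentioned earlier, passive fingerprinting, can be sketched as hashing a few stable request attributes so the same automated client can be tracked across sessions or rotating IP addresses. The attributes chosen below are an illustrative assumption; real fingerprinting schemes draw on far richer browser and TLS characteristics.

```python
import hashlib

# Illustrative attribute set; production fingerprints use many more signals.
FINGERPRINT_HEADERS = ("User-Agent", "Accept", "Accept-Language", "Accept-Encoding")

def fingerprint(headers: dict) -> str:
    """Derive a short, stable identifier from a client's header profile."""
    parts = [headers.get(h, "") for h in FINGERPRINT_HEADERS]
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]
```

Identical header profiles yield identical fingerprints, so a scraper that changes IPs but not its tooling remains correlatable; conversely, any change to its profile changes the fingerprint, which is why such identifiers are one signal among many rather than proof on their own.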
How do automated scans and attacks fit into your threat model?