Mathematicians and marketing people alike would have us believe that Machine Learning is soon to replace all our human undertakings. While the automation opportunities afforded by Machine Learning will surely replace some human jobs, I’m here to explain why Threat Hunting won’t be one of them.
What is Threat Hunting? Threat hunting is the human act (art?) of searching a network, investigating indicators of compromise, and responding to malicious threats like malware, ransomware, or even active human adversaries. Threat hunting used to be the last phase of a company’s cybersecurity prevention strategy and is now becoming readily available to companies with emerging prevention postures through Managed Detection and Response Services. Threat hunters tend to use technologies that involve elements of Machine Learning.
What is machine learning? At a very high level, machine learning is a system that uses algorithms to correlate input data with known outcomes, based on examples it has been given, so that the system can predict those outcomes for new data. My goal is to compare Machine Learning techniques, as applied in cybersecurity, with the techniques of (human) threat hunters. What you will see is that, despite the advances that have been made in Machine Learning, it can’t adequately replicate what a qualified threat hunter does every day.
I’m not knocking Machine Learning. In fact, it gives us visibility into far more relevant data than a human could ever ‘sift through’ unaided. Threat Hunters need and actively use the outputs of Machine Learning to do their job; Machine Learning-driven tools alert, while Threat Hunters investigate, remediate, and respond.
Let’s compare them in context. Here are five reasons Machine Learning won’t replace Threat Hunters:
1) False Positives & False Negatives
False Positives in malware detection are benign processes that ‘look like’ malware. They are annoying because they can desensitize people to legitimate alerts, and they are a time-sink for investigators. False Negatives (malware that is not identified) are a genuine risk, and a terrifying one, because they have gone undetected by the systems in place to protect us. So, what happens with those false positives or false negatives when a machine is in control, and when a Threat Hunter is in control?
This isn’t an entirely ‘fair’ comparison because, at present, it is typically a system based on Machine Learning that produces the false positive or false negative, while a Threat Hunter is usually the one reacting to it. Let’s look at it in context: a system encounters a process that resembles a virus (say, a macro-enabled Excel file that a CEO is working on), treats it just how we would want it to treat a virus, and deletes the file. A threat hunter, on the other hand, can recognize that this is a false positive and move on without causing any impact to the business.
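To make the two error types concrete, here is a minimal Python sketch that counts them for a binary malware detector. The labels are made up for illustration, and the function name `confusion_counts` is hypothetical rather than taken from any particular product.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count detector outcomes; True means "malicious" in both sequences."""
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            counts["true_positive"] += 1
        elif guess and not truth:
            counts["false_positive"] += 1  # benign file flagged, e.g. the CEO's spreadsheet
        elif not guess and truth:
            counts["false_negative"] += 1  # malware that slipped through
        else:
            counts["true_negative"] += 1
    return counts


# Hypothetical ground truth and detector verdicts for six files.
actual = [True, True, False, False, False, True]
predicted = [True, False, True, False, False, True]

counts = confusion_counts(actual, predicted)
print(dict(counts))
```

The counts say how often the detector was wrong, but not which mistakes would have hurt the business. That judgment is the part left to the threat hunter.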
2) Clustering Only Yields Correlation
Clustering is a machine learning technique that groups similar data points together. It can tell us about correlation, but not about outcomes, the intent behind behaviour, or unintended consequences. With Machine Learning we can see that data points (log entries, for example) resemble one another or resemble a given piece of malware, but it can’t tell us what will happen if the file is allowed to run.
Threat Hunters, on the other hand, can anticipate and subsequently check what a process is trying to achieve (changes to the registry for example) and react accordingly.
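As an illustration of how little a cluster label actually says, here is a minimal k-means sketch in plain Python. The two features (registry writes and outbound connections per process), the sample values, and the starting centroids are all assumptions chosen for the example.

```python
import math


def assign(point, centroids):
    """Return the index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Minimal k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[assign(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids


# Hypothetical features per process: (registry writes, outbound connections).
observed = [(1, 0), (2, 1), (1, 1), (40, 12), (38, 15), (42, 11)]
centroids = kmeans(observed, centroids=[(0, 0), (50, 20)])

new_process = (39, 13)
label = assign(new_process, centroids)
print(f"new process resembles cluster {label}")
```

The result is only "this process resembles cluster 1". Nothing in it says what the process intends to do or what happens if it keeps running.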
3) Advantages Can Be a Double-edged Sword
Even with all the benefits that machine learning offers, there is no way to control who applies it, or how. Just as easily as we can use prevention technologies driven by Machine Learning, hackers can use them to test their latest cyber-weapons and learn whether they will be detected by frontline defenses like Anti-Virus. In the window after the security afforded by machine learning has been overcome but before it has been updated (during a zero-day attack, for example), a human threat hunter is the only viable defense.
4) Unknown Performance “In the Wild”
Even with vast quantities of highly representative data, we can’t know how a machine learning-driven technology is going to perform in a real situation. There is a fine balance between under-generalization (often called overfitting), where the trained model is too specific to apply to many of the threats out there, and over-generalization, where the model produces too many false positives to distinguish effectively between malware and not-malware. This unpredictability is another reason that Machine Learning can’t yet replace Threat Hunters, who can adapt or ask for help if they need to. This matters even more given my next point.
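The balance described above can be sketched with a simpler stand-in: a single detection threshold applied to a model's malware score. This is not the same thing as under- or over-generalization inside the model, but it shows the same trade-off between missing threats and raising false alarms. The scores and labels below are invented for the example.

```python
def errors_at(threshold, scored_samples):
    """Return (false_positives, false_negatives) when flagging scores >= threshold."""
    fp = sum(1 for score, is_malware in scored_samples
             if score >= threshold and not is_malware)
    fn = sum(1 for score, is_malware in scored_samples
             if score < threshold and is_malware)
    return fp, fn


# Hypothetical (model score, is actually malware) pairs.
samples = [
    (0.95, True), (0.80, True), (0.65, False), (0.60, True),
    (0.40, False), (0.30, True), (0.20, False), (0.05, False),
]

for threshold in (0.1, 0.5, 0.9):
    fp, fn = errors_at(threshold, samples)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

On this toy data, a permissive threshold flags every piece of malware along with three benign files, while a strict one raises no false alarms and misses three real threats. How the same setting behaves on live traffic cannot be read off from numbers like these.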
5) Must Adapt as Quickly as Malware to be Effective
The tactics that the bad guys use change all the time. Algorithms that are powered by machine learning need to be updated with new attributes to stay effective. For example, older Machine Learning datasets probably won’t include an attribute for “Foreshadow-afflicted Intel Processor”, which could be an important factor in identifying new malware that exploits that vulnerability.
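A small Python sketch shows why a missing attribute matters. It assumes a model that only ever sees the attributes in the schema it was trained on. The schema, the event fields, and the attribute name `cpu_vulnerable_to_foreshadow` are all hypothetical.

```python
def to_feature_vector(event, schema):
    """Project an event onto the attributes the model was trained on."""
    return [event.get(name, 0) for name in schema]


def unseen_attributes(event, schema):
    """List attributes present in the event that the model cannot see."""
    return sorted(set(event) - set(schema))


# Hypothetical schema, fixed at the time the model was trained.
TRAINED_SCHEMA = ["registry_writes", "outbound_connections", "is_signed"]

# Hypothetical new event carrying an attribute that did not exist at training time.
event = {
    "registry_writes": 3,
    "outbound_connections": 1,
    "is_signed": 1,
    "cpu_vulnerable_to_foreshadow": 1,
}

vector = to_feature_vector(event, TRAINED_SCHEMA)
ignored = unseen_attributes(event, TRAINED_SCHEMA)
print("model input:", vector)
print("silently ignored:", ignored)
```

The new attribute never reaches the model, so it cannot influence the verdict until the schema is extended and the model is retrained on data that includes it.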
In conclusion, Machine Learning offers us powerful tools to help improve our cybersecurity posture, influences the development of many existing security technologies, and helps us make the most of the data we have collected on malware, incidents, processes, and other indicators of compromise. However, the tools are subject to false positives and false negatives, need to change rapidly to stay effective, and can be used affordably by our adversaries, meaning that algorithms will always be susceptible to workarounds. Threat Hunters, on the other hand, think about the outcomes of a given process and so limit the damage done by false positives and false negatives, can see the reasons behind a correlation, and are able to respond to net-new threats like zero-day attacks.