Spear phishing attacks are difficult to detect automatically because they use targeted language that appears “normal” to both detection algorithms and the users who receive them. Today’s approaches to detecting such emails rely mainly on heuristics that look for “risky” words in an email, such as “payment,” “urgent,” or “wire.”
Unfortunately, such methods fail when adversaries choose words or sentence structures that the heuristic designers did not anticipate. This is why many email security approaches now rely on machine learning in addition to heuristics. Well-designed machine learning algorithms for detecting spear phishing have proven more resilient than heuristics to variation in sentence structure and word choice.
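To make the weakness of word-based heuristics concrete, here is a minimal sketch in Python. It is an illustration only, not any vendor's actual detection logic: the word list comes from the examples above, while the function names, the threshold and both sample emails are hypothetical.

```python
import re

# Illustrative "risky" words taken from the examples in the text.
# A real product's list is an unknown; this one is an assumption.
RISKY_WORDS = {"payment", "urgent", "wire"}


def risky_word_hits(email_text):
    """Return the sorted risky words that appear in the email body."""
    tokens = re.findall(r"[a-z]+", email_text.lower())
    return sorted(set(tokens) & RISKY_WORDS)


def looks_risky(email_text, min_hits=1):
    """Flag the email if it contains at least `min_hits` risky words."""
    return len(risky_word_hits(email_text)) >= min_hits


# Hypothetical messages: the second asks for the same thing as the first
# but avoids every word on the list.
flagged = "Urgent: please wire the payment to the new account today."
evasive = "Can you send the funds to the updated account before noon? I'm in a meeting."

print(looks_risky(flagged), risky_word_hits(flagged))
print(looks_risky(evasive), risky_word_hits(evasive))
```

The second message makes the same request as the first, yet it passes the check because none of its words are on the list. Adding “funds” or “send” would catch this one message, but the attacker can simply rephrase again.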
Teaching machines to go deeper
Neither heuristics nor textbook machine learning approaches are silver bullets, and the criminal economy built on spear phishing is booming as a result. There’s an urgent need to up our game as defenders. Fortunately, recent breakthroughs in artificial neural networks have provided an opportunity to do so.
Indeed, state-of-the-art neural networks trained on gigabytes of text can now learn the deep structure of language in ways that are far more sophisticated than older machine learning approaches. There have been major advances in the last year and a half, including OpenAI’s GPT-2, a neural network that can, under certain conditions, write coherent documents. If you haven’t already seen it, the viral example of GPT-2 writing an essay about unicorns is worth a look.
In the security research community, we’ve found that we can take neural networks like these and adapt them for phishing detection, achieving breakthrough results. To understand why, imagine a neural network that can analyze the underlying topics, sentiments, imperatives and tone of an email, and use these observations to decide whether or not that email is a phishing attempt.
While our early results are exciting, it will be an ongoing process to bring such new technology to market. There are no silver bullets, but teaching machines to better understand language using modern neural network technology will lead to significant advances in protecting organizations from phishing.