The intersection of cyber security and big data is an area where I spend a lot of my time. In many ways, I see cyber security as ideally suited to big data analytics. Surprisingly, though, cyber security organizations have been slow to adopt Hadoop and other big data technologies.
That’s unfortunate, because big data tools are inherently good at solving the kinds of problems traditional cyber security infrastructures struggle with, and they augment and diversify the capabilities those traditional tools offer.
Let’s take the high-profile Target data breach as an example. I reached out to my friend Wayne Wheeles (a cyber security practitioner from Release2Innovation), who wrote a fantastic retrospective on how it occurred and why. Based on initial reporting, Target hired an outside contractor who installed and monitored network-enabled smart devices (heating and air conditioning systems in this case) on the same network that also ran their point of sale system, so that the devices could call home if something was broken.
The problem was they never considered that this equipment could be used as an attack vector for the other systems in their store that shared the same IT infrastructure. I mean, who worries about the security of their AC unit? This allowed the thieves to be very “noisy” ‒ meaning that the evidence of their wrongdoing was readily available for anyone who knew how to look for it ‒ but still not get caught before getting away with the personal information of millions of shoppers. Worse yet, while you’ve heard of this happening at Target, this vulnerability exists at a lot of different retailers with similar infrastructure, because stores weren’t built out with multiple networks to segregate smart devices from critical systems.
Traditional cyber security systems are designed for known threats. They are tactical and reactive, and they struggle to identify new or unexpected types of attacks. Information silos make this worse: web server logs, netflow data, IDS signatures, etc. are all stored in different places, making it difficult to take a holistic view of your cyber security data and discover complicated, unexpected attacks. Taking a big data analytics approach to cyber security flips that model on its head. The silos are gone, and now we’re looking at the data as a whole rather than as individual pieces of network activity. Using big data analytics, we can determine complex patterns of behavior without being concerned with the noise that generally accompanies these types of data sources.
Think of it this way: a network-connected AC system would normally have a very limited communication pattern. You would expect a regular heartbeat message saying everything was okay, and occasionally a message indicating some kind of problem (the compressor is broken, the filter needs to be changed, et cetera). Anything outside of that limited pattern is immediately suspect, and it doesn’t take a complex algorithm to spot it. You simply need to compare the historical network data with your expected behavior, and bubble up anything unusual. If the AC system, or any other seemingly innocuous system, is suddenly having long data-swapping conversations with other systems, that bears looking into.
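To make that comparison concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the `Flow` record, the byte and duration limits, and the sample addresses are hypothetical, and a real deployment would derive the limits from the device's own history rather than hard-coding them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One historical network conversation (hypothetical, simplified record)."""
    src: str
    dst: str
    bytes_sent: int
    duration_s: float


# Assumed limits for a device that only sends heartbeats and short alerts.
MAX_BYTES = 2_000
MAX_DURATION_S = 5.0


def unusual_flows(flows, max_bytes=MAX_BYTES, max_duration_s=MAX_DURATION_S):
    """Return the flows that are larger or longer than the expected pattern."""
    return [
        f for f in flows
        if f.bytes_sent > max_bytes or f.duration_s > max_duration_s
    ]


# Made-up history for one AC controller at 10.0.9.11.
history = [
    Flow("10.0.9.11", "203.0.113.10", 180, 0.4),          # heartbeat
    Flow("10.0.9.11", "203.0.113.10", 950, 1.2),          # "filter needs changing"
    Flow("10.0.9.11", "10.0.5.20", 48_000_000, 1_800.0),  # long data-swapping conversation
]

suspects = unusual_flows(history)
for flow in suspects:
    print(f"review: {flow.src} -> {flow.dst} "
          f"({flow.bytes_sent} bytes, {flow.duration_s}s)")
```

The check is deliberately simple. The point is that a narrow expected pattern makes the outlier obvious, not that these particular thresholds are the right ones.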
We’d do this kind of comparison using Platfora and Hadoop, along with a handful of different data sources. We would start with a list of IP addresses that correspond to our particular class of network devices (infrastructure monitoring sensors), and a whitelist of the servers we expect these devices to be talking with, and over which ports. We’d then compare those sets with our other holdings (netflow, log data, IDS, etc.). Any records from our other holdings that don’t meet the whitelisted criteria bubble up immediately for a network security analyst to evaluate. In the Target case, these devices were connecting to the point of sale server in the store, something that should never have happened. This would have been immediately evident if a network security analyst had seen the data, but the nature of network data (approximately 500 million netflows daily for a medium-sized network) is that there’s just too much of it to manage unless you’re using a big data platform to collect it, organize it and make sense of it all.
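The logic of that whitelist comparison can be sketched in a few lines of plain Python. This is only an illustration of the filtering step, not how Platfora or Hadoop expresses it; the record fields (`src`, `dst`, `dst_port`), the addresses, and the function name are all hypothetical.

```python
# Hypothetical inputs: the monitored device class and the servers and
# ports those devices are expected to talk to.
DEVICE_IPS = {"10.0.9.11", "10.0.9.12"}
ALLOWED_DESTINATIONS = {
    ("203.0.113.10", 443),  # assumed vendor "call home" endpoint
}


def whitelist_violations(records, device_ips, allowed):
    """Return records sent by a monitored device to a non-whitelisted (server, port)."""
    return [
        r for r in records
        if r["src"] in device_ips and (r["dst"], r["dst_port"]) not in allowed
    ]


# Made-up netflow-style records.
netflow = [
    {"src": "10.0.9.11", "dst": "203.0.113.10", "dst_port": 443},   # expected
    {"src": "10.0.9.12", "dst": "10.0.5.20", "dst_port": 1433},     # device -> point of sale
    {"src": "10.0.7.30", "dst": "10.0.5.20", "dst_port": 1433},     # not a monitored device
    {"src": "10.0.9.11", "dst": "203.0.113.10", "dst_port": 22},    # right server, wrong port
]

violations = whitelist_violations(netflow, DEVICE_IPS, ALLOWED_DESTINATIONS)
for record in violations:
    print(f"escalate: {record['src']} -> {record['dst']}:{record['dst_port']}")
```

Matching on the (server, port) pair rather than the server alone matters here: a connection to the right server over an unexpected port is still worth an analyst's attention.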
Of course, you can use big data tools to do far more than bubble up unusual behavior. Let’s say you want to correlate several data sources. You want to combine the last 90 days of router data, a blacklist of known bad guys, data from an intrusion detection system, and maybe some data on network port access. With this information, your network security analysts can dig deep into what’s happening on your network, revealing events and relationships that would otherwise have gone unnoticed, and take appropriate and timely action. Taking Target as an example, a network security analyst could have used these correlated data sources to discover the data extraction going on under their noses much sooner, because it would have shown up as an outlier in their visualizations. Hadoop and Platfora can do that very easily, allowing you to go deep to determine both where you’re most vulnerable and what other entities already know about your systems. That kind of power will enable you to protect your systems and information far more effectively than you could with traditional network security solutions alone.
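As a rough illustration of what correlating those sources means, the sketch below joins a 90-day window of router flows against a blacklist and a set of IDS alerts. It is plain Python with made-up data; the field names, the `correlate` function, and the choice to match IDS alerts on the flow's source address are assumptions, and at real volumes this join would run on the big data platform rather than in memory.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def correlate(flows, blacklist, ids_alerts, now, window_days=90):
    """Annotate recent flows that touch a blacklisted address or an IDS-flagged source."""
    cutoff = now - timedelta(days=window_days)

    # Index IDS alerts by the address they were raised against.
    alerts_by_ip = defaultdict(list)
    for alert in ids_alerts:
        alerts_by_ip[alert["ip"]].append(alert["signature"])

    findings = []
    for flow in flows:
        if flow["time"] < cutoff:
            continue  # outside the window we care about
        blacklisted = flow["dst"] in blacklist
        signatures = alerts_by_ip.get(flow["src"], [])
        if blacklisted or signatures:
            findings.append(
                {**flow, "blacklisted_dst": blacklisted, "ids_signatures": signatures}
            )
    return findings


# Made-up sample data.
now = datetime(2014, 3, 1)
blacklist = {"198.51.100.7"}
ids_alerts = [{"ip": "10.0.5.20", "signature": "suspicious-outbound-ftp"}]
flows = [
    {"time": datetime(2013, 10, 1), "src": "10.0.5.20", "dst": "198.51.100.7"},  # too old
    {"time": datetime(2014, 2, 10), "src": "10.0.5.20", "dst": "198.51.100.7"},  # both signals
    {"time": datetime(2014, 2, 11), "src": "10.0.7.30", "dst": "203.0.113.10"},  # benign
    {"time": datetime(2014, 2, 12), "src": "10.0.8.40", "dst": "198.51.100.7"},  # blacklist only
]

findings = correlate(flows, blacklist, ids_alerts, now)
for f in findings:
    print(f["src"], "->", f["dst"], f["blacklisted_dst"], f["ids_signatures"])
```

A flow that trips more than one source at once, like the second record above, is the kind of outlier an analyst would want at the top of the list.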
Keith McClellan leads Federal Engineering at Platfora, and has been focused on big data and related technologies for most of his career. If you’re interested in his random musings, he tweets @keithmcc and occasionally writes for the Platfora blog.