Episode Summary: There is, in fact, a dark side to AI. Although we’re not at the point where we need to fear terminators, AI has already been leveraged toward malicious aims in a business context. In data security, tremendous venture dollars are going into preventing fraud and theft, but this same brand of technology is also being used by the “bad guys” to try to steal that information and break into machine learning-protected systems. In this episode, I speak with Justin Fier, director of cyber intelligence at Darktrace, about the malicious uses of AI and how companies like Darktrace have been forced to fight these “AI assailants.” Fier provides valuable insights into the role of unsupervised learning, an addition to the full list of AI for data security applications that we’ve covered in the past.
Expertise: Cyber intelligence
Brief Recognition: Justin Fier joined Darktrace in 2015 as a Director for Cyber Intelligence and Analytics. As a contractor for over 10 years, Fier has supported various elements in the US intelligence community with a focus on cyber operations and counter terrorism. Prior to Darktrace, he was a senior systems security analyst with Abraxas Corporation, and a systems integration expert with Lockheed Martin. Fier earned a bachelor’s in economics from University of Maryland College Park.
Current Affiliations: Director for Cyber Intelligence and Analysis
1 – In the next year, the cybersecurity space will likely see a shift to defending against mostly autonomous machine learning systems.
The hacker market space moves with the technology; ML is the craze and Fier thinks it’s likely that malicious actors will incorporate more of this activity into their work in the very near future.
2 – Unsupervised learning is the newest threat to, and weapon against, cybersecurity.
The truest form of AI, in Fier’s opinion, is a fully unsupervised machine that is not getting help or training data, and that learns and makes decisions autonomously. Today’s cybersecurity systems need unsupervised learning to find patterns in unthinkable amounts of anomalous data.
The following is a condensed version of the full audio interview, which is available in the above links on Emerj’s SoundCloud and iTunes stations.
(2:03) I wonder if you can give us a tour of the breeds of malicious AI that folks like you are out there fighting?
Justin Fier: Sure; the first thing I like to start with is that AI is not truly here yet; we hear it mixed with the term ML quite a bit. ML is a subset of AI…and I think that’s where we’re starting to see it… we’re going to continue to see hackers’ spear-phishing campaigns with machine learning. We just recently saw a campaign where, after gaining access to a user’s email, the program looked for keywords; file names; language; the way the person types and speaks; and crafted e-mails based around that, because it gave them more of a chance of getting clicks on the other end, and I think that kind of sets a framework of where we’re going to see ML move in the malicious hemisphere: using the machine to determine patterns of life, really intricate ways to move around internally on the network while going undetected.
(6:59) Maybe similar systems could be used in the malicious context for…scoring how in line with the voice and the language of the person whose email we’ve hacked we are. Is this the sort of thing that’s kind of in the works on some low level or to be expected in the future as people continue to try and engineer these malicious systems?
JF: Yes, absolutely, and everything you’re describing is not new technology; the marketing and advertising space has been using that technology for the last couple of years now for monitoring sentiment among people; I think it’s just better, we’ve gotten to the point where malware writers have figured out how to use it…
…I think there’s a thought that ML has to be done by big data centers and massive computers, which is not the case; it can be run as a silent process in the background of your computer. At the end of the day, what ML is just looking at is data, it’s kind of the replacement for the big data buzzword that’s been around for the last couple of years, it’s really just an efficient way of culling through data and making a decision off of that…
(10:43) Can you give us an example of what a network looks like and is made of, what (malicious) activity is like, and what a malicious actor might want to steal in these circumstances?
JF: So I would first say, regardless of the size of the network, at the end of the day, all your network is is a massive data set; where it gets really complex is your data set is constantly changing…that’s really where Darktrace, for example, got the idea of building what we call the “enterprise immune system”. We wanted to approach security from the immune system approach because we realize every network is different; it acts as a living organism. So the challenge with a lot of networks is just monitoring all those different pieces…and the question is, how do I monitor all of them (end points), how do I decide if they’re acting in a normal way…
…a malicious actor is going to try and exploit something to do something on his behalf, and most of the time it’s going to stand out amongst other devices on the network, and it’s finding those subtleties…it’s gotten much harder to find those anomalies, it truly is a needle in a haystack…
(17:51) What you’re saying is some people define endpoints in the traditional way, you guys and maybe the industry in the future will be considering anything with an IP address — these are obviously places where entrance occurs, where a breach can occur?
JF: Absolutely, and hackers are some of the most creative people in the world; if they can’t get in through the desktop anymore, they’re going to find another way to get in; and as we’ve seen in the last couple of months, IoT for instance is a wild, wild west of network devices, many with little to no security built in. For the most part it’s been isolated to residential devices, but I think it’s only a matter of time before that moves into the enterprise…that’s really also why Darktrace took the approach that we want to look at everything. When we look at a network, we look at everything with an IP address…and that’s absolutely key, just knowing where the data is moving around laterally and vertically.
(20:36) Could you walk us through what a real breach looks like and what’s actually happening when this malicious AI attacks?
JF: In a particular case…we were in a network and we saw a machine acting anomalously compared to a number of different factors; we look at everything from a macro and micro level…and so from that, we were alerted to some odd activity…everything from how rare is the destination that the data is going to in comparison to the machine itself, the entire network, other devices around it, what step of protocol is being used, does it normally use that protocol, even down to the size of packets…we could look at hundreds of different metrics, and that’s where the unsupervised piece comes in, applying sophisticated mathematical algorithms to decide how anomalous something is based on all these factors…
…what we discovered in this particular network is this system had been compromised, and instead of contacting command and control…it just sat there and it watched and it learned and it tried to blend in…despite the fact that it was using a kind of version of ML, we were still able to find it because it truly was that needle in a haystack that a human eye just wouldn’t have found.
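To make the idea of scoring a device against its own history concrete, here is a minimal sketch in Python. It is not Darktrace’s method, which the interview does not describe in detail; it is only an illustration of the kind of signals Fier lists (rarity of destination, rarity of protocol, packet size). The class name `DeviceBaseline`, the event format, and the sample addresses are all hypothetical, and a real system would use far more metrics and more careful statistics.

```python
import math
from collections import Counter
from statistics import mean, pstdev


class DeviceBaseline:
    """Hypothetical per-device model learned from unlabeled traffic."""

    def __init__(self, events):
        # events: list of (destination, protocol, packet_size) tuples
        self.n = len(events)
        self.dest_counts = Counter(d for d, _, _ in events)
        self.proto_counts = Counter(p for _, p, _ in events)
        sizes = [s for _, _, s in events]
        self.size_mean = mean(sizes)
        # Fall back to 1.0 so a constant history does not divide by zero.
        self.size_std = pstdev(sizes) or 1.0

    def _rarity(self, counts, value):
        # Add-one smoothing with a single "never seen" bucket, so an
        # unseen value gets a high but finite score.
        p = (counts[value] + 1) / (self.n + len(counts) + 1)
        return -math.log(p)

    def score(self, destination, protocol, packet_size):
        # Higher means more unusual for this device. The three signals
        # are simply summed; the equal weighting is an assumption.
        z = abs(packet_size - self.size_mean) / self.size_std
        return (self._rarity(self.dest_counts, destination)
                + self._rarity(self.proto_counts, protocol)
                + z)


# Made-up history: the device normally talks HTTPS and DNS internally.
history = [("10.0.0.5", "https", 500)] * 50 + [("10.0.0.9", "dns", 80)] * 50
baseline = DeviceBaseline(history)

normal = baseline.score("10.0.0.5", "https", 500)
odd = baseline.score("203.0.113.7", "ftp", 9000)
print(f"normal={normal:.2f} odd={odd:.2f}")
```

No labeled examples of attacks are needed: the baseline is built only from what the device usually does, and a connection to a never-seen destination over a never-seen protocol with an unusually large payload scores well above routine traffic.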
(24:00) Talk to me about this circumstance broadly…what could have been stolen and why?
JF: In this particular case, I believe it was a manufacturing company, so I would suspect their intellectual property, the manufacturing process, materials, etc. but every industry is different. We even saw a large restaurant chain using a rather insecure protocol to monitor all their refrigeration units across the many different franchises, and even though we didn’t see truly anomalous behavior, we brought to their attention how damaging that could be if someone was able to manipulate that data…
(25:52) What happens with that kind of proprietary data?
JF: …I don’t know that you would sell it to the competitor; I think the competitor probably hired you…it’s not difficult to get out on the dark web and find these people wanting their services; it’s gotten to the point now, as we’ve seen with the ransomware model, where you don’t even need to be technical; you can buy a ransomware package that comes with a full point-and-click user interface and deploy it…I can be sitting in Starbucks managing a webpage and just watching my payment, which is usually in the form of bitcoin, go through the roof…