
Today, it’s impossible to deploy effective cybersecurity technology without relying heavily on machine learning. At the same time, it’s impossible to effectively deploy machine learning without a comprehensive, rich and complete approach to the underlying data.

Why has machine learning become so critical to cybersecurity?

Several reasons. With machine learning, cybersecurity systems can analyze patterns and learn from them to help prevent similar attacks and respond to changing behavior. It can help cybersecurity teams be more proactive in preventing threats and responding to active attacks in real time. It can reduce the amount of time spent on routine tasks and enable organizations to use their resources more strategically.

In short, machine learning can make cybersecurity simpler, more proactive, less expensive and far more effective. But it can only do those things if the underlying data that supports the machine learning provides the complete picture of the environment. As they say, garbage in, garbage out.

Why is focusing on data critical to the success of machine learning in cybersecurity?

Machine learning is about developing patterns and manipulating those patterns with algorithms. In order to develop patterns, you need a lot of rich data from everywhere because the data needs to represent as many potential outcomes from as many potential scenarios as possible.

It’s not just about the quantity of data; it’s also about the quality. The data must have complete, relevant and rich context collected from every potential source, whether that is at the endpoint, on the network or in the cloud. You also have to clean the data you capture so that you can make sense of it and define outcomes.

Collecting, Organizing and Structuring Data

How can board members and senior-level executives ensure that their organizations are effectively leveraging machine learning in their cybersecurity strategies?

We posed the question to Giora Engel, vice president of product management at Palo Alto Networks. He said it all starts with taking the right approach to data.

“It’s about how you collect, organize and structure the data,” Engel said. “What you collect has to contain information about everything that happened, not just the threats. It has to be rich enough to provide details about machines, applications, protocols, network sensors. It needs to correlate what happened between what you see on the network and what you see at the endpoint.

“Part of the work is stitching all of that data together, so you get one representation with the full picture,” Engel added. “Then you can build different models, model different aspects of the behavior and then use algorithms to make decisions about when to issue alerts, when to take action to respond to potential threats, and when to build in preemptory protections.”
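To make the idea of stitching concrete, here is a minimal sketch in Python. It is not how any particular product does it; the event types, field names and the five-second window are all hypothetical, chosen only to show network and endpoint records being joined into one representation.

```python
from dataclasses import dataclass


# Hypothetical record shapes; real telemetry carries far more context.
@dataclass(frozen=True)
class NetworkEvent:
    host: str
    timestamp: float  # seconds since epoch
    dest_ip: str


@dataclass(frozen=True)
class EndpointEvent:
    host: str
    timestamp: float  # seconds since epoch
    process: str


def stitch(network_events, endpoint_events, window_seconds=5.0):
    """Attach to each network event the processes seen on the same host
    within window_seconds of it (an assumed, illustrative rule)."""
    by_host = {}
    for ev in endpoint_events:
        by_host.setdefault(ev.host, []).append(ev)

    stitched = []
    for net in network_events:
        related = [
            ep for ep in by_host.get(net.host, [])
            if abs(ep.timestamp - net.timestamp) <= window_seconds
        ]
        stitched.append({
            "host": net.host,
            "timestamp": net.timestamp,
            "dest_ip": net.dest_ip,
            "processes": sorted({ep.process for ep in related}),
        })
    return stitched


network = [NetworkEvent("laptop-1", 100.0, "203.0.113.7")]
endpoint = [
    EndpointEvent("laptop-1", 98.0, "powershell.exe"),
    EndpointEvent("laptop-1", 200.0, "chrome.exe"),
    EndpointEvent("laptop-2", 100.0, "ssh"),
]
result = stitch(network, endpoint)
```

Each stitched record now says both where the traffic went and which process was active on that machine at the time, which is the kind of combined picture a model can then be built on.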

Asking the Right Questions

For leaders on the business side, this means posing the right questions to their colleagues on the technology and cybersecurity sides. Engel says there are several key areas on which to focus:

  1. Do they have the right data to respond to an active attack? What kind of data are they collecting—do they have information on the network, on the endpoints, on the various clouds in which data and applications are deployed?
  2. Is the data structured in a way that can be used for decision-making and detection, or is it just sitting there? Can they effectively leverage data that comes from multiple sources?
  3. Are they confident that, using their data, they can detect any attack in the network? Are they using automation for detection as well as response?

One of the biggest challenges is getting data from the endpoint, network and cloud and normalizing it into one consistent format, so that it can be used effectively for machine learning.

Even with modern, sophisticated machine learning technology, you can’t make sense of data from multiple sources if it isn’t relevant or categorized for analysis. The data needs to be in the same “language” so the algorithms and models can understand the data and effectively apply the machine learning capabilities.
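As an illustration of what putting data in the same “language” can mean in practice, the Python sketch below maps records from three sources onto one common schema. Every field name here (hostname, ts_epoch, src_host, eventTimeMs and so on) is an assumption made up for the example, not the format of any real sensor or cloud provider.

```python
from datetime import datetime, timezone


def from_endpoint(raw):
    # Assumed endpoint format: epoch seconds, mixed-case hostnames.
    return {
        "source": "endpoint",
        "host": raw["hostname"].lower(),
        "timestamp": float(raw["ts_epoch"]),
        "action": raw["event"],
    }


def from_network(raw):
    # Assumed network format: ISO 8601 timestamps with a UTC offset.
    return {
        "source": "network",
        "host": raw["src_host"].lower(),
        "timestamp": datetime.fromisoformat(raw["time"]).timestamp(),
        "action": raw["activity"],
    }


def from_cloud(raw):
    # Assumed cloud audit format: epoch milliseconds.
    return {
        "source": "cloud",
        "host": raw["resource"].lower(),
        "timestamp": raw["eventTimeMs"] / 1000.0,
        "action": raw["operation"],
    }


NORMALIZERS = {
    "endpoint": from_endpoint,
    "network": from_network,
    "cloud": from_cloud,
}


def normalize(source, raw):
    """Convert one raw record into the shared schema."""
    return NORMALIZERS[source](raw)


records = [
    normalize("endpoint", {"hostname": "VM-01", "ts_epoch": 1704067200, "event": "login"}),
    normalize("network", {"src_host": "vm-01", "time": "2024-01-01T00:00:00+00:00", "activity": "login"}),
    normalize("cloud", {"resource": "Vm-01", "eventTimeMs": 1704067200000, "operation": "login"}),
]
```

After normalization, all three records share the same keys, the same host spelling and the same time unit, so a single model can consume them without caring where each one came from.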

“It’s not just about getting the right data,” Engel said. “You need tight integration between the data and the machine learning. You need an integrated approach between machine learning and data collection, organization and structuring.”

There is so much talk about machine learning and artificial intelligence that business leaders can be excused if they feel like they are on hyper alert. When it comes to cybersecurity, however, the potential for machine learning to have a dramatic and lasting impact is real. But only for companies that are forward-thinking enough to take care of their data first.


Al Perlman, co-founder of New Reality Media, is an award-winning technology journalist. For the past dozen years, he has focused on the intersection between business and technology, with an emphasis on digital transformation, cloud computing, cybersecurity and IT infrastructure.