The big data explosion has impacted a variety of industries over the past decade, requiring new technologies, skills, and teams to put this data to work toward business goals, a practice we succinctly summarize as Business Intelligence (BI).
The rise of BI has led many organizations to claim that they embrace “data-driven decision making”. Why is it, then, that when we look at our security programs, they are rarely making data-driven decisions?
Early Adopters: Security Operations
The use of security data has been best adopted within Security Operations (SecOps) due to the large amounts of log data being centralized, normalized, and analyzed for detections. Most Security Information & Event Management (SIEM) tools also provide the capabilities to build dashboards based on this data for leadership & analysts to use.
While these capabilities have spawned the use of data visualization & analytics to understand large amounts of security data, two key challenges lead me to believe the next evolution will not be seen in SIEM platforms:
Cost-to-value ratio
Platform use case
Cost-to-Value Ratio
“Optimizing SIEM ingestion is not just about reducing costs, it's about aligning the price of data with its value.”
It’s no secret that SIEM tools are expensive, and this is mainly due to their data ingestion costs. The current trend in security operations is to ingest only the data needed in an effort to optimize costs; in other words, SIEM tools have become a game of pick & choose.
Paired with the explosion of security data generated by a modern environment, the SIEM becomes a hard place to win the battle over ingesting more data. But why would we need to ingest more data?
A variety of data sources and data types that are not typically ingested by SIEM tools still provide value in decision-making, such as:
Vulnerability management
Security audit & assessment results
Penetration testing
Security awareness training
Policy, procedures, & other governance data
Making the case to ingest additional data into a tool with uneconomical ingestion and storage costs, for the sake of enabling more analytics use cases, is a hard sell, especially when the current conversations are “how can we ingest less?” instead of “what else can we ingest?”
In short: The cost-to-value ratio of ingesting data sources that do not contribute to detection into a SIEM is unfavorable.
Platform Use Case
“If the only tool you have is a hammer, it is tempting to treat everything as if it were a nail” ~ Abraham Maslow
SIEMs were built for a purpose: the detection & investigation of security incidents. While analytics and visualization are rather streamlined within that context, it's when you stray outwards to GRC, Security Testing, Security Awareness Training, and other contexts that the viability of the SIEM as the analytic tool of choice quickly fades.
Data visualization, for example, often lives within the tool for use by SOC managers and analysts. While a portion of the access constraint is alleviated with the introduction of cloud-hosted SIEMs, it’s far-fetched to expect all relevant stakeholders to access the SIEM in order to view and consume security analytics.
Because SIEMs are primarily security tools, there are also fewer members of the analytics community familiar with the platform compared to other industry-standard tools such as R, Python, Tableau, or Power BI. In fact, many organizations may already have in-house talent that is familiar with these tools.
The Next Wave: Unhooking Analytics from Detection
“Those working in the security industry often assume that the problems and trends seen in security are unique…” ~ Ross Haleliuk, Cyber for Builders
There is a phenomenon in information security where we often think our problems are unique to the field, opting to reinvent the wheel instead of looking for solutions in adjacent disciplines.
The next wave of analytics for information security programs looking to expand beyond the perimeter of security operations likely won’t be in the confines of traditional security tools (SIEM or otherwise), as they’ll quickly discover the two roadblocks for adoption:
Cost-to-value ratio
Platform use case
Instead, organizations will borrow from the Business Intelligence (BI) playbook and look to build data pipelines connecting security data and centralizing it in lower-cost data lakes. In fact, we are already starting to see this transition, with organizations moving data that is not actively used in SIEM detection, but is still of value, into other storage.
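To make the pipeline idea concrete, here is a minimal sketch of the pattern, not a production design. It assumes two hypothetical tool exports (a vulnerability scanner and an awareness training platform, with invented field names), wraps each record in a small common envelope, and lands it as JSON Lines in a partitioned folder layout. A temporary local directory stands in for the data lake; in practice this would be low-cost object storage.

```python
import json
import tempfile
from pathlib import Path

def normalize(source, record, date_field):
    """Wrap a tool-specific record in a small common envelope."""
    return {"source": source, "event_date": record[date_field], "payload": record}

def land(lake_root, records):
    """Append each record as JSON Lines under <root>/source=<name>/date=<day>/."""
    written = []
    for rec in records:
        partition = Path(lake_root) / f"source={rec['source']}" / f"date={rec['event_date']}"
        partition.mkdir(parents=True, exist_ok=True)
        out = partition / "records.jsonl"
        with out.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(rec) + "\n")
        written.append(out)
    return written

# Hypothetical exports; hosts, findings, and field names are made up for illustration.
vuln_rows = [
    {"host": "web-01", "finding": "outdated-tls", "severity": "high", "seen": "2024-05-01"},
    {"host": "db-02", "finding": "default-credentials", "severity": "critical", "seen": "2024-05-02"},
]
training_rows = [
    {"employee": "a.lee", "course": "phishing-101", "completed_on": "2024-05-01"},
]

# A temporary directory stands in for the data lake (an assumption of this sketch).
lake = tempfile.mkdtemp(prefix="security_lake_")
records = [normalize("vuln_scanner", r, "seen") for r in vuln_rows]
records += [normalize("awareness_training", r, "completed_on") for r in training_rows]
paths = land(lake, records)
print(len(paths), "records landed under", lake)
```

Partitioning by source and date is one common convention rather than a requirement; the point is that once the data sits in cheap storage in a predictable layout, the BI tools the business already runs can read it without touching the SIEM.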
Leveraging Previously Untapped Data
“Information is the oil of the 21st century, and analytics is the combustion engine.” ~ Peter Sondergaard
By lowering the cost-to-value ratio of ingestion & unhooking analytics from detection, security teams can begin to harness new data sources that were previously neglected.
You can think of this phenomenon in the context of oil extraction. In the oil industry, certain deposits were considered too difficult & costly to reasonably extract. Instead, they went unused until new technologies, such as horizontal drilling and hydraulic fracturing, shifted the cost-to-value ratio.
Just as these new technologies enabled the oil industry to leverage previously underutilized oil deposits, using different technologies for security analytics turns previously unviable data sources into sources of positive return on investment (ROI).
Governance, Risk, & Compliance (GRC) data, for example, is of significant value: it lays the foundation of a security program’s goals, objectives, and controls. Despite this value, many organizations leave this data untapped in their security analytics programs, where it instead lives in spreadsheets, documents, and presentations.
To tackle this, we’ve seen an emerging “GRC Tools” market that aims to operationalize GRC activities through software. However, most fall flat in their ability to act as analytics tools across an organization’s security data estate, or even across their own GRC data.
In addition, most security data currently being analyzed is structured in nature. However, there is a large variety of unstructured data generated, such as but not limited to security testing reports, security questionnaires, contracts, security advisories, and threat intelligence. Implementing dedicated analytic pipelines opens the door for this data to generate insights with the use of natural language models that can analyze unstructured data at scale.
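As a deliberately simple stand-in for that idea, the sketch below pulls one structured field out of free text. It uses a regular expression for CVE identifiers rather than a language model, so it only illustrates the first step of turning documents into queryable data. The document names and text are hypothetical; the two CVE identifiers are the well-known Log4Shell and Heartbleed entries.

```python
import re
from collections import Counter

# CVE identifiers follow the form CVE-<year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)

def extract_cves(text):
    """Return the distinct CVE identifiers mentioned in a block of free text."""
    return sorted({match.upper() for match in CVE_PATTERN.findall(text)})

# Hypothetical unstructured documents; file names and wording are invented.
documents = {
    "pentest_report.txt": (
        "The intranet portal bundles a Log4j version affected by CVE-2021-44228. "
        "Remediation was not verified during the retest."
    ),
    "vendor_advisory.txt": (
        "Patch now: cve-2021-44228 is being actively exploited. "
        "Older OpenSSL builds remain exposed to CVE-2014-0160."
    ),
}

# Count how many documents mention each identifier.
mentions = Counter()
for name, text in documents.items():
    for cve in extract_cves(text):
        mentions[cve] += 1

for cve, count in mentions.most_common():
    print(cve, "mentioned in", count, "document(s)")
```

A language model would replace `extract_cves` with something that can also pull out findings, affected systems, or contractual obligations, but the surrounding pipeline (read documents, extract fields, aggregate) would look much the same.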
Breaking Down Data Siloes
It’s not enough to just tap into new data sources - after all, many of these sources have their own siloed analytics capabilities housed in that domain’s tools, such as vulnerability management data.
We’ve often heard of ambitions to break down siloes in organizations and improve cross-department collaboration and communication. Although this has long been an ongoing objective in organizations across the board, the concept has merit. Centralizing security analytics democratizes access to it across the organization and unlocks novel insights from previously siloed data.
While the security operations team may be the main authors and consumers of security analytics in the old world, they are rarely the only team that analytics impacts. Helpdesk, Identity & Access Management, HR, Infrastructure, and Development are only a few of the domains that not only contribute security data, but directly benefit from its analysis and insights.
Security data is often siloed: generated, analyzed, and consumed all within the confines of a discrete piece of the organization. For example, the development team may analyze the results of their application security testing, software composition analysis, and unit testing; however, this data rarely makes it outside the walls of the development team, and when it does, it takes the form of slide decks and other upward reporting to management. Centralizing this data so it can be correlated across the entire security data estate instead unlocks new value and insights.
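To illustrate what correlating across siloes can look like once the data sits in one place, here is a small sketch under stated assumptions. The asset inventory, vulnerability findings, and phishing simulation results are all hypothetical, as are the field names; the only point is that a shared key (here, the owning team) lets two otherwise separate datasets answer one question.

```python
from collections import defaultdict

# Hypothetical datasets that would normally live in three separate tools.
asset_owners = {"web-01": "development", "db-02": "infrastructure", "hr-app": "hr"}
vuln_findings = [
    {"host": "web-01", "severity": "critical"},
    {"host": "web-01", "severity": "low"},
    {"host": "db-02", "severity": "critical"},
]
phishing_results = [
    {"team": "development", "clicked": True},
    {"team": "development", "clicked": False},
    {"team": "infrastructure", "clicked": False},
    {"team": "hr", "clicked": True},
]

def summarize(asset_owners, vuln_findings, phishing_results):
    """Combine vulnerability and phishing data into one per-team view."""
    summary = defaultdict(
        lambda: {"critical_vulns": 0, "phish_sent": 0, "phish_clicked": 0}
    )
    for finding in vuln_findings:
        # Map each host to its owning team via the asset inventory.
        team = asset_owners.get(finding["host"], "unassigned")
        if finding["severity"] == "critical":
            summary[team]["critical_vulns"] += 1
    for result in phishing_results:
        row = summary[result["team"]]
        row["phish_sent"] += 1
        row["phish_clicked"] += int(result["clicked"])
    return dict(summary)

by_team = summarize(asset_owners, vuln_findings, phishing_results)
for team in sorted(by_team):
    print(team, by_team[team])
```

Neither the vulnerability scanner nor the phishing tool could produce this view alone, because neither holds the other's data or the asset-to-team mapping.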
Enabling Distribution
Using traditional BI infrastructure also allows organizations to reduce friction in distributing analytics across the organization by leveraging the same tools stakeholders already use to consume data.
With decentralized security analytics, data is analyzed and visualized within the individual security tools. For example, your phishing simulation tool may have the ability to report on the results of recent tests and generate some charts. Your vulnerability scanner is likely able to generate a report or Excel output that can be filtered. But what happens when an adjacent team needs to view these reports?
You create an inefficient web of access, where teams access security analytics in a variety of places, using a variety of tools.
In scenarios where granting direct access is not viable, such as executive reporting, we see employees cobbling together slide decks with manually entered metrics and graphs.
Centralized security analytics, on the other hand, leverages the same infrastructure as the rest of the business, cleaning up the mess of information distribution.
Whether the business is using Microsoft Fabric, open-source tooling, or even homegrown analytics pipelines, security can be incorporated into the existing data culture and take advantage of any existing maturity in managing the lifecycle of BI resources.
It changes the game when security professionals take advantage of the headway already made by the BI community and contribute intelligence to the organization.
Conclusion
In conclusion, the landscape of data analytics in information security is undergoing a significant transformation. As we have explored, the limitations of traditional SIEM tools in terms of cost-to-value ratio and platform use case are prompting organizations to seek alternative solutions for their security analytics needs. The future lies in unhooking analytics from detection and embracing the principles of Business Intelligence (BI) to centralize, analyze, and distribute security data more effectively and economically.
By adopting lower-cost data lakes and leveraging technologies that can process both structured and unstructured data, security teams can tap into a wealth of previously underutilized data sources. This shift not only promises a better return on investment but also enables a more comprehensive understanding of the security landscape. Furthermore, breaking down data siloes and democratizing security analytics across the organization will foster better collaboration, enhance decision-making, and ultimately strengthen the organization's security posture.
As we move forward, it is clear that the integration of security analytics into the broader BI infrastructure will be a game-changer. It will allow security professionals to benefit from the advancements in BI, contribute valuable insights to the organization, and ensure that security data is not just collected, but harnessed to its full potential. The new world of data analytics in information security is not just about collecting more data—it's about making smarter, more strategic use of the data we have to protect our digital assets and infrastructure in an ever-evolving threat landscape.