The term Big Data is often misused, or used to represent many different concepts, depending on who is delivering the message and the context in which the data is being used. In its simplest definition, Big Data refers to data sets that are too large for traditional data processing techniques to manage or manipulate.

Within cybersecurity, the term is often used to represent a data set that can influence or improve security outcomes for an organization. In many cases, Big Data may enable capabilities that were not achievable with smaller data sets. However, what constitutes Big Data depends on the capabilities and resources of each organization. For some organizations, Big Data may mean hundreds of gigabytes; for others, it may mean hundreds of terabytes.

A key aspect of understanding the concept of Big Data within the cybersecurity world is how the data is collected, stored, updated, indexed, searched, shared, analyzed, and visualized. Therefore, to help define what Big Data really means to the cybersecurity world, we start by defining some of the use cases where larger data sets can enhance the security of an organization.

Situational Awareness … provided by Big Data

One of the most urgent operational requirements for security teams is building situational awareness, whether they are developing an understanding of threat intelligence and threat actor activity or responding to active threats and incidents. The Internet is a vast universe of connected devices, applications, and users. Big Data is required to build an awareness of that universe at full scale. It must scale not only to the billions of nodes that represent entities in that universe, but also to the even greater number of connections between those nodes and to all of the relevant security metadata associated with those nodes and connections.

Tip #1: Big Data must support representing a large graph of entities and their connectedness, as well as metadata context within that graph.
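To make the idea of a graph with metadata concrete, here is a minimal in-memory sketch. The `EntityGraph` class, its method names, and the sample IP address and domain are hypothetical and chosen only for illustration; a production system would use a graph store built for billions of nodes rather than Python dictionaries.

```python
from collections import defaultdict


class EntityGraph:
    """Tiny in-memory graph where nodes and edges both carry metadata."""

    def __init__(self):
        self.nodes = {}                 # node id -> metadata dict
        self.edges = defaultdict(dict)  # source id -> {target id: metadata dict}

    def add_node(self, node_id, **metadata):
        # Create the node if needed, then merge in any new metadata.
        self.nodes.setdefault(node_id, {}).update(metadata)

    def add_edge(self, source, target, **metadata):
        # Make sure both endpoints exist before connecting them.
        self.add_node(source)
        self.add_node(target)
        self.edges[source].setdefault(target, {}).update(metadata)

    def neighbors(self, node_id):
        # Sorted list of nodes reachable in one hop from node_id.
        return sorted(self.edges.get(node_id, {}))


# Illustrative data only (documentation-reserved IP range and test domain).
graph = EntityGraph()
graph.add_node("198.51.100.7", kind="ip", reputation="suspicious")
graph.add_node("example.test", kind="domain")
graph.add_edge("example.test", "198.51.100.7", relation="resolves_to")

print(graph.neighbors("example.test"))
print(graph.nodes["198.51.100.7"])
```

Keeping metadata on both nodes and edges is what lets a query ask not only "what is connected to this domain?" but also "how is it connected, and what do we know about the other end?"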

Historical Context … provided by Big Data

The cybersecurity world is a continuously changing environment. The Internet, along with the nodes, networks, applications, and users connecting to it, is in a continuous state of change.

To understand both active threats and potential future threats, Big Data provides historical context and an understanding of that changing world, which can inform security teams looking to build a model of how to respond more effectively. Therefore, Big Data must not only model one view of the large Internet universe; it must also represent and handle the time and events that occur in that universe. That allows it to help answer questions and drive actions based on when events occurred, when data was discovered, how urgently specific data must be treated, and so on.

Tip #2: Time and historical context are critical factors in making sense of Big Data in cybersecurity.
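One way to picture time-aware data is a timeline that records both when an event occurred and when it was discovered, and that can answer "what happened in this window?" quickly. The sketch below is an assumption-laden illustration: `EventTimeline`, its field names, and the sample events are hypothetical, and a real deployment would rely on a time-series or event store rather than sorted Python lists.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timezone


class EventTimeline:
    """Events kept sorted by occurrence time for fast window queries."""

    def __init__(self):
        self._times = []   # sorted occurrence timestamps
        self._events = []  # events, in the same order as _times

    def add(self, occurred_at, discovered_at, description):
        # Insert at the position that keeps both lists sorted by occurred_at.
        idx = bisect_right(self._times, occurred_at)
        self._times.insert(idx, occurred_at)
        self._events.insert(idx, {
            "occurred_at": occurred_at,
            "discovered_at": discovered_at,
            "description": description,
        })

    def between(self, start, end):
        # All events with start <= occurred_at <= end.
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return self._events[lo:hi]


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Illustrative events only.
timeline = EventTimeline()
timeline.add(utc(2024, 1, 20), utc(2024, 2, 1), "domain registered")
timeline.add(utc(2024, 1, 5), utc(2024, 1, 6), "port scan observed")
timeline.add(utc(2024, 3, 2), utc(2024, 3, 2), "phishing email reported")

january = timeline.between(utc(2024, 1, 1), utc(2024, 1, 31))
print([event["description"] for event in january])
```

Storing the discovery time separately from the occurrence time matters because the two often differ: the domain registration above happened in January but was only discovered in February.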


Structured and Unstructured Elements … of Big Data

It is not enough to collect and store all the data and its historical context. How that data is then used is just as important. For example, efficiently searching Big Data to determine whether an event or data fragment represents a larger threat to an organization requires special attention to making that data searchable, especially when it contains both structured and unstructured data sets. It does no good if a search takes three days when a cybersecurity event can happen in milliseconds. A decision to act must be made within seconds or minutes, not days.

Big Data search is a non-trivial problem because it depends on what you are searching for and how you are searching. For example, if you want to search for an IP address, you may consider indexing the data based on IP structure and network topology. However, if you are searching unstructured data for terms and words that represent a threat when combined in certain permutations, the search mechanism will likely be substantially different.

Tip #3: Structured and unstructured Big Data are both important. How you leverage each for cybersecurity use cases will drive your design choices.
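The difference between the two search styles can be sketched in a few lines. The example below indexes IPv4 addresses by their enclosing network (a structured index) and indexes free text by word (a simple inverted index). The function names, the /24 grouping, and the sample data are assumptions made for illustration, not a description of any particular product's search engine.

```python
import ipaddress
from collections import defaultdict


def build_prefix_index(ip_strings, prefix_len=24):
    """Structured index: group IPv4 addresses by their enclosing network."""
    index = defaultdict(set)
    for ip in ip_strings:
        # strict=False lets us derive the network from a host address.
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        index[network].add(ip)
    return index


def build_term_index(documents):
    """Unstructured index: map each lowercase word to the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?")].add(doc_id)
    return index


def search_all_terms(index, terms):
    """Return ids of documents that contain every one of the given terms."""
    matches = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*matches) if matches else set()


# Illustrative data only.
ips = ["203.0.113.9", "203.0.113.77", "198.51.100.4"]
prefix_index = build_prefix_index(ips)
same_network = prefix_index[ipaddress.ip_network("203.0.113.0/24")]
print(sorted(same_network))

reports = {
    "r1": "Phishing email delivers credential stealer.",
    "r2": "New phishing kit observed",
    "r3": "Credential dump posted",
}
term_index = build_term_index(reports)
print(search_all_terms(term_index, ["phishing", "credential"]))
```

The two indexes share almost nothing: one exploits the hierarchy built into IP addressing, while the other depends on how text is tokenized. That is why the mix of structured and unstructured data shapes design choices so strongly.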

Operational Use … of Big Data

Big Data is a multi-faceted technology problem. It is much more than a data problem; it is a business technology challenge that requires an effective understanding of the data, its structure and scale, and how that data will be combined with systems and business processes to achieve business results that impact cybersecurity operations. The diagram below shows a very high-level process for where Big Data can be applied at various stages of the cybersecurity workflows that support threat hunting, threat investigation, and threat mitigation.

Figure 1: High Level Big Data Use In Cybersecurity

Key Areas to Consider:

  • System Improvements
    • What data is required; what data gaps exist and how to improve those; what schemas exist; and how multiple data sources can be combined for effective use
  • Data Ingest
    • What translation processing must occur; how data will be cleaned, de-duplicated, and indexed later on in the system
  • Data Storage
    • How data will be stored at large scale with robust security; best practices around redundancy and replication must be a factor
    • How data will be retained or continuously pruned when in production
    • How versioning of data will operate at scale
  • Data Enrichment and Processing
    • What enrichment will take place with machine analytics and the human element
    • What business operations will interact with the data store, such as search operations, alerting, dashboarding, and reporting

Tip #4: Effective cybersecurity use of Big Data is a business operation challenge, not just a data processing one.
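Two of the considerations listed above, de-duplication at ingest and retention pruning in storage, can be sketched as follows. This is only an illustration under stated assumptions: the record fields (`indicator`, `source`, `seen_at`), the function names, and the 30-day retention window are all hypothetical, and a dictionary stands in for a real data store.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def record_key(record):
    """Stable fingerprint of a record's content, ignoring when it was seen."""
    content = {k: v for k, v in record.items() if k != "seen_at"}
    payload = json.dumps(content, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def ingest(store, records):
    """De-duplicate on content, keeping the most recent sighting of each record."""
    for record in records:
        key = record_key(record)
        existing = store.get(key)
        if existing is None or record["seen_at"] > existing["seen_at"]:
            store[key] = record
    return store


def prune(store, now, retention):
    """Drop records last seen before the retention cutoff."""
    cutoff = now - retention
    return {k: r for k, r in store.items() if r["seen_at"] >= cutoff}


# Illustrative records only.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
records = [
    {"indicator": "198.51.100.4", "source": "feed-a", "seen_at": now - timedelta(days=40)},
    {"indicator": "198.51.100.4", "source": "feed-a", "seen_at": now - timedelta(days=1)},
    {"indicator": "example.test", "source": "feed-b", "seen_at": now - timedelta(days=45)},
]

store = ingest({}, records)
recent = prune(store, now, timedelta(days=30))
print(len(store), [r["indicator"] for r in recent.values()])
```

Even in this toy version, business decisions leak into the code: which fields define a duplicate and how long data is retained are policy choices, not purely technical ones.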

Big Data is a foundational aspect of threat analytics and of the ability to visualize attacks, supporting the detection and mitigation of everything from phishing and malware to hacking and data breaches. If you would like to learn more about how LookingGlass uses Big Data for digital risk management to identify and prevent cyber attacks, please contact us.
