Insights on Building Scalable Cybersecurity Systems
Today, enterprises face a plethora of network security solutions that attempt to address requirements such as higher throughput and advanced threat detection and mitigation. These solutions must also be easy to deploy across virtual and non-virtual infrastructures while remaining cost-effective.
Achieving a solution across those diverse and often competing requirements can be a challenge. In this blog, we will provide insights and suggested best practices addressing how organizations may build secure network processing systems by introducing new approaches and their advantages.
4 Key Network-Based Cybersecurity Considerations
1. High Throughput
The fundamental challenge with cybersecurity in today’s networks is twofold: the amount of processing required for advanced detection and mitigation on each packet or flow as it passes through the network fabric is increasing, while network line rates are also increasing. The result is less time to process network traffic without impacting network latency.
For a modern 4 gigahertz CPU, ignoring pipelining and branch prediction and assuming one instruction per clock cycle on a single core, the CPU executes 4,000 clock cycles per microsecond. For scale, a simple C program that prints “Hello World” can take about 1,000 to 10,000 clock cycles to run.
Conclusion: As network line rates reach 40 Gbps and above, modern CPUs will struggle to execute the necessary cybersecurity protection code without impacting throughput or latency.
Figure 1: Per Packet Latency by Line Rate
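To make the time budget concrete, the following sketch works out how long one packet occupies the wire at several line rates and how many clock cycles a 4 GHz core has in that window. It is a back-of-the-envelope calculation under stated assumptions, not a measurement: it assumes the worst case of minimum-size 64-byte Ethernet frames plus 20 bytes of preamble and inter-frame gap, a single core, and one instruction per cycle. The function and constant names are illustrative only.

```python
# Back-of-the-envelope per-packet cycle budget.
# Assumptions (not from the original post): worst case of minimum-size
# 64-byte Ethernet frames, plus 8 bytes of preamble and a 12-byte
# inter-frame gap, all handled by a single 4 GHz core.

CPU_HZ = 4_000_000_000        # 4 GHz, i.e. 4,000 cycles per microsecond
WIRE_BYTES = 64 + 8 + 12      # frame + preamble + inter-frame gap


def cycle_budget(line_rate_gbps, cpu_hz=CPU_HZ):
    """Return (nanoseconds per packet, CPU cycles per packet)."""
    bits_on_wire = WIRE_BYTES * 8
    ns_per_packet = bits_on_wire / line_rate_gbps   # 1 Gbps == 1 bit per ns
    cycles_per_packet = ns_per_packet * cpu_hz / 1e9
    return ns_per_packet, cycles_per_packet


if __name__ == "__main__":
    for rate in (1, 10, 40, 100):
        ns, cycles = cycle_budget(rate)
        print(f"{rate:>3} Gbps: {ns:7.2f} ns/packet, {cycles:8.1f} cycles/packet")
```

Under these assumptions a single core has roughly 67 cycles per minimum-size packet at 40 Gbps, well below the 1,000 to 10,000 cycles cited above for a trivial program. Larger packets and multiple cores relax the budget, but the trend with rising line rates is the same.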
2. Network Traffic Variability
Given the nature of different network protocols, as well as deliberate malicious manipulation of network traffic, cybersecurity systems must handle unexpected data in packet contents and headers.
Malformed network traffic can be the result of poorly implemented systems, inadvertent configuration problems, or deliberate malicious changes to the protocols, applications, or systems communicating over the network.
Conclusion: Cybersecurity systems must handle traffic variability regardless of its cause, and that processing typically requires more clock cycles because more decisions need to be made.
3. Dynamic Management
Cybersecurity systems must handle behavioral changes in detection and mitigation logic to meet the changing nature of network security.
Conclusion: Systems must be provisioned and updated as conditions change without downtime, without impact to their service and with minimal latency.
4. Cost Effective
As networks expand to the cloud and hybrid environments – where it may not be possible or desirable to deploy specialized hardware solutions – the cost of solutions to deliver on the other highlighted requirements is a key driver in the effectiveness of a solution.
Conclusion: Where it makes sense, cybersecurity systems can leverage off-the-shelf hardware acceleration to make the solution more cost effective, or to reach performance levels that are not achievable with pure software.
Cybersecurity Programming Models
One of the key factors in how cybersecurity systems address their requirements is how the system is designed and programmed. The Programming Machine Model is a model of computation that describes how a programmer’s instructions in a high-level language such as C, C++, or Java are translated into low-level instructions.
Below are 4 different models and some of the trade-offs that influence one model choice over another.
Model 1: Traditional Register Machine (e.g. C/C++; C Calling Convention)
- Variables can be statically allocated but are more often allocated on the call stack.
- Separate allocation of memory using malloc(); programmer has to manage memory.
- Procedure calls involve storing register state onto the stack and then restoring them on return to the calling procedure.
- Well understood, relatively fast
- Most cost-effective when measured in programmer knowledge and time
- Cost-effective hardware acceleration is possible using LLVM backend targets, e.g. eBPF
- Having to manage memory and call stack impacts security severely
- Has historically been vulnerable to network variability due to C stack and buffer overruns
- Memory management is hard and detail-oriented, another source of variability issues such as memory leaks
- Dynamic behavior requires the implementation of some level of interpretation, magnifying the above issues
The following diagram shows a simplified flow of executing add_this(add_this(2, 1), 3) in C/C++. In particular, data and execution state share the same stack – the parameters and return values are interleaved with the return address in the stack frame.
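As a stand-in for the diagram, here is a toy model of that shared stack. It is a sketch under heavy simplification: a Python list plays the role of the C call stack, the ("arg", ...), ("ret", ...) and ("val", ...) tags are invented for illustration, and real calling conventions differ in layout and growth direction. The second half assumes a hypothetical frame with a two-slot buffer placed next to the return address to show why an unchecked copy is dangerous in this model.

```python
# Toy model only: a Python list stands in for the single C-style call stack.
# Real ABIs differ in layout and growth direction; the tags are illustrative.

def add_this_shared(stack, a, b, return_address):
    """Simulate one call to add_this(a, b) on a shared stack."""
    stack.extend([("arg", a), ("arg", b), ("ret", return_address)])
    frame = list(stack)              # snapshot of the stack during the call
    result = stack[-3][1] + stack[-2][1]
    del stack[-3:]                   # return: drop the whole frame...
    stack.append(("val", result))    # ...and leave the result for the caller
    return frame


def nested_add(stack):
    """Simulate add_this(add_this(2, 1), 3); return (result, frames)."""
    inner_frame = add_this_shared(stack, 2, 1, "after_inner_call")
    inner_result = stack.pop()[1]
    outer_frame = add_this_shared(stack, inner_result, 3, "after_outer_call")
    return stack.pop()[1], [inner_frame, outer_frame]


def unchecked_copy(frame, start, data):
    """Like an unchecked memcpy(): nothing stops a write past the buffer."""
    for offset, item in enumerate(data):
        frame[start + offset] = item


if __name__ == "__main__":
    result, frames = nested_add([])
    print(result)      # 6
    print(frames[0])   # arguments and return address share one stack

    # A frame with a two-slot buffer sitting directly below the return address.
    frame = [0, 0, ("ret", "caller")]
    unchecked_copy(frame, 0, [0x41, 0x41, ("ret", "attacker")])
    print(frame[2])    # the return address has been overwritten
```

Because data and return addresses are neighbors on the same stack, one oversized write is enough to redirect control flow in this model, which is the class of stack and buffer overrun noted in the list above.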
Model 2: Variable Machine
- Statically allocated variables
- All operations accept these variable addresses as arguments, and the result is put in a variable address
- Strongly typed; operations are not polymorphic. A 16-bit addition will not accept anything other than 16-bit variables
- Procedure calls and returns are explicit; calls store return values in variables that are then used by returns
- Lack of registers and static allocation can make this far more secure than a traditional register machine
- Limited dynamism due to static allocation, and much lower performance due to lack of registers
Diagram: The next diagram shows a simple rendering of how a variable machine would execute add_this(add_this(2, 1), 3). All variables are statically allocated, and return locations are stored in return variables. This approach can be more secure than the C/C++ stack, but takes a lot more memory and is costly in performance due to the lack of data locality.
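A minimal interpreter can illustrate the same idea. This is a sketch under assumptions: the opcode names (set16, mov16, add16, call, ret, halt), the variable names, and the tuple encoding are all hypothetical, chosen only to mirror the properties listed above (static allocation, typed operations, explicit return variables).

```python
# Toy variable machine. Every variable is declared before the program runs,
# every operation names its operands, and each procedure has one statically
# allocated return slot. Opcode and variable names are hypothetical.

def run_variable_machine(program, data_vars, return_vars):
    width = dict(data_vars)                  # name -> bit width, fixed up front
    value = {name: 0 for name in width}      # statically allocated storage
    ret = {name: None for name in return_vars}

    def check16(*names):
        for name in names:
            if width.get(name) != 16:
                raise TypeError(f"{name!r} is not a declared 16-bit variable")

    pc = 0
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "set16":
            dst, constant = args
            check16(dst)
            value[dst] = constant & 0xFFFF
        elif op == "mov16":
            dst, src = args
            check16(dst, src)
            value[dst] = value[src]
        elif op == "add16":
            dst, a, b = args
            check16(dst, a, b)
            value[dst] = (value[a] + value[b]) & 0xFFFF
        elif op == "call":
            target, ret_var = args
            if ret_var not in ret:
                raise KeyError(f"undeclared return variable {ret_var!r}")
            ret[ret_var] = pc                # explicit: where to come back to
            pc = target
        elif op == "ret":
            (ret_var,) = args
            pc = ret[ret_var]
        elif op == "halt":
            return value
        else:
            raise ValueError(f"unknown opcode {op!r}")


ADD_THIS = 7  # index of the first instruction of add_this
PROGRAM = [
    ("set16", "a", 2),                     # 0
    ("set16", "b", 1),                     # 1
    ("call", ADD_THIS, "add_this_ret"),    # 2: add_this(2, 1)
    ("mov16", "a", "result"),              # 3: inner result becomes argument a
    ("set16", "b", 3),                     # 4
    ("call", ADD_THIS, "add_this_ret"),    # 5: add_this(3, 3)
    ("halt",),                             # 6
    ("add16", "result", "a", "b"),         # 7: body of add_this
    ("ret", "add_this_ret"),               # 8
]
DATA_VARS = {"a": 16, "b": 16, "result": 16}
RETURN_VARS = ["add_this_ret"]

if __name__ == "__main__":
    print(run_variable_machine(PROGRAM, DATA_VARS, RETURN_VARS)["result"])  # 6
```

Note that with a single return slot per procedure, this sketch cannot express recursion, which is one concrete form of the limited dynamism mentioned above.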
Model 3: Stack Machine
- Dual-stack (data stack, return stack)
- Separation of code addresses on the return stack, from the data stack.
- All operations take values from the stack as arguments, and place return values onto the stack
- Single value type per stack; some stack machines have a separate floating point stack
- Code is extremely dense, as no operands are required for most operations
- Separation of return and data stacks improves security considerably
- Code density allows the data and program logic to stay in the CPU cache
- The security of a Stack Machine program can be verified algorithmically because most operations occur on the stack with very few side effects, similar to functional programming
- Extremely fast, with interpreted instructions executing in 3 clock cycles or fewer, because most opcodes consume operands from the stack rather than from memory addresses or variables
- Most compilers and languages are optimized for register machines; different model of thinking than most
- This is mitigated by ongoing research and development of stack-based targets and intermediate languages, e.g. WebAssembly.
Diagram: The following diagram shows a Stack Machine that splits the Call Stack and the Data Stack. This offers more security than the C/C++ Stack does, as return addresses do not share the same stack as the program data.
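The split can be sketched with a small interpreter. This is an illustrative toy, assuming a hypothetical instruction encoding (integers as literals, the strings "add", "ret" and "halt", and ("call", address) tuples); it is not the instruction set of any particular stack machine.

```python
# Toy dual-stack machine: operands live on the data stack, code addresses
# live on the return stack, and the two never mix. Encoding is hypothetical.

def run_stack_machine(program, entry=0):
    data = []       # operands and results only
    returns = []    # return addresses only
    pc = entry
    while True:
        op = program[pc]
        pc += 1
        if isinstance(op, int):          # literal: push onto the data stack
            data.append(op)
        elif op == "add":                # operands come from the stack itself
            b = data.pop()
            a = data.pop()
            data.append(a + b)
        elif isinstance(op, tuple) and op[0] == "call":
            returns.append(pc)           # only the return stack sees addresses
            pc = op[1]
        elif op == "ret":
            pc = returns.pop()
        elif op == "halt":
            return data, returns
        else:
            raise ValueError(f"unknown instruction {op!r}")


ADD_THIS = 6  # index of the first instruction of add_this
PROGRAM = [
    2, 1, ("call", ADD_THIS),    # add_this(2, 1)
    3, ("call", ADD_THIS),       # add_this(<previous result>, 3)
    "halt",
    "add", "ret",                # body of add_this: no operands encoded
]

if __name__ == "__main__":
    data, returns = run_stack_machine(PROGRAM)
    print(data, returns)   # [6] []
```

In this sketch no data operation can reach the return stack, so corrupting an operand cannot change where a call returns. The body of add_this is also just two operand-free instructions, which hints at the code density described above.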
Model 4: Vector Machine (GPUs)
- Executes the same operations on many pieces of data (vector), in parallel
- Generally optimized for floating point numeric types, due to graphical focus
- Highly optimized for tasks that are executed in sequence, with no deviation or branches
- Very poor at branching code, i.e. code in which a decision determines what to execute next
- Extremely high throughput with repeatable and pipelined processes
- Very poor at branching logic, where decisions need to be made on every item; concurrent programming is hard for most programmers
Diagram: The final diagram shows the setup of a vector pipeline that performs two vector additions on multiple data concurrently. While the throughput of a vector pipeline is amazing, the setup takes time, and the latency of the pipeline can be problematic for timely detection.
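The lane-wise behavior can be modeled in a few lines. This is only a model: plain Python lists stand in for vector lanes and run sequentially, whereas real vector hardware executes the lanes in parallel. The function names and the per-lane branch example are assumptions made for illustration.

```python
# Model of lane-wise (vector) execution. Python lists stand in for vector
# registers; real hardware would process all lanes at once.

def vector_add(xs, ys):
    """Apply the same addition to every lane."""
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same number of lanes")
    return [x + y for x, y in zip(xs, ys)]


def vector_select(mask, if_true, if_false):
    """Per-lane choice between two already computed results."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def two_adds(a, b, c):
    """The pipeline from the diagram: (a + b) + c on every lane."""
    return vector_add(vector_add(a, b), c)


def branchy(values):
    """Per-lane 'if v > 10: v + 1 else: v * 2' without a real branch.

    Both sides are computed for every lane and half the work is thrown
    away, which is why branch-heavy logic suits this model poorly.
    """
    mask = [v > 10 for v in values]
    taken = [v + 1 for v in values]
    not_taken = [v * 2 for v in values]
    return vector_select(mask, taken, not_taken)


if __name__ == "__main__":
    # Lane 0 is add_this(add_this(2, 1), 3); the other lanes ride along.
    print(two_adds([2, 10, 20, 30], [1, 1, 1, 1], [3, 3, 3, 3]))  # [6, 14, 24, 34]
    print(branchy([3, 11, 24, 5]))                                # [6, 12, 25, 10]
```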
Which Model is Best?
A combination of models works best, leveraging each where it shines. The following scenarios show where each would work best:
- User-Facing Applications and Server APIs: For leveraging other people’s code, whether internal or open source, the C/C++ Calling Convention and Stack offer the most compatibility. The application would have restricted exposure to potential attacks, and so it is generally safe to use a more general-purpose, lower-security programming model.
- For dynamic and secure packet processing with good performance, a Stack Machine is the best approach due to:
- Code Density – Stack Machine instructions are very small and entire programs can fit in a CPU cache
- Separation of Data and Call Stacks – the likelihood of a buffer overrun attack is much less due to this separation
- Dynamic Management – Stack Machines are safer to execute dynamic rules and code on, due to stack separation above, as well as the capability to programmatically assess the potential inputs and outputs
- Execution Speed and Latency – Instructions are small, simple, and very fast, allowing packet transforms in a minimal amount of clock cycles
- For scenarios where security is paramount, and the high-cost in performance and lack of dynamic behavior is acceptable, the Variable Machine is best due to:
- Static Allocation – Variables are always going to be known sizes and quantities
- No Memory Addressing – No pointers that can be suborned
- No Call or Data Stack – Calls and returns are to variables which are known ahead of time
- For the more complex security programs required today, where the machine has to execute complex tasks, a Variable Machine programming model has performance limitations if its static variables don’t map to registers and instead spill to the heap. On balance, a Stack Machine programmed to take full advantage of the stack programming model will execute significantly faster without compromising security
- For transforming packets and applying the same transform repeatedly, a Vector Machine would work best.
- When the same transform needs to be performed on many packets, with a minimum of decision making, a Vector Machine offers high throughput
- If the detection logic can be done sequentially and is simple, a Vector Machine makes a good platform for detection
- A Vector Machine would work best in combination with a Stack Machine, acting as a pre- or post-processor to more complex branching logic
- Packet header extraction or matching
- Mass packet transforms determined by other means
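The Dynamic Management point above mentions programmatically assessing the potential inputs and outputs of a Stack Machine program. A minimal sketch of such a check follows, assuming a hypothetical instruction set in which every opcode has a fixed (pops, pushes) stack effect and programs are straight-line with no branches or loops. Operands of push are omitted because only the stack effect matters here. A real verifier would need to handle control flow as well.

```python
# Static stack-effect check for a straight-line stack program.
# The opcode table is hypothetical: opcode -> (values popped, values pushed).

STACK_EFFECT = {
    "push": (0, 1),
    "dup": (1, 2),
    "drop": (1, 0),
    "add": (2, 1),
    "and": (2, 1),
    "eq": (2, 1),
}


def stack_effect(program, max_growth=16):
    """Return (inputs_required, outputs_left) without running the program.

    Raises ValueError for an unknown opcode, or if the stack would ever
    grow more than max_growth slots above its depth on entry.
    """
    depth = 0     # current depth relative to the depth on entry
    lowest = 0    # deepest point the program reaches below entry (<= 0)
    for op in program:
        if op not in STACK_EFFECT:
            raise ValueError(f"unknown opcode {op!r}")
        pops, pushes = STACK_EFFECT[op]
        lowest = min(lowest, depth - pops)
        depth = depth - pops + pushes
        if depth > max_growth:
            raise ValueError("program exceeds the allowed stack growth")
    return -lowest, depth - lowest


if __name__ == "__main__":
    # A candidate rule: compare the packet field already on the stack
    # against a pushed constant. It needs one input and leaves one output.
    print(stack_effect(["push", "eq"]))           # (1, 1)
    print(stack_effect(["push", "push", "add"]))  # (0, 1)
```

A rule loader could run a check like this before accepting a dynamically supplied rule, rejecting any program whose required inputs or produced outputs do not match what the packet pipeline provides.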
The concepts outlined in this blog should all be taken into consideration when designing and building a high-speed, high-security, and dynamic network processing platform. Without exploring all of the potential approaches, you might miss a key method that could prevent network latency issues.
If you would like to learn more about LookingGlass security products, contact us.