Tuesday, November 18, 2014

Firewall scalability

I'm sure in every environment there are moments, where traffic grows and grows until it reaches some limitations. It could be the ISP link or the proxy throughput or maybe the firewall.
In that case, the IT or OPS team has several options to resolve the problem. Of course most of the short-term options involve reducing the traffic generated either by telling users to stop using internet for non-business related purposes, or block websites that are not needed for work (white-listing).
But lets have a look at more technical options, which also tend to be long-term solutions.

Firewall upgrade

This is quite common choice by many companies, who just throw money at the problem. Buying newer and faster firewall is surely easier than trying to re-engineer the network. Migration is also quite simple as it involves configuring the new firewall and just replacing the existing one (with roll-back possibility).

But as it seems, there are also limitations here. For all the Cisco fans, the firewalls currently on the market from your favorite vendor can do max 15Gbps as an appliance and 20Gbps as a service module. As these values are theoretical, I wouldn't expect them to be reached in real-world situations.

Now for those who are able to consider other vendors, Fortinet announced a carrier grade firewall FortiGate 5000, which can deliver more than 1Tbps firewall throughput. Of course that's just a marketing statement, as it's a sum of all the blades, which  can deliver 40Gbps each.

There are also some tricks with using firewalls in parallel, but synchronizing state between all the units might be a challenge. Some vendors tried it with dedicated link between 2 units, others tried it with multicasting the state changes, but effectivity of such solutions was decreasing with each unit and number of flows that were being passed through them.

Firewall bypass

Although firewalls are limited by their inspection ASIC chips, that need not just to analyze the packet headers but also keep state information of each flow, switches with forwarding ASIC chips are much faster when doing just forwarding.

So in some companies, engineers though about this fact and came up with the idea to only inspect the relevant packets to keep the state information and the rest of the packets can be just passed on. 

So they send all the TCP packets with SYN, RST or FIN flags set (including any non-TCP packets) to the inspection unit (can be a firewall), while the rest of the packets can be forwarded to their destination directly.

This idea called "fast-path" was also adapted in SDN and virtual networks, as with OpenFlow 1.3+ the packets can be easily redirected to inspection device, which can instruct the controller to drop the flow if it doesn't match the security policy.
Despite the fact that many vendors currently support only OpenFlow 1.1, many of the are already considering support for 1.3 or have announced switches supporting it (like Brocade).

With such a solution and data flows which have lots of packets, traffic speeds can be much higher than any hardware firewall appliance can offer in near future.
But still the limitation exists on the speed of the SYN/FIN/RST flag containing packets processing and also on the forwarding speed of the network. Also this idea is based on the fact that most of the traffic is TCP based, as for other protocols the conditions to detect when flow is started or finished differ. Plus what is also not shown on the picture is the feedback necessary from the firewall to the router to allow only existing flows.

Firewall re-location

With all the ideas described above, inspection was happening on the edge of the network (based on best practices for firewall deployment). So the idea of doing computation intensive tasks like packet inspection on one centralized system restricts the performance of throughput with limitation of hardware performance of this system.
As the general solution to this limitation is usage of parallel computing, you can also see that vendors tried it by building blade chassis designs. Next step was to virtualize the firewalls and move them closer to the data-flow sources, but the most scalable solution is to have it exactly at the source, so either end-point firewalls or server/VM distributed firewalls.
As flows originate or terminate at each VM, firewall inspecting traffic for that VM would only need to track these flows and don't have to synchronize with other firewalls. Of course if VM moves, firewall has to move too. And in respect of performance, with VMs there is a limit of how much data it can send out, and the more it sends the more inspection firewall has to do. But firewall and the VM share the same CPU and memory resource, so the system would self-regulate if firewall can't keep up with the data being sent out.

This all sounds like the ultimate scalable solution, but there is a dark side to it: Management. With large amount of firewalls, the configuration of each would be quite time consuming. Automation or VM profiles is the usual answer, but for that the way how network security engineers or administrators operate has to change. Just consider troubleshooting connectivity problems when you have 1000s of devices generating logs and these devices move around and might not be able to reproduce the problem.


So despite the fact that there are solutions to scale the firewall throughput to the sky, there are many considerations to be made on the way there.
From the type of data-flow patterns up to the administrator's skill-set, before the sky is reached all these obstacles have to be dealt with.
Just as there is a nice blue sky, there is also a deep dark rabbit-hope into which Alice can fall.