Wednesday, July 16, 2014

Test Driven Design in firewall engineering

After taking a Berkeley course on BDD and TDD in software engineering, I got a very interesting idea for a new kind of security device. This article explains how the idea could work and what its use cases are.

Background

Firewall rule design is a skill that is quite hard to get right and quite easy to make mistakes in. If one just keeps adding rules, the firewall might end up with 10000 or more rules and become difficult to maintain, or in the worst case run out of memory for the rules.
On the other hand, if one keeps aggregating, the firewall ends up permitting unwanted traffic, and this won't be spotted until it actually causes some damage.
The sweet spot is somewhere between these two extremes: keep the holes minimal while aggregating rules that can safely be merged.
This can be done either with careful planning and analysis, or with very good service monitoring that spots when something stops working or when something suddenly becomes allowed that shouldn't be.
In most cases the first approach is taken (provided, of course, there is a brave soul who dares to touch a working system), but with the rise of continuous deployment and automation, the second option might be more practical.

In software development, BDD/TDD means writing tests before the actual coding takes place, so that the "user story" fails before the code exists and turns green once it is implemented.
There are two types of tests that need to be written:

  • positive (how the feature should work)
  • negative (how the feature should not work)

What is so great about testing in the software development world is that languages have been developed to describe such tests. In the course I practiced this with Cucumber, which describes the expected behavior of an application interface in a very readable way and makes testing more fun.

Idea description

Now, for the idea to work, I would need an IDS-like system with the following capabilities: 
  • to receive any packet that arrives at the firewall or leaves it 
  • to send packets with any source/destination on both sides of the firewall 
As IDS systems are already able to do both, I don't see a problem in building such a system; a rough sketch of the probing logic follows.
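As a minimal sketch of such a probe, here is what the send-and-observe logic could look like in Python with Scapy; the interface names, the source address and the timeout are assumptions for illustration, not a finished design:

# probe.py -- hypothetical probe: inject a TCP SYN on the outside tap and
# check whether the firewall forwards it to the inside tap.
import time
from scapy.all import Ether, IP, TCP, sendp, AsyncSniffer

OUTSIDE_IF = "eth-outside"   # assumed tap on the untrusted side
INSIDE_IF = "eth-inside"     # assumed tap on the trusted side

def passes_through(dst_ip, dport):
    # Start capturing on the inside first, so the probe cannot be missed.
    sniffer = AsyncSniffer(iface=INSIDE_IF,
        filter="tcp and dst host %s and dst port %d" % (dst_ip, dport))
    sniffer.start()
    probe = Ether() / IP(src="198.51.100.10", dst=dst_ip) / \
            TCP(sport=40000, dport=dport, flags="S")
    sendp(probe, iface=OUTSIDE_IF, verbose=False)
    time.sleep(2)                   # give the firewall time to forward it
    return len(sniffer.stop()) > 0  # stop() returns the captured packets

print("HTTP to 10.0.0.1 passes:", passes_through("10.0.0.1", 80))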


The next part is to create a language for writing the tests, one that describes the expected behavior of the firewall and validates it in a timely fashion.
I've written Cucumber tests to show what can be done; implementing these test conditions would require quite some coding, but they illustrate what such tests could look like.
Feature: Test HTTP protocol 
  Webserver 10.0.0.1 should be reachable on HTTP port from outside

  Scenario: External user allowed connection
    Given there is "HTTP" server "10.0.0.1"
    When I send "HTTP" packet from "Outside" to "10.0.0.1"
    Then I should see "HTTP" packet on "Inside" to "10.0.0.1"

  Scenario: External user dropped connection
    When I send "HTTP" packet from "Outside" to "10.0.0.2"
    Then I should not see "HTTP" packet on "Inside" to "10.0.0.2"
    When I send "SMTP" packet from "Outside" to "10.0.0.1"
    Then I should not see "SMTP" packet on "Inside" to "10.0.0.1"
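To give a feeling for the coding involved, the phrases above could be backed by step definitions roughly like the following, sketched with Python's behave library and Scapy. The protocol-to-port table, the interface mapping and the two-second wait are assumptions:

# steps/firewall_steps.py -- hypothetical step definitions for the feature above.
import time
from behave import given, when, then
from scapy.all import IP, TCP, send, AsyncSniffer

PORTS = {"HTTP": 80, "SMTP": 25}                             # assumed mapping
IFACES = {"Outside": "eth-outside", "Inside": "eth-inside"}  # assumed taps

@given('there is "{proto}" server "{ip}"')
def step_server_exists(context, proto, ip):
    context.server = (proto, ip)   # nothing to do; documents the fixture

@when('I send "{proto}" packet from "{side}" to "{ip}"')
def step_send(context, proto, side, ip):
    port = PORTS[proto]
    # Capture on the opposite side of the firewall before sending.
    other = "Inside" if side == "Outside" else "Outside"
    context.sniffer = AsyncSniffer(iface=IFACES[other],
        filter="tcp and dst host %s and dst port %d" % (ip, port))
    context.sniffer.start()
    send(IP(dst=ip) / TCP(dport=port, flags="S"),
         iface=IFACES[side], verbose=False)
    time.sleep(2)
    context.captured = context.sniffer.stop()

@then('I should see "{proto}" packet on "{side}" to "{ip}"')
def step_should_see(context, proto, side, ip):
    assert len(context.captured) > 0

@then('I should not see "{proto}" packet on "{side}" to "{ip}"')
def step_should_not_see(context, proto, side, ip):
    assert len(context.captured) == 0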
In theory, all the tests that are currently done manually by scanning ports or analyzing firewall logs and sniffer data could then be automated and repeated whenever necessary (without disturbing the security engineer :).

With these capabilities I can generate packets and see what passes through the firewall and what does not, but it can still be problematic if the firewall applies any kind of intelligence or behavior analysis and starts blocking the IP addresses used for testing.
Another problem is that the target server might actually process the test requests and fill up its job queue or TCP stack with connections waiting for responses.
So either the solution must automatically stop the test packets from reaching anything beyond the firewall, or the tests have to be written so that they do not block or over-utilize any resources; one option is sketched below.
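One way to keep probes from tying up server resources, assuming the same lab setup as above, is to send a bare SYN from an unused source address (so replies go nowhere) and immediately tear down any half-open state with a RST. The firewall's verdict is still observable on the inside tap:

# Hypothetical "quiet" probe: a single SYN from an unused source address,
# followed by a RST so the server does not keep half-open state around.
from scapy.all import IP, TCP, send

UNUSED_SRC = "198.51.100.99"   # assumed address that never answers

def probe_quietly(dst_ip, dport):
    syn = IP(src=UNUSED_SRC, dst=dst_ip) / \
          TCP(sport=40001, dport=dport, flags="S", seq=1000)
    send(syn, verbose=False)
    # ... observe the inside tap here, as in the earlier sketch ...
    # A RST with the next sequence number tears down any half-open
    # connection the SYN may have created on the server.
    rst = IP(src=UNUSED_SRC, dst=dst_ip) / \
          TCP(sport=40001, dport=dport, flags="R", seq=1001)
    send(rst, verbose=False)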

With many vendors adding support for various API interfaces, this could in theory also be implemented directly on the firewall, but with firewall clusters or farms this might not be very practical.
And of course the saying "trust is good, verification is better" still applies here.

As for SDN, or more specifically NFV, this service would be an ideal candidate for verifying changes or validating the control software.

Use-Cases

As many vendors (who might grab this idea and build something) will ask why customers would buy something like this, let's think through some cases that call for it:

Continuous deployment scenario

With applications changing several times a day, network elements may need to be adjusted to match. For example, when an application changes the type of database it uses, a MySQL flow to a newly spawned (or existing) database VM may suddenly be needed.
As the change has to happen automatically to keep the development cycle short, it would be done by a script, and the person behind it would want to see whether the firewall change had any impact on already permitted flows and whether the new flow allows too much; a hypothetical deployment hook is sketched below.
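Such a hook could gate the deployment on the flow tests and roll back on failure. In the sketch below the push-fw-rule command is invented for illustration; only the behave invocation is a real tool:

# Hypothetical deployment hook: push the new rule, rerun the flow tests,
# and roll back if any previously green scenario turns red.
import subprocess, sys

subprocess.check_call(["push-fw-rule", "permit-mysql-to-db-vm"])  # assumed CLI
if subprocess.run(["behave", "firewall.feature"]).returncode != 0:
    subprocess.check_call(["push-fw-rule", "--rollback"])         # assumed CLI
    sys.exit("firewall change broke a tested flow, rolled back")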

Firewall migration scenario

Upgrades or migrations of firewalls require all rules to be reconsidered and probably rewritten. A set of tests showing that the new firewall provides the same functionality would prevent major issues for the support teams. Migrations would then not need extra effort to investigate outages or broken connectivity (otherwise every problem gets blamed on the firewall for months to come), and the service status would be visible right after the new or upgraded firewall becomes operational.
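A minimal way to express "the new firewall behaves like the old one" is to run the identical suite against both and diff the verdicts; run_suite() below is a hypothetical helper around the tests:

# Sketch: report any scenario whose verdict changed between firewalls.
# run_suite() is a hypothetical helper returning {scenario_name: passed}.
def changed_verdicts(old_results, new_results):
    return sorted(name for name, passed in old_results.items()
                  if new_results.get(name) != passed)

# for name in changed_verdicts(run_suite("old-fw"), run_suite("new-fw")):
#     print("behaves differently after migration:", name)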

Continuous monitoring scenario

Although continuous monitoring mostly focuses on services (customers want the service to be up; they don't care about the infrastructure), for troubleshooting and support it can be quite useful to spot what causes a problem. Very often the monitoring does not analyze all the firewall logs, and even when it does, a simple rule change can generate masses of dropped connections, making it tedious to work out which flows are no longer working.
Continuously running the flow tests against the firewall would exclude this whole area from the investigation: instead of spending time searching through logs to identify whether something was dropped and whether it should have been, one just looks at the failing tests.
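A crude scheduler for this, reusing the suite from above, could simply rerun the flow tests every few minutes and alert only on scenarios that newly start failing; run_suite() and alert() are hypothetical helpers:

# Sketch of continuous flow monitoring.
import time

known_bad = set()
while True:
    results = run_suite("production-fw")            # {scenario: passed}
    failing = {name for name, ok in results.items() if not ok}
    for name in sorted(failing - known_bad):
        alert("flow test newly failing: " + name)   # e.g. mail or pager
    known_bad = failing
    time.sleep(300)                                 # every five minutes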