
Wednesday, September 28, 2016

Network Segmentation - Access Filters

Creating access filters in live networks

Every vendor or consultant presents fantastic solutions that would work in the ideal environments they imagine every customer has.
The truth is that only newly designed networks look like that; the rest are more heterogeneous, full of exceptions and special fringe cases.
And since network engineers commonly don't talk to the application people, the lack of knowledge about the traffic flows in their networks has serious implications.
With inexperienced architects or engineers, this usually ends in an even less standard environment, or in the worst case a dysfunctional network.


Another issue is the need to re-design the network in order to re-route traffic through filtering devices, which often causes downtime or outages while such a solution is being implemented.

So in this post I'll share a technique for creating access lists for network separation without major interruptions and without needing downtime for the changes.

There are many aspects (such as performance, location and support) that are outside the scope of this post but have to be considered before using filtering in production networks.

The examples below assume an L3 Cisco network with several internal VLANs (where the ACLs would be applied), but the method works for L2 networks (applying ACLs on ports or uplinks) as well.

Phase 1: Observation

As the primary problem of most network engineers is not knowing what packets flow through their network, the first step has to be to find that out.
This is accomplished by creating an access list that permits all traffic and logs the hits:

ip access-list extended ThisVLAN100
1000 permit ip any any log


To prevent overloading the log server, it is better to put known traffic flows into the ACL right away. A good example is DNS (e.g. UDP/53 towards the internal DNS servers), which is very common in server as well as user networks and almost guaranteed to occur.

Phase 2: Adaptation

In highly populated networks there will be loads of traffic, and the logs will grow faster than a human can read them. The goal of this phase is therefore to minimize log growth by inserting ACL entries for the most common traffic patterns:

900 permit ip host <some host> any

This phase can take many iterations, in which ACL entries are adjusted to be more generic or more specific (depending on the acceptable security level).
The time spent in this phase also depends on how long it takes for all devices to exhibit all of their traffic patterns (e.g. a monthly backup or data upload).
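
Reading the raw log lines quickly becomes impractical, so it helps to summarize them before deciding which entries to add. Below is a minimal sketch in Python that counts ACL log hits per flow; it assumes the hits arrive via syslog roughly in the IOS IPACCESSLOGP format shown in the docstring (the exact format varies by platform and protocol, so ICMP or non-TCP/UDP hits would need extra patterns).

#!/usr/bin/env python3
"""Summarize ACL log hits so the most common flows can become ACL entries.

Assumes syslog lines roughly in the IOS IPACCESSLOGP format, e.g.
%SEC-6-IPACCESSLOGP: list ThisVLAN100 permitted tcp 10.1.100.20(51514) -> 10.1.200.10(443), 1 packet
The exact format depends on platform, protocol and logging configuration.
"""
import re
import sys
from collections import Counter

HIT = re.compile(
    r"list (?P<acl>\S+) (?P<action>permitted|denied) (?P<proto>\w+) "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\(\d+\) -> (?P<dst>\d+\.\d+\.\d+\.\d+)\((?P<dport>\d+)\)"
)

def summarize(lines, top=20):
    flows = Counter()
    for line in lines:
        m = HIT.search(line)
        if m:
            # The source port is ignored: it is usually ephemeral and useless for ACL entries.
            flows[(m["acl"], m["proto"], m["src"], m["dst"], m["dport"])] += 1
    return flows.most_common(top)

if __name__ == "__main__":
    for (acl, proto, src, dst, dport), hits in summarize(sys.stdin):
        print(f"{acl}: {proto} {src} -> {dst}:{dport}  ({hits} hits)")

Feeding the exported syslog file through such a script shows at a glance which flows deserve their own permit entries.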

Phase 3: Conclusion

With the ACL now populated with the identified traffic flows that are acceptable/expected, the final step is to replace the catch-all permit with a deny:

no 1000
1000 deny ip any any log

It is always advisable to log the last entry: when some application in the network changes, this provides a good source of information for troubleshooting.
If the Adaptation phase was successful, there should be almost no hits on this rule.

I should mention that in internet-facing networks these last rules would look a bit different, and there would be additional rules for logging anomalous traffic:

999 deny ip <other internal networks> <protected network> log  
1000 permit ip <protected network> any

But I hope this gives you the picture of how to establish ACL filters in production networks without major impact.

Next steps

Depending on the network architecture, this process can be automated either by monitoring the logs and generating changes to be applied or by using "canary deployment" and replicating the result to the rest of the network.
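
As a small, hypothetical sketch of the "generating changes" half: once the summarized flows have been reviewed and approved, candidate ACL entries can be rendered automatically and pushed through the usual change process (or a tool such as Ansible). The flow list and numbering scheme below are illustrative assumptions, not output of any real tool.

#!/usr/bin/env python3
"""Render approved flows as candidate ACL entries for review (sketch only).

The approved flow list would come from the log summary in Phase 2; the
generated lines still need human review and a proper deployment mechanism.
"""

# Hypothetical approved flows: (protocol, source, destination, destination port).
APPROVED_FLOWS = [
    ("udp", "10.1.100.0 0.0.0.255", "host 10.1.10.5", "53"),
    ("tcp", "10.1.100.0 0.0.0.255", "host 10.1.20.7", "443"),
]

def render_acl(name, flows, start=100, step=10):
    """Generate numbered entries that stay below the final 'deny ip any any log'."""
    lines = [f"ip access-list extended {name}"]
    for i, (proto, src, dst, dport) in enumerate(flows):
        lines.append(f" {start + i * step} permit {proto} {src} {dst} eq {dport}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_acl("ThisVLAN100", APPROVED_FLOWS))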


The knowledge gathered in phases 1 and 2 should also help the network team understand the traffic patterns better for future design and improvement projects.

Thursday, July 9, 2015

DoS protection solutions

Most companies ignore the fact that their services can go down and therefore do not even consider protection against DoS attacks.
So in this post I'd like to analyze the potential attack areas and the ideas available to address them.
Let's exclude attacks that use specific product vulnerabilities to deny the service and focus on brute-force attacks that flood the target with excessive, otherwise valid service requests.


Statement of the problem

Since the attack uses valid service requests, it is hard to distinguish them from normal user requests. This means that whatever action is taken, normal user requests still have to be served in reasonable time.

Secondly, the solution providing the service has to be capable of dealing with the normal request load. This means it has to be elastic enough to scale up, if necessary, to serve all incoming requests.


The most common bottlenecks (on the customer's or enterprise's side) are:
  1. Server resources - the system(s) providing the service might not have the resources needed to deal with the request volume (no network bandwidth, not enough memory, over-utilized CPU, or other)
  2. Firewall resources - filtering and inspection, as well as tracking of all active flows, consume resources too (though not as many as a server)
  3. Internet router - not very probable, but the router still might not be able to handle that many packets per second
  4. Internet link - the most common problem: a link to the ISP that is not sized for the growing service demand leads to degraded response times (for example, a 1 Gb/s uplink is saturated by roughly a thousand bots sending 1 Mb/s each)
There surely are other potential bottlenecks, but the list above has to be considered in every case (independent of the service, location or team).
Mitigating these bottlenecks is the key to dealing with DoS attacks, and while scaling up is always an option, it might not be the most efficient one. The costs also grow steeply, so at a certain point it becomes important to choose an alternative to scaling.


Potential solutions

Excluding the options of re-engineering the application that provides the service, buying bigger hardware, or upgrading the internet link to cope with a distributed DoS attack (which can reach 10 Gb/s or more nowadays), let's see what ideas there are.

There are 3 functions that are part of each solution:

  • Detection (distinguishing what is a valid service request and what is not)
  • Protection (blocking invalid service requests without impacting valid ones)
  • Service (providing the actual service)
Each type of solution differs based on who is responsible for these functions.


In-house DDoS prevention

This solution uses in-house detection capability to spot the DDoS attack and then requests support from the internet service provider to block the source IP address(es) for a limited amount of time. Having a good security event management system with all possible event sources helps to identify attacks early and allows a much more reliable response (false positives mean lost clients).
The response can be automated by using the provider's service API or standard routing signalling (such as BGP Flowspec) to blackhole the sources, or it can be manual, by reporting the abuse to the appropriate ISP contact point.
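
As a minimal sketch of the automated path, the loop below watches per-source packet rates and hands anything above a threshold to a signalling function. Both get_source_rates() and signal_provider() are hypothetical placeholders: the first would be fed by NetFlow/sFlow or SIEM data, the second would call the provider's API or trigger the BGP signalling.

#!/usr/bin/env python3
"""Toy detection-and-signalling loop for in-house DDoS prevention (sketch)."""
import time

PPS_THRESHOLD = 50_000   # packets per second per source considered abusive (tune per service)
BLOCK_SECONDS = 600      # ask the provider to block for a limited time only

def get_source_rates():
    """Return {source_ip: packets_per_second} from the flow collector (placeholder)."""
    return {}

def signal_provider(source_ip, seconds):
    """Ask the ISP to drop traffic from source_ip (placeholder for API/BGP signalling)."""
    print(f"requesting block of {source_ip} for {seconds}s")

def run():
    blocked = {}                                          # source_ip -> time when block expires
    while True:
        now = time.time()
        blocked = {ip: t for ip, t in blocked.items() if t > now}
        for ip, pps in get_source_rates().items():
            if pps > PPS_THRESHOLD and ip not in blocked:
                signal_provider(ip, BLOCK_SECONDS)
                blocked[ip] = now + BLOCK_SECONDS
        time.sleep(10)

if __name__ == "__main__":
    run()

The thresholds and the blocking decision are exactly where the quality of the event sources matters: a false positive here means blocking a real client.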


There are many vendors who provide such solutions, including DPI-based detection as well as the signalling, but a contract with the service provider to support this also needs to be in place.

This table summarizes where the DDoS protection functions are located:

Function             Customer                              DoS protection provider
Attack detection     Most of the detection                 -
Attack prevention    Signaling only                        Most of the protection
Actual service       The service is provided by customer   -


Service gateway

The principle of this solution is to pre-filter all requests through a gateway service, where only valid requests fulfilling a specific set of rules (e.g. at most 10 requests per client, a valid session lasting longer than one minute, etc.) are forwarded to the real server.
Depending on the rule-set, the customer's server receives only valid requests and does not have to deal with the excess traffic.
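
As an illustration of one such rule, here is a minimal sliding-window rate limiter sketch ("max 10 requests per client per minute"). A real gateway combines many rules (session length, protocol conformance, reputation, ...) and applies them inline; the numbers and names here are assumptions for illustration only.

#!/usr/bin/env python3
"""Minimal per-client rate limiting, as one example of a service-gateway rule."""
import time
from collections import defaultdict, deque

MAX_REQUESTS = 10      # requests allowed per client ...
WINDOW_SECONDS = 60    # ... within this sliding window

_history = defaultdict(deque)   # client_ip -> timestamps of its recent requests

def allow_request(client_ip, now=None):
    """Return True if the request should be forwarded to the real server."""
    now = time.time() if now is None else now
    recent = _history[client_ip]
    while recent and recent[0] <= now - WINDOW_SECONDS:
        recent.popleft()                      # drop requests that fell out of the window
    if len(recent) >= MAX_REQUESTS:
        return False                          # over the limit: drop (or tarpit) the request
    recent.append(now)
    return True

if __name__ == "__main__":
    # The 11th request within one minute from the same client gets rejected.
    print([allow_request("198.51.100.7", now=1000.0 + i) for i in range(11)])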



In this case the detection and prevention functions are located at the service provider, so all the traffic terminates there first. The responsibility for operating the service, however, still lies with the customer.

Function             Customer                              DoS protection provider
Attack detection     -                                     Most of the detection
Attack prevention    -                                     Most of the protection
Actual service       The service is provided by customer   -
It's important to note that these gateways are either generic (operating on IP or TCP packets) or highly specific to a certain type of application (mail gateway, DNS gateway, web application firewall, ...).

CDN service

Although this is a highly specific solution and requires some cooperation with the service provider, for common services like data or content distribution it can be very effective.
Content distribution network (CDN) providers have large, geo-distributed infrastructure built for huge traffic volumes; for some of them even a large distributed DoS might look like a minor increase in the normal traffic level.
The principle here is quite simple: the customer uploads the data to the CDN provider, where it becomes accessible to the clients.


This solution moves all the functions to the external service provider, who has to deal with the attacks and guarantee service to the customer.

Function             Customer                              DoS protection provider
Attack detection     -                                     Most of the detection
Attack prevention    -                                     Most of the protection
Actual service       -                                     Service is provided by the CDN


Conclusion

Despite the many great tools that promise miracles in preventing DDoS attacks, it always comes down to a specific solution for a specific customer. Whether the rules need to be tailored and constantly adapted (for the first type of solution) or the gateway needs to be adjusted to deal with non-standard protocol behavior, it all comes down to the skills of the engineers running the solution.
In the hands of a capable engineer, the promised miracles can come true; in less capable hands, the service can end up unreliable or still vulnerable to distributed DoS attacks.

Tuesday, October 21, 2014

Importance of free training appliances/software

I take every opportunity I can to keep my technical skills up to date, from reading articles to getting virtual versions of software/appliances and playing with them. While major vendors provide technical documentation and various papers for free, there are still some who hide their documentation behind an authentication wall.

But that's not the prime concern, as engineers prefer something tangible, something to play with.
Here the idea of virtual labs comes in: some vendors provide them for money, and various training and certification companies rent labs out. But all of that still costs, which is not very viable for home use when the goal is just to learn the technology or get acquainted with the management interface.

With the recent advances in virtualization came network and security virtual appliances, which can be re-purposed for training or testing. Some vendors offer 30-day evaluations, but honestly, in the real world people have to deal with many tasks and there is no continuous 30-day window in which one can focus on evaluating the product and performing all the tests and analysis. Such evaluations are also not always well planned, with all test cases identified up front, so some cases might be missed and the PoC has to be rebuilt to cover them later.

So when do engineers actually need or want testing/training VMs?

  • when selecting new network/security elements for purchase (e.g. shortlisting for evaluation)
  • when evaluating compatibility of various elements/vendor systems
  • when preparing for new job position
  • when testing new management/automation systems/monitoring software
  • when developing code for the above 
  • when preparing for certifications
  • for collection purposes

All of these reasons would benefit a vendor (if it provides free evaluation VMs) in the following ways:

  • the customer's engineers already know the product(s), so there is no familiarity barrier when selling additional products (or products from a different vendor)
  • compatibility can be evaluated without shipping try-and-buy equipment, and problems get identified (and resolved) quite early, especially if some kind of bounty program exists
  • with more engineers available on the market, customers might grow/expand faster (=> more equipment can be purchased and deployed)
  • the same applies to the configuration of NMS/SIEM/automation software
  • developing code in one's free time is not very easy if a Cisco Nexus 9000 is needed to actually test it
  • although certifications are also a source of income, lowering the cost of lab preparation might motivate more people to pursue them
  • some people collect stamps, while others might collect VMs; but really it's more about being prepared for one of the reasons above

There are more benefits to mention, but this is surely enough for any product manager or marketing director to stop for a while and think about it.
Some vendors have already done it, choosing this strategy to win the hearts and minds of engineers who otherwise would not have the opportunity to find out how good their products are.

Conclusion

Dear Vendor,
whether you want to introduce a new product or gain a larger market share, providing free VMs of your products (with limited performance, of course) might bring you more advantages than risks.
Even if you are having problems recruiting good pre-sales or professional services engineers, the availability of free VMs for preparation and training on your products would expand your hiring pool in the long run.
It is also important to mention that engineers who know your products (and like them) can indirectly become supporters or enablers of potential sales opportunities wherever they work.
So please consider this when adjusting your product portfolios and creating your marketing strategies.
Respectfully yours,
                         Security Engineer

Wednesday, July 16, 2014

Test Driven Design in firewall engineering

After taking a Berkeley course on BDD and TDD in software engineering, I got a very interesting idea for a new kind of security device. This article explains a bit about how this idea could work and what the use-cases for it are.

Background

Firewall rule design is a skill that is quite hard to get right and quite easy to make mistakes in. If one just keeps adding rules, the firewall might end up with 10,000 or more of them, becoming difficult to maintain or, in the worst case, running out of memory for the rules.
On the other hand, if one keeps aggregating, the firewall ends up permitting unwanted traffic, and this won't be spotted until it actually causes some damage.
The sweet spot is somewhere between these two extremes: keeping the holes minimal while aggregating the rules that can safely be merged.
This can either be achieved through careful planning and analysis, or it requires very good service monitoring to spot when something stops working or something that shouldn't be allowed suddenly is.
In most cases the first approach is taken (provided, of course, there is a brave soul daring enough to touch a working system), but with the rise of continuous deployment and automation, the second approach might become the handier one.

In software development, BDD/TDD means writing the tests before the actual coding takes place, so that the "user story" fails before the code exists and turns green once it is implemented.
There are also two types of tests that need to be written:

  • positive (how the feature should work)
  • negative (how the feature should not work)
And what is so great about testing in the software development world is that a language was developed to describe these tests. In the course I practiced this with Cucumber, which describes the expected behavior of an application interface in a very readable way and makes testing more fun.

Idea description

Now, for the idea to work, I would need an IDS-like system with the following capabilities:
  • to receive any packets that arrive at the firewall as well as any that leave it
  • to send packets with any kind of source/destination on both sides of the firewall
As IDS systems are already able to do this, I don't think there is a problem in building such a system:


The next part is to create a language for writing the tests, one that describes the expected behavior of the firewall and validates it in a timely fashion.
I've written the Cucumber tests below to show what can be done; implementing these test conditions requires quite some coding, but it illustrates how such tests could look.
Feature: Test HTTP protocol 
  Webserver 10.0.0.1 should be reachable on HTTP port from outside

  Scenario: External user allowed connection
    Given there is "HTTP" server "10.0.0.1"
    When I send "HTTP" packet from "Outside" to "10.0.0.1"
    Then I should see "HTTP" packet on "Inside" to "10.0.0.1"

  Scenario: External user dropped connection
    When I send "HTTP" packet from "Outside" to "10.0.0.2"
    Then I should not see "HTTP" packet on "Inside" to "10.0.0.2"
    When I send "SMTP" packet from "Outside" to "10.0.0.1"
    Then I should not see "SMTP" packet on "Inside" to "10.0.0.1"
Theoretically, all the tests that are currently done manually by scanning ports or analyzing firewall logs and sniffer data could then be automated and repeated whenever necessary (without disturbing the security engineer :).
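
To make scenarios like these executable, each step needs a definition that actually crafts, sends and captures packets. Below is a minimal sketch assuming Python's behave for the Gherkin glue and Scapy for the packet handling; the interface names and the simplified "HTTP means TCP/80" mapping are my own assumptions, and the whole thing is an illustration rather than a finished tool.

# features/steps/firewall_steps.py -- sketch of step definitions for the scenarios above.
# Assumes the test system has one interface on each side of the firewall; the interface
# names and the protocol -> port mapping are illustrative assumptions.
import time

from behave import given, then, when
from scapy.all import IP, TCP, AsyncSniffer, send

IFACES = {"Outside": "eth1", "Inside": "eth2"}   # where the tester is plugged in (assumption)
PORTS = {"HTTP": 80, "SMTP": 25}                 # simplified protocol name -> TCP port mapping

@given('there is "{proto}" server "{ip}"')
def step_server_exists(context, proto, ip):
    # Purely declarative in this sketch; a fuller version could probe the server itself.
    context.server = (proto, ip)

@when('I send "{proto}" packet from "{side}" to "{ip}"')
def step_send_packet(context, proto, side, ip):
    far_side = "Inside" if side == "Outside" else "Outside"
    # Start capturing on the far side of the firewall before sending, so nothing is missed.
    context.capture = AsyncSniffer(
        iface=IFACES[far_side],
        filter=f"tcp and dst host {ip} and dst port {PORTS[proto]}",
    )
    context.capture.start()
    time.sleep(0.5)                                   # give the sniffer time to come up
    send(IP(dst=ip) / TCP(dport=PORTS[proto], flags="S"), iface=IFACES[side], verbose=False)

def _stop_capture(context):
    time.sleep(1)             # give the packet time to traverse (or be dropped by) the firewall
    context.capture.stop()
    return context.capture.results

@then('I should see "{proto}" packet on "{side}" to "{ip}"')
def step_should_see(context, proto, side, ip):
    assert len(_stop_capture(context)) > 0, f"{proto} to {ip} did not pass the firewall"

@then('I should not see "{proto}" packet on "{side}" to "{ip}"')
def step_should_not_see(context, proto, side, ip):
    assert len(_stop_capture(context)) == 0, f"{proto} to {ip} unexpectedly passed the firewall"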

So with these capabilities I am able to generate packets and see what can pass through the firewall and what cannot, but it can still be a bit problematic if the firewall does any kind of intelligence or behavior analysis and starts blocking the IP addresses used for testing.
Another problem is that the real server might process the test requests and fill up its job queue or TCP stack with connections waiting for responses.
So either the solution has to be able to automatically stop the test packets from reaching anything beyond the firewall, or the tests have to be written in a way that does not block or over-utilize any resources.

With many vendors increasing support for various API interfaces, this could theoretically also be implemented directly on the firewall, but with firewall clusters or farms, this might not be very practical.
And of course the saying "trust is good, verification is better" still applies here.

As for SDN, or more specifically NFV, this service would be an ideal candidate for verifying changes or validating the control software.

Use-Cases

As many vendors (who might grab this idea and build something) will wonder why customers would buy something like this, let's think about some cases that would call for it:

Continuous deployment scenario

With applications changing several times a day, there might be a need to adjust network elements to match. For example, an application changes the type of database it needs, resulting in a MySQL flow towards a newly spawned (or existing) database VM.
As the change has to happen automatically to keep the development cycle short, it would be done by a script, and the person behind it would want to see whether the firewall change had any impact on already permitted flows and whether the new flow allows too much.

Firewall migration scenario

Firewall upgrades or migrations require all rules to be reconsidered and probably rewritten. A set of tests can show that the new firewall provides the same functionality and will not cause major issues for the support teams. This way migrations would not need extra effort to investigate broken connectivity (every problem tends to be blamed on the firewall for months after a migration), and the service status would be visible right after the new or upgraded firewall becomes operational.

Continuous monitoring scenario

Although continuous monitoring is mostly focused on services (customers want the service to be up; they don't care about the infrastructure), for troubleshooting and support it can be quite useful to spot what causes a problem. Very often the monitoring does not analyze all the firewall logs, and even when it does, a simple rule change can generate masses of dropped connections, making it tedious to see which flows are no longer working.
Continuously running the flow tests against the firewall would exclude this area from the investigation just by looking at any failing tests (it otherwise takes quite some time to search through the logs to identify whether something was dropped and whether it should have been).
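
As a small, hypothetical illustration of this last use-case, the test suite sketched above could simply be executed on a schedule, alerting whenever any scenario fails. The notify() hook is a placeholder for whatever alerting the monitoring system already provides.

#!/usr/bin/env python3
"""Run the firewall behaviour tests periodically and alert on failures (sketch)."""
import subprocess
import time

INTERVAL_SECONDS = 300   # how often to re-verify the firewall behaviour

def notify(message):
    print(f"ALERT: {message}")      # placeholder: hand over to the monitoring system instead

def run_suite():
    # behave exits non-zero when any scenario fails.
    result = subprocess.run(["behave", "features/"], capture_output=True, text=True)
    if result.returncode != 0:
        notify("firewall behaviour tests failing:\n" + result.stdout[-2000:])

if __name__ == "__main__":
    while True:
        run_suite()
        time.sleep(INTERVAL_SECONDS)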