Thursday, July 9, 2015

DoS protection solutions

Most companies ignore the fact that their services can go down, and therefore do not even consider using protection against DoS attacks.
So in this post I'd like to analyze the potential attack areas and the ideas available to deal with them.
Let's exclude attacks that use specific vulnerabilities of a product to deny the service, and focus on brute-force attacks that flood the target with excessive valid service requests.


Statement of the problem

As the attack is using valid service requests, it's hard to distinguish them from normal user requests. This means that whatever action is taken, normal user requests have to be served in reasonable time.

Secondly, the solution providing the service has to be capable of dealing with normal service requests. This means it has to be elastic enough to scale up when necessary to serve all incoming requests.


The most common bottlenecks (on the side of a customer or enterprise) are:
  1. Server resources - the system(s) providing the service might not have the resources needed to handle user requests (network bandwidth exhausted, not enough memory, CPU over-utilized, or other)
  2. Firewall resources - filtering, inspection and tracking of all active flows need resources as well (though not as many as a server)
  3. Internet router - not very probable, but the router might not be able to handle that many packets per second
  4. Internet link - the most common problem: a link to the ISP that is not sized for growing user demand leads to degraded service response times
There surely are other potential bottlenecks, but the list above has to be considered in every case (independent of the service, location or team).
Mitigating these bottlenecks is the key to dealing with DoS attacks, and while scaling is always an option, it might not be the most efficient one. Costs also grow steeply with capacity, so at a certain point it becomes important to choose an alternative to scaling.


Potential solutions

Excluding the options of re-engineering the application (providing the service), buying bigger hardware, or upgrading the internet link speed to deal with potential distributed DoS attacks (which could be 10 Gb/s or more nowadays), let's see what ideas there are.

There are 3 functions that are part of each solution:

  • Detection (distinguishing what is a valid service request and what is not)
  • Protection (blocking invalid service requests without impacting valid ones)
  • Service (providing the actual service)
The solution types differ in who is responsible for each of these functions.


In-house DDoS prevention

This solution uses in-house detection capability to spot the DDoS attack and then requests support from the internet service provider to block the source IP address(es) for a limited amount of time. Having a good security event management system fed by all available event sources helps to identify attacks early and allows a much more reliable response (false positives mean lost clients).
The response can be automated using the provider's service API or standard routing protocols (like BGP Flowspec) to blackhole the source, or it can be manual, by reporting abuse to the appropriate ISP contact point.
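
The detection step can be as simple as counting requests per source over a recent time window. The sketch below is only an illustration of that idea: the event format (timestamp, source IP), the 60-second window and the threshold of 100 requests are assumptions made for the example, not values taken from any product.

```python
from collections import Counter


def heavy_sources(events, now, window=60.0, threshold=100):
    """Return source IPs with more than `threshold` requests in the last `window` seconds.

    `events` is an iterable of (timestamp, source_ip) tuples, for example
    parsed from firewall or web server logs (hypothetical format).
    """
    # Count only the events that fall inside the observation window.
    recent = Counter(ip for ts, ip in events if now - window <= ts <= now)
    return sorted(ip for ip, count in recent.items() if count > threshold)
```

The returned addresses are only candidates for blocking. A busy NAT gateway or proxy can exceed any fixed threshold with perfectly valid traffic, which is exactly the false-positive risk mentioned above.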


There are many vendors who provide such solutions, including DPI-based detection as well as signalling, but there also needs to be a contract in place with the service provider to support such a service.

This table should summarize the location of the DDoS protection functions:

Function          | Customer                            | DoS protection provider
Attack detection  | Most of the detection               | -
Attack prevention | Signaling only                      | Most of the protection
Actual service    | The service is provided by customer | -


Service gateway

The principle of this solution is to pre-filter all requests via a gateway service, where only valid requests fulfilling a specific set of rules (like max 10 requests per client; a valid session should last longer than 1 minute; etc.) are forwarded to the real server.
Depending on the rule-set, the customer's server would receive only valid requests and would not have to deal with excessive traffic.
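
To make the rule idea concrete, here is a minimal sketch of a "max 10 requests per client" rule. The class name, the `allow` method and the 60-second window are assumptions for illustration (the rule above does not say over which period the 10 requests are counted); a real gateway would combine several such rules.

```python
from collections import defaultdict, deque


class GatewayFilter:
    """Forward a request only while the client stays within its request quota."""

    def __init__(self, max_requests=10, window=60.0):
        self.max_requests = max_requests
        self.window = window
        # client identifier -> timestamps of the requests already forwarded
        self._seen = defaultdict(deque)

    def allow(self, client, now):
        timestamps = self._seen[client]
        # Forget requests that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False  # over quota: drop instead of forwarding
        timestamps.append(now)
        return True
```

Only requests for which `allow` returns True would be passed on to the real server, so the server never sees the excess.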



In this case, the detection and prevention functions are located at the service provider, so all the traffic ends up there. Responsibility for operating the service itself, however, remains with the customer.

Function          | Customer                            | DoS protection provider
Attack detection  | -                                   | Most of the detection
Attack prevention | -                                   | Most of the protection
Actual service    | The service is provided by customer | -
It's important to note that these gateways are either generic (for IP or TCP packets) or highly specific to a certain type of application (mail gateway, DNS gateway, web application firewall, etc.).

CDN service

Although this is a highly specific solution and requires some cooperation with the service provider, it can be very effective for common services like data or content distribution.
Content distribution network (CDN) providers have large, geo-distributed infrastructure built for huge traffic flows. For some of them even a large distributed DoS might look like a minor increase in the normal traffic level.
The principle here is quite simple: the customer uploads all the data to the CDN service provider, where it becomes accessible to the clients.


This solution moves all the functions to the external service provider, who has to deal with the attacks and guarantee service to the customer.

Function          | Customer | DoS protection provider
Attack detection  | -        | Most of the detection
Attack prevention | -        | Most of the protection
Actual service    | -        | Service is provided by CDN


Conclusion

Despite there being lots of great tools that promise miracles in preventing DDoS attacks, it always comes down to a specific solution for a specific customer. Whether rules need to be tailored and constantly adapted for the first type of solution, or a gateway needs to be adjusted to deal with non-standard protocol behavior, it all comes down to the skills of the engineers dealing with the solution.
In the hands of a capable engineer, the promised miracles can come true; in less capable hands the same solution can make the service very unreliable or leave it vulnerable to distributed DoS attacks.

Thursday, April 30, 2015

Planning network element patches

As it happens, sometimes security engineers need to grab the networkers and make them patch all those vulnerabilities that sysadmins fix as soon as the patches come out.
With systems, this is all quite easy as there are almost no restrictions (besides compatibility), but network elements like routers or switches have limited resources, and their updates are not broken down into smaller components: they are usually just one big file containing everything from the kernel to all the "applications" or processes and supporting utilities.
In this post, I will try to describe the process of planning such an upgrade.

Inventory phase

The first step is to find out what needs to be upgraded, as this information is important in later phases for selecting the software version as well as the upgrade process.
The following information needs to be collected about each router/switch/firewall/etc.:
  • Hardware type and version (not just the type printed on the device chassis, but also slot/port count information. For example "Cisco Nexus 56128P Switch" or "Catalyst 2950T 24 Switch")
  • Memory information (RAM as well as storage or flash memory size is important)
  • Management IP (or the IP address via which the software is going to be uploaded, as PCMCIA or console modem options are usually not very fast)
Memory information is needed to find out whether the software can run in the RAM the hardware provides, as well as whether the software image fits into the available storage. Some devices have only one memory and split it between the two uses (like Cisco 7200 routers), while others have dedicated storage memory and RAM.
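
Once the inventory exists in machine-readable form, the memory check can be automated. The following is a minimal sketch; the `Device` and `Image` records and their field names are hypothetical, and the numbers would come from the inventory and from the vendor's release notes.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    ram_mb: int         # RAM installed in the device
    flash_free_mb: int  # free storage available for the image


@dataclass
class Image:
    version: str
    size_mb: int     # size of the software file
    min_ram_mb: int  # minimum RAM stated by the vendor


def can_install(device, image):
    """Return (ok, reasons); `reasons` lists why the image does not fit."""
    reasons = []
    if device.flash_free_mb < image.size_mb:
        reasons.append("not enough free storage for the image")
    if device.ram_mb < image.min_ram_mb:
        reasons.append("not enough RAM to run the image")
    return (not reasons, reasons)
```

Running this over the whole inventory quickly shows which devices need a smaller image, a storage clean-up or a hardware upgrade before the patch can be applied.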

Software selection phase

With all the information from the previous phase, we can move on to finding the appropriate software version: one that each device can support and that contains the fixes needed.
Each software vendor has at least one web tool that provides the information needed (or even the software download).
Sometimes vendors link directly from security advisories or notifications, but such links are not always there, so the safest way to get the software and the information about it is via the download pages.

Some vendors make it easy to select the latest version, while others have a set of sub-versions indicating feature upgrades or just patches; standard or extended support; early deployment or limited deployment; etc. Each vendor has a document describing what each part of the version means, and it can also be different for each product series.

Besides the choice of software to download, there are also release notes or a readme document for each version, where the vendor describes:
  • how to perform the upgrade 
  • what are the pre-requisites (which platforms and current software versions are compatible)
  • what new features are introduced and old ones removed
  • what issues/bugs/problems were resolved with that software version
  • what caveats were identified with this version
If the current version is too old (one or several major releases behind), it might be necessary to perform several upgrades in sequence to ensure that the configuration is properly translated to the new syntax or new features. This should be described in the pre-requisites in order to ensure a trouble-free upgrade. This phase has to be repeated for each of the versions that need to be installed before the latest one can be applied.
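
The chain of intermediate upgrades can be worked out mechanically once the pre-requisites are written down. This is a sketch under assumptions: the `min_source` mapping (version to the oldest version it can be installed from directly) is a hypothetical data structure filled in by hand from the release notes, and versions are represented as tuples so that they compare numerically.

```python
def upgrade_path(current, target, min_source):
    """List the versions to install, in order, to get from `current` to `target`.

    `min_source` maps a version to the oldest version it can be installed
    from directly. Versions missing from the map are assumed to have no
    such restriction.
    """
    path = [target]
    needed = min_source.get(target, current)
    while needed > current:
        if needed >= path[-1]:
            raise ValueError("a pre-requisite must be older than its version")
        # The device is too old for the next step: install `needed` first.
        path.append(needed)
        needed = min_source.get(needed, current)
    return list(reversed(path))


# Made-up example: 15.2 needs at least 12.4, which needs at least 12.0.
steps = upgrade_path((11, 3), (15, 2), {(15, 2): (12, 4), (12, 4): (12, 0)})
```

Every version in the resulting list needs its own pass through this phase, including the memory check and a look at its release notes.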

With constant change and improvement in the network field, features come and go, so it's necessary to watch out for removal or modification of the features in use (a default deny could change to allow any; or statically configured IPsec local networks might be auto-negotiated in a newer version).

The list of resolved bugs is a good source for identifying whether the new version fixes the recent vulnerabilities circulating in the wild. This might help with vulnerability management tickets or anomaly reports that are overdue.

The caveats, finally, describe known problems that were identified during vendor testing of the new version. When the local conditions are similar to those described in a caveat, this might put a stop to the installation of that version (or to the upgrade altogether).

Software validation

With all the information collected from previous phases, only very brave people would install the software right away into production.
A lot of companies have labs where new versions can be tested before they are installed in production. In larger data centers there may be canary elements that can be used for this kind of testing.
The goal of validation should be to ensure:

  • current configuration syntax is fine under new version
  • all required features are going to work as expected (with the same licences)
  • redundancy mechanisms would work (no timer defaults or protocol defaults changed)
  • monitoring functions get the same format of data as before (no SNMP OID, syslog message format or API changes)
  • migration/upgrade plan is not going to cause an impact (some systems require same version of clustered elements to work)

Whether all this is automated or done manually by a verification team with defined validation test cases is up to each company to decide, but what most IT managers would not like is a total outage of the core network after a software upgrade of a central router or switch.

And let's not forget to verify the hash of the downloaded software (if the vendor offers it on the download website), as network elements are the best place for MitM attacks.
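
Verifying the hash takes only a few lines. The sketch below assumes the vendor publishes a hex-encoded SHA-512 checksum next to the download; the function names are mine, and the algorithm has to be changed to whatever the vendor actually publishes.

```python
import hashlib
import hmac


def image_digest(path, algorithm="sha512", chunk_size=1024 * 1024):
    """Hash the file in chunks so large images do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex, algorithm="sha512"):
    """Compare the file's digest with the value copied from the vendor page."""
    actual = image_digest(path, algorithm)
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

The check is only as trustworthy as the channel the published value came from, so the hash should be taken from the vendor's HTTPS page rather than from the same place the image was downloaded from.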

If you know of anything else I missed, let me know and I'll update the post.

Tuesday, April 28, 2015

Event management solution scaling - Practical example

As described in the previous blog post, every piece of software, every server and every appliance has its limits.
Scaling beyond these limits is an engineering task: someone has to build something that can cope with the load.
In theory one could adjust an open-source solution and live happily ever after, but in the real world one has to deal with proprietary software or appliances, and it's not easy to just migrate or replace them.

For such a scenario, I've developed a small program called NFF that forwards incoming traffic to several configured destinations. Currently it is built to listen on one port and forward what it receives to several destinations, but with different configuration files it can run for several services (e.g. syslog, SNMP traps, NetFlow).


Note: the current version only forwards the flows, but later on, when protocol decoding is implemented, it should also be able to forward flows to specific destinations based on rules.
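
To illustrate the forwarding idea (this is not NFF's actual code, just a minimal sketch of the same principle for UDP-based services), the function below takes an already bound UDP socket and resends every datagram it receives to each destination. The function name and the `max_packets` parameter are assumptions made for the example.

```python
import socket


def fan_out(sock, destinations, max_packets=None):
    """Resend each datagram received on `sock` to every address in `destinations`.

    Runs forever when `max_packets` is None; otherwise stops after that
    many datagrams have been forwarded. Returns the number forwarded.
    """
    forwarded = 0
    while max_packets is None or forwarded < max_packets:
        payload, _sender = sock.recvfrom(65535)
        for destination in destinations:
            sock.sendto(payload, destination)
        forwarded += 1
    return forwarded
```

One limitation of such a simple approach: the receivers see the forwarder as the sender of every datagram, which matters for syslog or NetFlow collectors that identify the device by source IP address.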

Integration would be done by running this program on the IP address that all systems currently send their logs/NetFlow/data to, while the appliance or software analyzing the data moves to a new IP address.

In case management decides to buy a bigger box or choose a different supplier, the new system can be added to the distribution list during the trial period in order to see if it fulfills the needs and expectations.



As I don't have a job where I could test this idea at scale, I hope some of you will give me feedback on how well it performs. I already have several ideas for making it faster.

Friday, March 20, 2015

Cryptography education

After a long break, I finally got back to writing yet another post (coincidentally, about what I did over the last few months).
As cryptography is quite an essential part of security engineering, I decided to challenge myself by taking the online course Cryptography I by Dan Boneh from Stanford.
Originally I didn't expect university courses to go into much detail on how crypto algorithms are implemented in practice, but this course surely changed my opinion on the practicality of university lectures. Of course I should have known that lectures from famous universities like Stanford are expected to be worth the tuition fees.

Course content

Despite the practical nature of this course, it requires a very good understanding of various areas of math, as well as familiarity with the scientific style of the lectures.
This was actually an important part of its practicality, as some areas of security engineering require or expect a mathematical proof of the security of an algorithm or process. I'm not sure how such a proof would hold up in front of an average auditor, but this course surely does good work proving the weaknesses of the cryptographic functions it explains.
The first course starts with a brief introduction and a short refresher in discrete math, then goes right into stream and block ciphers (like DES or AES). Later it covers message integrity and hash functions, and it finishes with public key encryption (RSA and ElGamal).
The second course (expected to take place in June 2015) should bring more advanced and recent topics like elliptic curve cryptography or maybe digital currency concepts, so there's more to learn.
For most of the topics, practical examples were given of how some services or protocols used cryptography in the wrong way.
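
A textbook instance of such misuse (chosen here for illustration, not necessarily one of the examples from the course) is encrypting two messages with the same stream cipher keystream. The sketch below uses a made-up keystream as a stand-in for real cipher output and shows that the keystream cancels out when the two ciphertexts are XORed.

```python
def xor_bytes(left, right):
    """XOR two byte strings of equal length."""
    return bytes(a ^ b for a, b in zip(left, right))


# Stand-in for the output of a stream cipher (made-up values).
keystream = bytes(range(1, 17))

first = xor_bytes(b"attack at dawn!!", keystream)
second = xor_bytes(b"retreat at noon!", keystream)

# An eavesdropper who sees both ciphertexts learns the XOR of the two
# plaintexts without ever knowing the keystream.
leaked = xor_bytes(first, second)
```

Anyone who knows or guesses one of the messages can then recover the other with a single XOR, no matter how strong the cipher that produced the keystream is.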

Programming part

Besides the theoretical part, the course also included coding assignments, where the tasks ranged from encrypting content in an efficient and secure way to deriving keys from insecure implementations. This experience would surely be very useful in code review or penetration testing work.
All the programming assignments were done in Python, with the help of various libraries that offered the necessary crypto algorithms already implemented.

Conclusion

Everybody in the field of security who wants to call themselves a security engineer should take this course (and pass), as the knowledge gained is very important in many areas of the security field. Whether one writes security policy (defining acceptable crypto algorithms, key sizes, etc.), ensures application security (performing code reviews or testing), provides authentication, file or storage encryption, or builds VPNs, this course helps to understand the implications of choosing one security algorithm over another, and the performance and security impact of these decisions.
With the recent OpenSSL vulnerabilities, I can't stress enough how much this knowledge improves decisions on whether and how to deal with these vulnerabilities.