DoS Madness! Part 1: Memory Management / Behind Your Firewall
Alright, so here’s the bottom line. Your site is down. It’s your job not only to fix it this time, but also to keep this kind of stuff from happening again. Well, a DoS attack in any of its varieties relies on a single tenet: your service (www.yoursite.com) depends on a chain of processes and devices to generate, package, and deliver its content to your clients. A DoS attack is, in practice, any attack that exploits the vulnerability of one or more of these subsystems to being overwhelmed. It follows that to reduce the potential for a successful DoS attack, you must reduce the ways in which your site’s subsystems can be overwhelmed.
With that thought in mind, let’s take a look at a couple of different DoS attacks and the subsystems they target. The next series of articles will break down DoS types by the general subsystem they exploit.
Part 1: Memory Management / Behind Your Firewall
One type of DoS attack that takes advantage of poorly configured web servers is slowloris. This attack opens as many connections as your web server can handle, then keeps them open with an occasional exchange like this:
slowloris: hey.
webserver: what?
slowloris: ...
webserver: ...
webserver: are you still there?
slowloris: hey...
webserver: what?
slowloris: ...
Slowloris can be mitigated by rate limiting incoming connections to Apache and / or using a non-vulnerable front end web server, such as nginx.
Another tool that can be used maliciously is Google’s skipfish. This is a security scanner that can double as a load tester. Basically, skipfish forms URLs from crawled data and dictionary-based guesses. Used maliciously, it can overwhelm a poorly deployed LAMP stack until the server runs out of memory and crashes.
When that happens, there may be an OOM_Killer message in your /var/log/messages file as a final “OMG-help-me dump” right before reboot. Memory is a scarce resource in a system, and to improve performance, the Linux kernel will overcommit RAM to the numerous processes that request an allocation of memory space. Read more about the Linux kernel’s overcommit behavior here. In summary, it’s kind of like simultaneously sitting at five different $5 blackjack tables with $20 in chips. Most times, a couple of games will cash out, releasing chips back into the resource pool, and there is no net loss. The kernel runs faster, and everybody wins. That is, until traffic increases beyond what the system can handle, and everybody loses. When this happens, the kernel has to run from at least one table (and an angry pit boss). OOM_Killer is the kernel mechanism designed to handle the case where overcommitted memory actually runs out; it uses a scoring heuristic to determine which process to kill.
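If you suspect OOM_Killer was involved, a quick search of the log usually settles it. The sketch below assumes the kernel log lands in /var/log/messages (as above) and that the messages contain either “oom-killer” or “Out of memory”; the exact wording varies between kernel versions, and oom_events is just a name made up for this example.

```shell
# Print kernel log lines that look like OOM killer activity.
# $1: log file to search (defaults to /var/log/messages, as in the text).
# The two patterns are assumptions about typical kernel wording.
oom_events() {
    grep -Ei 'oom-killer|out of memory' "${1:-/var/log/messages}"
}

# Usage:
#   oom_events                    # search /var/log/messages
#   oom_events /path/to/other.log # search a different log file
```

If this turns up httpd or mysqld as the victim, the tuning discussed further down is probably where to look next.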
A particularly amusing analogy by Andries Brouwer describes OOM_Killer thus:
An aircraft company discovered that it was cheaper to fly its planes with less fuel on board. The planes would be lighter and use less fuel and money was saved. On rare occasions however the amount of fuel was insufficient, and the plane would crash. This problem was solved by the engineers of the company by the development of a special OOF (out-of-fuel) mechanism. In emergency cases a passenger was selected and thrown out of the plane. (When necessary, the procedure was repeated.) A large body of theory was developed and many publications were devoted to the problem of properly selecting the victim to be ejected. Should the victim be chosen at random? Or should one choose the heaviest person? Or the oldest? Should passengers pay in order not to be ejected, so that the victim would be the poorest on board? And if for example the heaviest person was chosen, should there be a special exception in case that was the pilot? Should first class passengers be exempted? Now that the OOF mechanism existed, it would be activated every now and then, and eject passengers even when there was no fuel shortage. The engineers are still studying precisely how this malfunction is caused.
So if you went to the win.tue.nl link I showed you, you’ll note that, like a scared gambler, the kernel (after 2.5.30) can be instructed to be more reserved when allocating memory. The value written to the file below controls this behavior.
echo 2 > /proc/sys/vm/overcommit_memory #this will tell the kernel to never commit memory over the swap space and a certain fraction of physical memory.*
echo 0 > /proc/sys/vm/overcommit_memory #(default) the kernel will do what it thinks is best
echo 1 > /proc/sys/vm/overcommit_memory #this will tell the kernel to go hog-wild and never refuse an app memory
*This fraction is defined in “/proc/sys/vm/overcommit_ratio”; the default value is 50 (50%).
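To make the footnote concrete: in mode 2 the kernel’s commit limit is roughly swap plus overcommit_ratio percent of physical RAM. A minimal sketch of that arithmetic follows. commit_limit_kb is a hypothetical helper name, the sizes in the example are made-up numbers, and huge pages (which the kernel subtracts from the RAM figure) are ignored.

```shell
# Approximate commit limit (KB) under overcommit_memory = 2.
# $1: swap size in KB, $2: physical RAM in KB, $3: overcommit_ratio (percent)
commit_limit_kb() {
    echo $(( $1 + $2 * $3 / 100 ))
}

# Example with made-up sizes: 2 GB swap, 8 GB RAM, default ratio of 50.
commit_limit_kb 2097152 8388608 50    # prints 6291456 (6 GB)
```

On a live box, grep -i commit /proc/meminfo shows the kernel’s own CommitLimit next to Committed_AS, which is the amount currently promised to processes.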
But that’s not the cure-all for memory management. That just dictates kernel behavior. App behavior is another story. Here, you have to ensure your apps never ask for more memory than your system can afford to give out. Now this is going to differ based on what type of server it is, but the most commonly exploited servers (from my perspective anyways) are LAMP stacks. And on a LAMP stack, the two most common reasons why OOM_Killer is invoked are:
Too many Apache worker processes. Take a look at the MaxClients directive in your httpd.conf file. With the prefork MPM (usually the default, and it means 1 thread per process), multiply MaxClients by the average size of your Apache processes, and you’ll have the total memory allocation for Apache at full load. Naturally, you do not want this figure to be anywhere close to your physical RAM limit….
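The MaxClients arithmetic above can be sketched in a few lines of shell. This is a rough estimate under stated assumptions, not a sizing tool: it uses resident set size (RSS) as “process size,” which overcounts memory shared between workers, and the helper names and example figures are invented for illustration. The process name httpd is also an assumption (Debian-style systems call it apache2).

```shell
# Average a list of RSS values (KB, one per line) read from stdin.
avg_rss_kb() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%d\n", sum / n; else print 0 }'
}

# Estimated Apache memory at full load, in MB.
# $1: MaxClients, $2: average process size in KB
full_load_mb() {
    echo $(( $1 * $2 / 1024 ))
}

# Usage on a running server (process name is an assumption):
#   avg=$(ps -C httpd -o rss= | avg_rss_kb)
#   full_load_mb 256 "$avg"

# Example with made-up numbers: 256 workers at about 25 MB each.
full_load_mb 256 25000    # prints 6250
```

If the result is anywhere near your physical RAM, lower MaxClients or shrink the workers (fewer modules, lighter PHP) before an attacker finds the limit for you.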
For more on LAMP stack tuning, please refer to the following links:
http://httpd.apache.org/docs/2.0/misc/perf-tuning.html
http://dev.mysql.com/doc/refman/5.0/en/server-parameters.html
http://www.interworx.com/forums/showthread.php?p=2346
http://www.ibm.com/developerworks/views/linux/…
Finally, you may wish to consider using a lighter web server or a caching reverse proxy to optimize performance. The bottom line is, when faced with an array of incoming attempts to overwhelm and disable your backend applications, you want your system to be robust enough to gracefully handle what it can, and redirect or ignore the rest.
In Part 2: Socket Management / Behind Your NIC, we’ll be looking at SYN Floods and other, non-port saturating attacks. Now as this series progresses, there will be some overlap in mitigation techniques that will impact more than one type of DoS attack. As long as these correlations are mutually beneficial, we’ll be just hunky-dory.
Slowloris and You
UPDATE: 20090826 – Corrected typo in “Slowloris and You.” It used to say “Slowlaris and You.” I keep getting slowloris confused with my nickname for “Solaris.” =D
Back in July, http://ha.ckers.org/slowloris/ published an exploit against Apache and other web servers (follow the link for details) that takes advantage of servers that dedicate a thread or process to each connection. It works by tying up web server threads with partial HTTP requests, then periodically sending more header data to keep each socket open without ever completing the request. In general, multi-threaded web servers such as Apache (the httpd, apache, and apache2 packages) are vulnerable. IIS and most proxies are not vulnerable.
CERT suggested using iptables to rate limit incoming port 80 requests. In general, this should be fine for many applications, though CERT has warned that large groups of clients behind a single NAT may be affected, so the hitcount/time ratio should be adjusted according to your needs.
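As a rough illustration of that kind of rate limit, here is a sketch using the iptables “recent” match. It is not CERT’s exact rule set: the function name, the list name HTTP, and the 60-second/15-hit figures are placeholders chosen for this example. The function only prints the rules so you can review them before running anything as root.

```shell
# Print (not run) two iptables rules that drop a source IP once it has
# made $2 or more new connections to port 80 within $1 seconds.
# Uses the "recent" match; the list name HTTP is arbitrary.
http_ratelimit_rules() {
    seconds=$1
    hitcount=$2
    match="-p tcp --dport 80 -m state --state NEW -m recent --name HTTP"
    # First rule records the source address of every new connection.
    echo "iptables -A INPUT $match --set"
    # Second rule drops sources that exceed the hitcount/time ratio.
    echo "iptables -A INPUT $match --update --seconds $seconds --hitcount $hitcount -j DROP"
}

http_ratelimit_rules 60 15
```

Once the output looks right, pipe it to sh as root. Note that a hitcount above the recent module’s ip_pkt_list_tot parameter (20 by default) is rejected, so large values need the module reloaded with a higher limit.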
http://www.funtoo.org/en/security/slowloris/ offers tips on mitigating this attack by enabling delayed binding on hardware load balancers.
In short, the consensus mitigation appears to involve connection restrictions in the form of iptables rules or Apache modules (most are of limited value, frankly), or shielding the web servers behind load balancers (such as HAProxy).
Dynamic iptables – “Flexible (and fun)”
Have you ever said to yourself that there should be a tool to do x, start building a tool to do it, then about halfway through your little project, somebody glances over your shoulder and says to you “hey, I use a tool like that, it’s called y, you should check it out,” so you do, and that tool is far more comprehensive and well built than the one you were working on?
Well this isn’t one of those times, because this tool hit me from left field while I was researching ways to mitigate a DDoS attack. Though there are many, many ways to do it, if all you have is a Linux box facing the world with nothing to hide its private parts except iptables, then this “flexible (and fun)” toolset is another weapon you can deploy when you get that 2:30AM call saying “our website’s down and I think it’s being DDoS’d.”
The tool is a simple set of scripts that makes adding and removing specific IPs quick and simple. The author’s main page for it is at http://www.ibm.com/developerworks/library/l-fw/, and it is also available (hosted locally) here.
Once installed, you can simply ban/unban an IP by typing ipdrop {IP ADDRESS} {on|off}
While perusing this thread at webhostingtalk.com, I saw that member dynamicnet mentioned grep-ing for ridiculous levels of SYN_RECV connections (this is indicative of a TCP SYN flood attack) and generating ipdrop commands for quick banning of the SYN-flooding IPs. Though you may accidentally drop one or two legitimate IPs (have a rule already in place so you don’t ban yourself out of a remote box), you’ll likely get the bulk of the attacking IPs.
Use netstat -n -p|grep SYN_REC | wc -l to count how many SYN_RECV connections you have.
Use netstat -n -p | grep SYN_REC | awk '{print $5}' | sort -u | awk -F: '{print "ipdrop "$1 " on"}' to generate code to ban IPs in SYN_RECV status.
Use cat /root/.dynfw-ipdrop |awk -F: '{print "ipdrop "$1" off"}' to generate code to “undrop” those IPs.
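The one-liners above ban every IP with even a single SYN_RECV connection. A slightly more careful variation, sketched below, counts connections per IP and only emits ipdrop commands above a threshold, with an optional address that is never banned. The function name, the threshold, and the whitelist argument are additions for illustration, not part of the dynfw scripts, and the parsing assumes IPv4 addresses in the usual netstat -n column layout.

```shell
# Read `netstat -n` style lines on stdin and print "ipdrop <ip> on" for
# each remote IPv4 address holding more than $1 SYN_RECV connections.
# $2 (optional): an address to never ban, e.g. your own admin IP.
syn_flood_bans() {
    awk -v t="$1" -v keep="${2:-}" '
        $6 == "SYN_RECV" {
            split($5, a, ":")    # foreign address column is ip:port
            count[a[1]]++
        }
        END {
            for (ip in count)
                if (count[ip] > t && ip != keep)
                    print "ipdrop " ip " on"
        }' | sort
}

# Usage (192.0.2.10 stands in for your own address):
#   netstat -n -p | syn_flood_bans 20 192.0.2.10        # review the list
#   netstat -n -p | syn_flood_bans 20 192.0.2.10 | sh   # then apply it
```

Reviewing the generated commands before piping them to sh is the whole point of printing them first; a threshold that is too low will ban legitimate visitors on a busy site.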