AWS Security Blog
s2n and Lucky 13
Great security research combines extremely high levels of creativity, paranoia, and attention to detail. All of these qualities are in evidence in two new research papers about how s2n, our Open Source implementation of the SSL/TLS protocols, handles the Lucky 13 attack from 2013.
The research found issues with how s2n mitigates Lucky 13 and improvements that could be made. These issues did not impact Amazon, AWS, or our customers, and are not the kind that could be exploited in the real world, but the research shows that the early versions of s2n’s mitigations made Lucky 13 attacks tens of millions of times more difficult for attackers, rather than the trillions of times more difficult that we had intended. In any event, the versions of s2n concerned have never been used in production, and improvements that completely prevent the issue were included in the versions of s2n available on GitHub from July 11th, 2015.
The two papers are from:
- Martin Albrecht and Kenny Paterson of the Information Security Group at Royal Holloway, University of London, who have a paper concerning s2n’s initial implementation of Lucky 13 CBC verification, and s2n’s secondary mitigation mechanism using timing delays. Kenny Paterson is also one of the original authors of the Lucky 13 paper.
- Manuel Barbosa (HASLab – INESC TEC and FCUP) has discovered a bug introduced in the process of making improvements related to the first paper, and has upcoming publications in collaboration with other researchers (see below) that cover the limits of human review and other interesting implications of this programming error, and explore formal verification solutions to prevent such errors in the first place.
We would like to thank Albrecht, Paterson and Barbosa for reporting their research through our vulnerability reporting process and for discussing and reviewing the changes we made to s2n in response to that research. Both papers are valuable contributions towards improving the protections in s2n and cryptography generally, and we’re grateful for the work involved.
Our summary: s2n includes two different mitigations against the Lucky 13 attack and although the original implementations in s2n were effective against real-world attacks, each could be improved to be more robust against theoretical attacks.
Lucky 13: a quick recap
In February 2013, Nadhem J. AlFardan and Kenny Paterson of the Information Security Group at Royal Holloway, University of London, published the Lucky 13 attack against TLS. Adam Langley’s blog post on the topic is a great detailed summary of the attack, how it operates, and how it was mitigated in OpenSSL.
A brief synopsis is that Lucky 13 is an “Active Person in the Middle” attack against block ciphers, where an attacker who is already capable of intercepting and modifying your traffic may tamper with that traffic in ways that allow the attacker to determine which portions of your traffic are encrypted data, and which portions are padding (bytes included to round up to a certain block size).
The attacker’s attempt to tamper with the traffic is detected by the design of the TLS protocol, and triggers an error message. The Lucky 13 research showed that receiving this error message can take a different amount of time depending on whether real data or padding was modified. That information can be combined with other cryptographic attacks to recover the original plaintext.
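To make the timing leak concrete, here is a sketch of the vulnerable MAC-then-pad checking pattern that Lucky 13 exploits. This is illustrative only, not s2n's code; `verify_mac` is a stand-in whose cost merely models real HMAC work.

```c
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 20 /* e.g. HMAC-SHA1 */

/* Stand-in for real HMAC verification: work grows with data_len, and a
 * forged record always fails -- but only after the work is done. */
static int verify_mac(const uint8_t *data, size_t data_len)
{
    volatile uint8_t sink = 0;
    for (size_t i = 0; i < data_len; i++) {
        sink ^= data[i];            /* time proportional to data_len */
    }
    (void)sink;
    return -1;
}

/* Naive record check: the time to reach the error depends on secret
 * plaintext, because bad padding returns early, and valid padding
 * shrinks the number of bytes the MAC is computed over. */
static int naive_record_check(const uint8_t *plaintext, size_t len)
{
    uint8_t pad = plaintext[len - 1];

    if ((size_t)pad + 1 + MAC_LEN > len) {
        return -1;                  /* fast path: impossible padding */
    }
    for (size_t i = len - 1 - pad; i < len; i++) {
        if (plaintext[i] != pad) {
            return -1;              /* fast path: early return */
        }
    }
    /* slow path: MAC'd length depends on the attacker-influenced pad */
    return verify_mac(plaintext, len - pad - 1 - MAC_LEN);
}
```

Both outcomes return the same error, but they reach it by code paths of measurably different cost, which is exactly the signal the attack measures.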
How the Lucky 13 attack interacts with TLS
Fortunately, there are a number of factors that make the Lucky 13 attack very difficult in real world settings against SSL/TLS.
Firstly, the timing differences involved are extremely small, commonly smaller than a microsecond. The attacker must intercept, modify, pass on traffic and intercept the errors on a network that is stable to the point that differences of less than a microsecond are measurable. Unsecured Wi-Fi networks are easiest to intercept traffic on, but are neither fast enough nor stable enough for the attack to work. Wide-area networks are also too inconsistent so an attacker must be close to their target. Even within a single high-performance data center network, it is very difficult to obtain the conditions required for this attack to succeed. With the security features of Amazon Virtual Private Cloud (VPC), it is impossible to surreptitiously intercept TLS traffic within AWS data centers.
Secondly, attempts to accelerate the attack by trying multiple connections in parallel contend for network queues and can obscure the very measurements needed for the attack to succeed. Our own experiments with real-world networks show an exponential distribution of packet delays in the microsecond to tens-of-microseconds range.
Thirdly, the errors generated by the Lucky 13 attack are fatal in SSL/TLS and cause the connection to be closed. An attack would depend on neither the client nor the server noticing incredibly high connection error rates. At AWS, monitoring and alarming on far lower error rates than would be required to mount a Lucky 13 attack is a standard part of our operational readiness.
Lastly, in response to the original Lucky 13 research, the CBC cipher suites impacted by the vulnerability are no longer as common as they were. Today’s modern web browsers and servers no longer prefer these cipher suites. At AWS, we now see less than 10% of connections use CBC cipher suites.
Considering the limitations on the attack, it is difficult to conceive of circumstances where it is exploitable in the real-world against TLS. Indeed, some implementations of the TLS protocol have chosen not to mitigate the Lucky 13 attack.
With s2n, we decided to be cautious, as attacks only improve over time, and to follow the statement of the risk from Matthew Green’s blog post on the topic: “The attack is borderline practical if you’re using the Datagram version of TLS (DTLS). It’s more on the theoretical side if you’re using standard TLS. However, with some clever engineering, that could change in the future. You should probably patch!”
Lucky 13, s2n, and balancing trade-offs
Despite the significant difficulty of any practical attack, as described above, s2n has included two different forms of mitigation against the Lucky 13 attack since release: first by minimizing the timing difference mentioned above, and second, by masking any difference by including a delay of up to 10 seconds whenever any error is triggered.
Martin Albrecht and Kenny Paterson, again of the Information Security Group at Royal Holloway, University of London, got in touch to make us aware that our first mitigation could be made more effective on its own, and then after learning about the additional mitigation strategy also performed some impressive research on the effectiveness of the delay technique.
One obvious question was raised: why doesn’t s2n mitigate Lucky 13 in the same way that OpenSSL does? OpenSSL’s change introduced significant complexity, but s2n’s primary goal is to implement the SSL/TLS protocols in a secure way, which includes risk reduction via reduced complexity. Besides hard-to-attack timing issues, our observation is that other software coding errors can lead to much more practical forms of attack, including remote execution, such as the ShellShock vulnerability, or memory disclosure, such as the HeartBleed vulnerability in OpenSSL. In short: these latter kinds of vulnerabilities, due to small coding errors, can be catastrophic and rank as higher-impact threats.
We must carefully balance the risk of additional complexity and code against the benefit. This leads us to be extremely conservative in adding code to s2n and to prioritize readability, auditability and simplicity in our implementation, which we emphasize in s2n’s development guide. Simply put, we keep things simple for better security.
Some modern cryptographic implementations, most notably NaCl, strive to make all operations take the same amount of time to execute regardless of the input, using techniques such as branch-free programming and bitwise operations. Where appropriate s2n employs these techniques, too.
For example, s2n verifies CBC padding in constant time regardless of the amount of padding, an important aspect of mitigating Lucky 13 on its own. Wherever s2n uses these techniques, we include explanatory code comments, in addition to analyzing and auditing the machine code emitted by multiple common compilers.
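As an illustration of the branch-free style, CBC padding can be checked in time that depends only on the maximum possible padding length. This is a sketch under our own names and layout, not s2n's actual implementation.

```c
#include <stdint.h>

/* Returns 0xFF if a < b, 0x00 otherwise, with no branches
 * (valid for a, b below 2^31). */
static uint8_t ct_lt(uint32_t a, uint32_t b)
{
    return (uint8_t)(0u - ((a - b) >> 31));
}

/* Constant-time CBC padding verification sketch. Every byte that could
 * be padding is inspected on every call (the caller guarantees
 * len > max_pad), and validity is accumulated with bitwise operations,
 * so the running time does not depend on the secret padding length.
 * Returns 1 for valid padding, 0 otherwise. */
static int ct_check_padding(const uint8_t *rec, uint32_t len,
                            uint32_t max_pad)
{
    uint8_t pad = rec[len - 1];
    /* flag records too short to hold pad+1 padding bytes */
    uint8_t bad = ct_lt(len - 1, (uint32_t)pad + 1);

    for (uint32_t i = 0; i < max_pad; i++) {
        uint8_t in_pad = ct_lt(i, (uint32_t)pad + 1); /* in padding region? */
        bad |= in_pad & (rec[len - 1 - i] ^ pad);     /* nonzero on mismatch */
    }
    return bad == 0;
}
```

The loop always runs `max_pad` iterations and touches the same memory locations regardless of the padding value, which is what makes the check's timing independent of the secret.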
Unfortunately, the design of the CBC construction impacted by Lucky 13 pre-dates these techniques. While it is feasible to verify CBC padding in constant time, it is not possible to apply the technique to HMAC verification without changing the interface to the HMAC algorithm in quite a complex way, and the interaction between CBC and HMAC verification is where the timing issue shows up.
As the research from Albrecht and Paterson found, retrofitting constant-time practices into OpenSSL’s CBC handling required hundreds of lines of code, interweaving the TLS handling code and the HMAC code, and changing the interface to HMAC in a non-standard way. Although this was done with great care and in a manner appropriate to OpenSSL, we do not believe a similar approach is suitable for s2n, where it would lead to an almost 10% increase in code and, in our view, would have a significant impact on clarity, for a very marginal benefit. Additionally, we are formally verifying our implementation of the HMAC algorithm against the published HMAC standard, so we try hard to use HMAC via its standard interface as closely as possible.
In other words: changing the HMAC interface to be similar to OpenSSL would be adding hard-to-follow code and would defeat much of the benefit of formally verifying our implementation of the HMAC algorithm. With these trade-offs in mind, s2n first mitigated Lucky 13 by counting the number of bytes handled by HMAC and making this equal in all cases, a “close to constant time” technique that completely preserves the standard HMAC interface. This is an improvement over several alternative TLS implementations, which do not equalize the input to HMAC, and we could not observe any timing differences in our network testing.
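The “equal bytes into HMAC” idea can be sketched as follows. The context type and function names here are ours, not s2n's API, and the stub merely counts bytes, which is the quantity the mitigation equalizes; a real HMAC context would compress the data.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stub standing in for a real HMAC context. */
typedef struct {
    size_t bytes;
} hmac_ctx;

static void hmac_update(hmac_ctx *ctx, const uint8_t *in, size_t len)
{
    (void)in;
    ctx->bytes += len;   /* a real implementation would compress `in` */
}

/* MAC the real payload through the standard interface, then feed the
 * stripped padding into a scratch context whose output is discarded, so
 * the total number of bytes processed is the same for every record of a
 * given length, whatever the padding turned out to be. */
static void mac_with_equalized_input(hmac_ctx *real, hmac_ctx *scratch,
                                     const uint8_t *rec,
                                     size_t data_len, size_t total_len)
{
    hmac_update(real, rec, data_len);
    hmac_update(scratch, rec + data_len, total_len - data_len);
}
```

Because the scratch context's result is thrown away, the standard HMAC interface is fully preserved while the byte count fed to HMAC stays constant.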
Albrecht and Paterson pointed out to us that this technique does not account for some internal differences in how the HMAC algorithm behaves depending on the size of the data processed. We were aware of this limitation, but our validation experiments – attacking s2n the same way an attacker would, and with our secondary mitigation turned off – had not been able to measure a time difference using this approach.
Separate targeted measurements, using a synthetic benchmarking environment and direct instrumentation of the extracted code illustrated the timing difference better, and their latest paper includes some analysis of how this timing difference may be used in an attack.
Albrecht and Paterson were also able to help us avoid making complex changes by suggesting a small and clever change to the HMAC interface: add a single new call that always performs two internal rounds of compression, even if one may not be necessary. With this change in place, the timing differences would be unobservable even in the synthetic environment. Considering the low risk and the experimental evidence of our previous testing we made this change in s2n as part of a commit on July 13th, just less than two weeks after s2n’s public launch on GitHub.
Underscoring the principle that “with more code comes more risk,” this change had a small bug itself: instead of expecting it to take 9 bytes of space to finalize a checksum, the original change specified 8. Manuel Barbosa, José Almeida (HASLab – INESC TEC and Univ. Minho), Gilles Barthe, and François Dupressoir (IMDEA Software Institute) have put together a comprehensive paper detailing the issue, the value of more testing in this area, and how properties such as constant execution time can be verified more systematically.
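The boundary involved is easy to state arithmetically. For an MD5/SHA-1-style hash with 64-byte blocks, finalization appends a 0x80 marker byte plus an 8-byte message length, 9 bytes in total, so an extra compression is needed whenever fewer than 9 bytes remain free in the current block. A sketch of that arithmetic (ours, not the s2n change itself):

```c
#include <stdint.h>

/* How many compressions does final() perform for a message of
 * message_len bytes, given a hash with 64-byte blocks and `reserve`
 * bytes needed for padding? The correct reserve is 9 (0x80 marker plus
 * an 8-byte length); the bug used 8. */
static int final_compressions(uint64_t message_len, uint64_t reserve)
{
    uint64_t used = message_len % 64;  /* bytes already in the last block */
    return (64 - used < reserve) ? 2 : 1;
}
```

With the correct reserve of 9, a second compression happens whenever 56 to 63 bytes of the last block are used; with 8, the count is wrong exactly when 56 bytes are used — a single-boundary detail that is easy to miss in review.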
Perhaps most interesting about this is that the code author, reviewers, and researchers involved are all extremely familiar with the internals of HMAC and hash algorithms, and this small detail still escaped review (though no formal code audit had yet begun on the changes). The same authors are also working on a formal verification tool specifically for constant-time implementations, in collaboration with Michael Emmi (IMDEA Software Institute). We are working with vendors to apply these techniques more generally.
s2n’s timing blinding mitigation
Lucky 13 is not the only attack against the TLS protocol to use timing differences, and so, as part of developing s2n, we decided to include a general mitigation against timing attacks. Whenever a TLS error is encountered, including Lucky 13 attack attempts, s2n pauses for a random duration of between 1 millisecond and 10 seconds. The goal here is to produce a general high-level mitigation which would have rendered any known timing attack impractical, even if the underlying issue were unpatched or, as is the case here, if another intended mitigation is not fully effective.
The effect is to hide a signal (the timing difference) in a very large sea of noise. This doesn’t make it impossible to find the original signal, but it takes many more measurements. The math involved is straightforward and is included in the original Lucky 13 paper. If the timing issue is 1 microsecond, then a delay of between 1 millisecond and 10 seconds increases the number of attempts required by a factor of about 8.3 trillion; far outside the realm of the practical. However, this is only the case if the delay is truly uniform.
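The arithmetic behind the 8.3 trillion figure can be sketched with a simple variance-ratio estimate — a back-of-the-envelope model consistent with the number above, not the Lucky 13 paper's full statistical analysis:

```c
/* The number of measurements an attacker needs scales roughly with the
 * variance of the added noise divided by the square of the timing
 * signal. A uniform delay on [0, D] seconds has variance D*D/12. */
static double blinding_factor(double delay_max_s, double signal_s)
{
    double noise_variance = (delay_max_s * delay_max_s) / 12.0;
    return noise_variance / (signal_s * signal_s);
}

/* blinding_factor(10.0, 1e-6): (100/12) s^2 over (1 us)^2, roughly
 * 8.3e12 -- the "8.3 trillion" multiplier quoted in the text. */
```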
Initially, our random delay duration was implemented using the usleep routine (or by the calling process, if it is handling multiple connections), which is granular to microseconds. One of our concerns with the usleep approach is that it may be too coarse: a timing issue smaller than a microsecond may still “slip through” if the implementation of usleep is consistent enough.
This issue is not fatal; merely by adding 5 seconds of delay on average to each attempt, the Lucky 13 attack is slowed from a roughly 64-hour-per-byte attack to one which would take at least a year of attack time per byte to succeed (again, assuming idealized circumstances for the attacker).
Based on the renewed concerns, and with the encouragement of Albrecht and Paterson, we changed s2n to use nanosleep() for our delaying mechanism on July 11th. Indeed, this change proved beneficial, as further research by Albrecht and Paterson over the coming months showed that observable timing issues could indeed slip through the old microsecond-granular sleep mechanism, and that the plaintext could be recovered in an idealized synthetic environment by making tens of millions of attempts per byte, far fewer than the intended trillions.
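The delay mechanism itself is small. Here is a sketch of the nanosecond-granularity version — illustrative, not s2n's exact code; a real implementation would draw the delay from a cryptographically strong random source, uniformly between 1 millisecond and 10 seconds:

```c
#include <stdint.h>
#include <time.h>

/* nanosleep() takes a timespec, so split a random delay expressed in
 * nanoseconds into whole seconds plus the nanosecond remainder. */
static struct timespec blinding_delay(int64_t delay_ns)
{
    struct timespec ts;
    ts.tv_sec  = (time_t)(delay_ns / 1000000000LL);
    ts.tv_nsec = (long)(delay_ns % 1000000000LL);
    return ts;
}

/* On a TLS error path (random_delay_ns drawn in [1e6, 1e10)):
 *     struct timespec ts = blinding_delay(random_delay_ns);
 *     nanosleep(&ts, NULL);
 */
```

Unlike usleep, which rounds the delay to whole microseconds, nanosleep preserves the sub-microsecond portion of the random draw, so a sub-microsecond signal cannot line up with a quantization boundary.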
So, for an over-simplified assessment of how impractical an attack of this nature would be: assuming an attacker was in the right place, at the right time, and could convince their victim to send their password to a server, over and over again as fast as the network would allow, it would take in the region of 20 years to recover an 8-character password via the Lucky 13 attack against s2n. The real-world wall-clock time required may be shortened by trying attempts in parallel, but the contention of network queues and cumulative delay of packets also interfere with measurements, and it is an even harder attack to mount against clients in the first place.
The change to nanosecond granularity prevents this. We are also examining other changes, including a mode in which s2n emulates hardware by handling errors only at set “tick” intervals, as well as a change to enforce a higher minimum timing delay in all cases.
As mentioned earlier, some implementations of TLS do not include mitigations against Lucky 13. In some cases this is because they are implemented in higher-level languages, where constant-time coding techniques are not possible.
These languages are generally safer than low-level ones from the kinds of remote-execution and memory disclosure risk presented by coding errors. As technologists, it would be extremely unfortunate if we were forced to choose between safer programming languages and safer cryptography. Timing blinding may be a useful way to bridge this gap, and we look forward to further research in this area.