AES Ciphertext Collisions Anomaly

by Javantea
April 8-10, 2015

Introduction

Permalink
keystream_dupe-0.6.tgz [sig]
keystream_dupe-0.5.tgz [sig]
keystream_dupe-0.4.tgz [sig]
keystream_dupe-0.3.tgz [sig]
keystream_dupe-0.2.tgz [sig]
keystream_dupe-0.1.tgz [sig]

In this very basic cryptography exercise, I have written a simple test of the quality of a cipher. For RC4 and stream ciphers, we can encrypt \x00\x00\x00\x00 to get the first four bytes of the keystream. I do this for the first 1048576 keys (assuming big endian and 64-bit keys) with RC4. Then I find out how many random keys I have to try before I find the same first four keystream bytes. I do this 1024 times. The data shows that this is around 4 million keys.

For block ciphers like AES, we have to do it slightly differently, but the concept is the exact same. I encrypt "GET / HTTP/1.1\r\n" which happens to be 16 bytes, the exactly correct size to fit in a single block of AES plaintext. I store the first four bytes of the ciphertext for the first 1048576 keys (same assumption as above but 128-bit keys). Then I do the same with random keys and I compare the first four bytes of the ciphertext against the first four bytes of the 1048576 partial ciphertexts. I find out how many random keys I have to try before I find the same first four ciphertext bytes. I do this 1024 times. The data shows that this is around 3 million keys. As you can clearly see, this is far smaller than RC4 (which is known to be vulnerable to many attacks).

Update

To test whether the problem is in AES or RC4, I used my system's random number generator (Linux /dev/urandom) to generate random bytes of keystream and tested how many attempts it would take to collide 1024 times. It took on the order of 4 million. This proves that the issue is either in AES or in RC4 and my system's random number generator. Since my system's random number generator is as good a source of entropy as I have, I must conclude that there is no issue with RC4 and that there is an issue with AES.

To test whether this was a problem in my code, AES, or all block ciphers I tested the same code with Blowfish instead of AES. Blowfish returned the same result as AES, thus confirming that the problem is either in my code or in multiple block ciphers. This makes it more unlikely that AES is vulnerable to an attack and more likely that I have stumbled upon a feature of block ciphers. In order to test this theory I would like to run this test on other ciphers and other modes of ciphers.

This appears to be a weakness in AES. My lack of knowledge of cryptography however is making this educated guess poorly backed by science. I hope that a cryptographer can look at the source code and the methodology and tell me why I am seeing this result.

Data

Attempts required to collide 1024 ciphertexts/keystreams

AES:
1024 keys found in 3289822 attempts
1024 keys found in 2919389 attempts
1024 keys found in 3160136 attempts
1024 keys found in 3169387 attempts

RC4:
1024 keys found in 4081981 attempts
1024 keys found in 4038271 attempts
1024 keys found in 4058535 attempts
1024 keys found in 4121532 attempts

RC4 (with the first 1024 bytes discarded):
1024 keys found in 4102351 attempts
1024 keys found in 4120705 attempts

RNG:
1024 keys found in 4070825 attempts
1024 keys found in 4191500 attempts
1024 keys found in 3889336 attempts

Blowfish:
1024 keys found in 3111644 attempts
1024 keys found in 3219266 attempts
1024 keys found in 3369217 attempts

Output of getKeyStreams(), getCiphertexts()

AES:
len(tons_of_ciphertexts_set): 1048451 1.000
len(tons_of_ciphertexts_set): 1048451 1.000
len(tons_of_ciphertexts_set): 1048451 1.000
len(tons_of_ciphertexts_set): 1048451 1.000

RC4:
len(tons_of_keystreams_set): 1047777 0.999
len(tons_of_keystreams_set): 1047777 0.999
len(tons_of_keystreams_set): 1047777 0.999
len(tons_of_keystreams_set): 1047777 0.999

RC4 (with the first 1024 bytes discarded):
len(tons_of_keystreams_set): 1048448 1.000
len(tons_of_keystreams_set): 1048448 1.000

RNG:
len(tons_of_keystreams_set): 1048445 1.000
len(tons_of_keystreams_set): 1048454 1.000
len(tons_of_keystreams_set): 1048453 1.000

Blowfish:
len(tons_of_ciphertexts_set): 1048439 1.000
len(tons_of_ciphertexts_set): 1048439 1.000
len(tons_of_ciphertexts_set): 1048439 1.000

Analysis

Cryptographers on Twitter have come to my aid on this project. Frank Denis ‏@jedisct1 correctly identified the problem as the Birthday Problem. This does not solve the issue I am seeing here. Continuing on, more insightful comments come from Marsh Ray (a respected security and cryptography researcher I know here in the area) who says:

I suspect it's actually a weakness of RC4 you're seeing. It tends to have pairs of same values

But your plaintext has no repeating values, so when XORd will have a smaller than chance probability of being \x00\x00\x00\x00

This is most likely the correct answer. RC4 has properties that a perfect pseudorandom number generator (PRNG) does not have. The question now becomes why exactly does AES("GET / HTTP/1.1\r\n", random_key)[:4] collide sooner than a bad PRNG? The expectation is that AES is a nearly perfect PRNG. If this is not true, then AES-CTR is at least partially broken.

Taylor Hornby (@DefuseSec) correctly identified a bug in my code that did not affect the functioning of the code (it was equivalent to a nop). I have fixed that line of code and released a new version. The old version is also available to those who wish to compare.

Andy Isaacson (@eqe) spent several posts trying to explain concisely the problem:

with AES, you've computed 2^20 random 32 bit values. Check how many different values you got - should be about 1048449.

now, you're computing more random 32-bit values and checking if they're in the initial list. Each random has a 1 in 2^12 chance.

this is straight probability -- how long until you hit 50% probability of collision? It's simply pow(1-(1.0/(1<<12)), 2900)=0.49 !

(well not quite 1 in 2^12, but rather a 1048449 in 2^32 chance; 0.0244111% versus 0.0244140%. Close enough.)

I suspect your RC4 initial set is smaller since RC4 is biased, decreasing the probability of your random sequence colliding.

but I don't have an RC4 implementation in this crypto library, so you'll have to check for yourself. :)

This is correct, the RC4 initial set is substantially smaller than the AES initial set because RC4 collides ciphertexts in the getKeyStreams function while AES does not collide Ciphertexts in the getCiphertexts function. Excellent detective work without a functioning implementation of RC4 Andy.

Andy's answer doesn't actually solve the discrepancy between AES and RC4. When RC4 is changed so that doesn't collide 674 times in getKeyStreams, RC4 still takes 4102351 attempts to collide 1024 times. AES still takes ~3 million attempts.

Update

The recent addition of keystream_dupe_rng1.py proves to my satisfaction that the issue is not with RC4 but with AES. The recent additions of keystream_dupe_rng1.py and keystream_dupe_blowfish1.py proves to my satisfaction that the issue is not with RC4. It appears to be a feature of block ciphers. Whether this is considered a weakness or feature is up to cryptographers to decide.

Analysis will continue until an answer is found. Currently no explanation for the phenomena exists.

Conclusion

Andy Isaacson (@eqe) successfully determined that when conducting a birthday attack, the number of items being collided against plays a very big role on how many collisions occur per N attempts. The data that I failed to put into the original is now in the data section.

This short foray into cryptography shows no weakness in AES. Instead it shows an easily discovered weakness in RC4. By generating 1048451 4-byte keystreams, 674 collisions occur. Modern implementations of RC4 throw away the first 1024 bytes of the keystream, so this is an unfair assessment, so version 0.3 will test this modified version of RC4.

Update

AES seems to be the culprit in this test. Block ciphers seem to be the culprit in this test. Will we eventually find out whether AES has a weakness or feature that causes its ciphertexts to collide? My guess is that this will become uncovered once a cryptographer spends a few hours looking at this problem.

This mystery continues unsolved.

Permalink

Comments: 0

Leave a reply »

 
  • Leave a Reply
    Your gravatar
    Your Name