In this very basic cryptography exercise, I have written a simple test of the quality of a cipher. For RC4 and stream ciphers, we can encrypt
\x00\x00\x00\x00 to get the first four bytes of the keystream. I do this for the first 1048576 keys (assuming big endian and 64-bit keys) with RC4. Then I find out how many random keys I have to try before I find the same first four keystream bytes. I do this 1024 times. The data shows that this is around 4 million keys.
For block ciphers like AES, we have to do it slightly differently, but the concept is the exact same. I encrypt "GET / HTTP/1.1\r\n" which happens to be 16 bytes, the exactly correct size to fit in a single block of AES plaintext. I store the first four bytes of the ciphertext for the first 1048576 keys (same assumption as above but 128-bit keys). Then I do the same with random keys and I compare the first four bytes of the ciphertext against the first four bytes of the 1048576 partial ciphertexts. I find out how many random keys I have to try before I find the same first four ciphertext bytes. I do this 1024 times. The data shows that this is around 3 million keys. As you can clearly see, this is far smaller than RC4 (which is known to be vulnerable to many attacks).Update
To test whether the problem is in AES or RC4, I used my system's random number generator (Linux /dev/urandom) to generate random bytes of keystream and tested how many attempts it would take to collide 1024 times. It took on the order of 4 million. This proves that the issue is either in AES or in RC4 and my system's random number generator. Since my system's random number generator is as good a source of entropy as I have, I must conclude that there is no issue with RC4 and that there is an issue with AES.
To test whether this was a problem in my code, AES, or all block ciphers I tested the same code with Blowfish instead of AES. Blowfish returned the same result as AES, thus confirming that the problem is either in my code or in multiple block ciphers. This makes it more unlikely that AES is vulnerable to an attack and more likely that I have stumbled upon a feature of block ciphers. In order to test this theory I would like to run this test on other ciphers and other modes of ciphers.
This appears to be a weakness in AES. My lack of knowledge of cryptography however is making this educated guess poorly backed by science. I hope that a cryptographer can look at the source code and the methodology and tell me why I am seeing this result.
AES: 1024 keys found in 3289822 attempts 1024 keys found in 2919389 attempts 1024 keys found in 3160136 attempts 1024 keys found in 3169387 attempts RC4: 1024 keys found in 4081981 attempts 1024 keys found in 4038271 attempts 1024 keys found in 4058535 attempts 1024 keys found in 4121532 attempts RC4 (with the first 1024 bytes discarded): 1024 keys found in 4102351 attempts 1024 keys found in 4120705 attempts RNG: 1024 keys found in 4070825 attempts 1024 keys found in 4191500 attempts 1024 keys found in 3889336 attempts Blowfish: 1024 keys found in 3111644 attempts 1024 keys found in 3219266 attempts 1024 keys found in 3369217 attempts
AES: len(tons_of_ciphertexts_set): 1048451 1.000 len(tons_of_ciphertexts_set): 1048451 1.000 len(tons_of_ciphertexts_set): 1048451 1.000 len(tons_of_ciphertexts_set): 1048451 1.000 RC4: len(tons_of_keystreams_set): 1047777 0.999 len(tons_of_keystreams_set): 1047777 0.999 len(tons_of_keystreams_set): 1047777 0.999 len(tons_of_keystreams_set): 1047777 0.999 RC4 (with the first 1024 bytes discarded): len(tons_of_keystreams_set): 1048448 1.000 len(tons_of_keystreams_set): 1048448 1.000 RNG: len(tons_of_keystreams_set): 1048445 1.000 len(tons_of_keystreams_set): 1048454 1.000 len(tons_of_keystreams_set): 1048453 1.000 Blowfish: len(tons_of_ciphertexts_set): 1048439 1.000 len(tons_of_ciphertexts_set): 1048439 1.000 len(tons_of_ciphertexts_set): 1048439 1.000
Cryptographers on Twitter have come to my aid on this project. Frank Denis @jedisct1 correctly identified the problem as the Birthday Problem. This does not solve the issue I am seeing here. Continuing on, more insightful comments come from Marsh Ray (a respected security and cryptography researcher I know here in the area) who says:
I suspect it's actually a weakness of RC4 you're seeing. It tends to have pairs of same values
But your plaintext has no repeating values, so when XORd will have a smaller than chance probability of being \x00\x00\x00\x00
This is most likely the correct answer. RC4 has properties that a perfect pseudorandom number generator (PRNG) does not have. The question now becomes why exactly does
AES("GET / HTTP/1.1\r\n", random_key)[:4] collide sooner than a bad PRNG? The expectation is that AES is a nearly perfect PRNG. If this is not true, then AES-CTR is at least partially broken.
Taylor Hornby (@DefuseSec) correctly identified a bug in my code that did not affect the functioning of the code (it was equivalent to a nop). I have fixed that line of code and released a new version. The old version is also available to those who wish to compare.
Andy Isaacson (@eqe) spent several posts trying to explain concisely the problem:
with AES, you've computed 2^20 random 32 bit values. Check how many different values you got - should be about 1048449.
now, you're computing more random 32-bit values and checking if they're in the initial list. Each random has a 1 in 2^12 chance.
this is straight probability -- how long until you hit 50% probability of collision? It's simply pow(1-(1.0/(1<<12)), 2900)=0.49 !
(well not quite 1 in 2^12, but rather a 1048449 in 2^32 chance; 0.0244111% versus 0.0244140%. Close enough.)
I suspect your RC4 initial set is smaller since RC4 is biased, decreasing the probability of your random sequence colliding.
but I don't have an RC4 implementation in this crypto library, so you'll have to check for yourself. :)
This is correct, the RC4 initial set is substantially smaller than the AES initial set because RC4 collides ciphertexts in the getKeyStreams function while AES does not collide Ciphertexts in the getCiphertexts function. Excellent detective work without a functioning implementation of RC4 Andy.
Andy's answer doesn't actually solve the discrepancy between AES and RC4. When RC4 is changed so that doesn't collide 674 times in getKeyStreams, RC4 still takes 4102351 attempts to collide 1024 times. AES still takes ~3 million attempts.Update
The recent addition of keystream_dupe_rng1.py proves to my satisfaction that the issue is not with RC4 but with AES. The recent additions of keystream_dupe_rng1.py and keystream_dupe_blowfish1.py proves to my satisfaction that the issue is not with RC4. It appears to be a feature of block ciphers. Whether this is considered a weakness or feature is up to cryptographers to decide.
Analysis will continue until an answer is found. Currently no explanation for the phenomena exists.
Andy Isaacson (@eqe) successfully determined that when conducting a birthday attack, the number of items being collided against plays a very big role on how many collisions occur per N attempts. The data that I failed to put into the original is now in the data section.
This short foray into cryptography shows no weakness in AES. Instead it shows an easily discovered weakness in RC4. By generating 1048451 4-byte keystreams, 674 collisions occur. Modern implementations of RC4 throw away the first 1024 bytes of the keystream, so this is an unfair assessment, so version 0.3 will test this modified version of RC4.
AES seems to be the culprit in this test. Block ciphers seem to be the culprit in this test. Will we eventually find out whether AES has a weakness or feature that causes its ciphertexts to collide? My guess is that this will become uncovered once a cryptographer spends a few hours looking at this problem.
This mystery continues unsolved.