Introduction

We've argued that entropy is an essential concept in cryptography, as cryptography is about the enemy's ignorance.  So how is entropy treated by the two primitives we've been discussing: block cyphers and hash function primitives?  In what follows, we won't consider particular, practical hash function primitives or block cyphers, but their "ideal" versions.  In fact, we could turn it around: the entropy flows we will deduce actually define the ideal versions of block cyphers and hash function primitives!

Entropy flow in hash function primitives

Let us consider a 1-1 compression function such as the primitive in Keccak, but let us, for ease of reasoning, pretend that it is much smaller: that it maps 16-bit words onto 16-bit words.  Ideally, the function is a permutation "which looks random" but for which, of course, we have an algorithmic specification (which means that it is not actually "random" in the sense of being drawn from a uniform distribution over the set of permutations of {0, 1, 2, ..., 2^16 - 1}, in the same way that any algorithm of finite length cannot produce a random number over the real numbers but only over a countable subset of them).

As it is a permutation, which is by definition mathematically reversible, the answer to the question "what is the entropy of the output?" is easy: the entropy of the output is exactly the same as the entropy of the input (under the assumption of a genuine permutation)!

For every input state there is, ideally, exactly one output state, so the probability for that output state to appear is exactly the same as that for the input state, and hence the entropy of any distribution over the input states is exactly the entropy of the induced distribution over the output states.  In our example, if the input has an entropy of 16 bits (in other words, the input is totally random), then the output will also have an entropy of 16 bits (it will be totally random).

Note that if the input has an entropy of 2 bits (which comes down to saying that essentially, there are 4 possible inputs, even if they are coded on 16 bits), then the output will also have 2 bits of entropy (there will be only 4 possible output states, even though they "look random").
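To make the permutation case concrete, here is a minimal sketch in Python (a toy illustration with an arbitrarily drawn permutation, not any real primitive): it pushes a 2-bit-entropy distribution over 16-bit words through a randomly chosen permutation and checks that the Shannon entropy is unchanged.

    import math
    import random

    def shannon_entropy(probs):
        """Shannon entropy (in bits) of a probability distribution."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    random.seed(1)

    # A randomly chosen permutation of the 2^16 possible 16-bit words,
    # standing in for an "ideal" 1-1 compression function.
    perm = list(range(2**16))
    random.shuffle(perm)

    # Input distribution: 4 equally likely 16-bit values -> 2 bits of entropy.
    inputs = {0x0000: 0.25, 0x1234: 0.25, 0xBEEF: 0.25, 0xFFFF: 0.25}

    # Push the distribution through the permutation.
    outputs = {}
    for x, p in inputs.items():
        outputs[perm[x]] = outputs.get(perm[x], 0.0) + p

    print(shannon_entropy(inputs.values()))   # 2.0 bits in
    print(shannon_entropy(outputs.values()))  # 2.0 bits out: the permutation preserves entropy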

But consider now that we turn our 1-1 compression function into a genuine compression function, by dropping 12 of the 16 bits and only retaining 4 bits in the output.  What will the entropy of that 4-bit output be?  As the 4 bits are totally random when the 16-bit input is totally random, we see that the entropy will now be 4 bits.  But that was easy.  Consider now that the input has 4 bits of entropy (that is, there are 16 possible input values).  Will the output still have 4 bits of entropy?  Not exactly.  The 4 bits of output have to be seen as randomly drawn.  The function from the 16 input values to the output values is not necessarily a permutation any more.  There is a certain probability that two input values map to the same output, and that not all 16 possible outputs are reached.

We can in fact easily estimate how many unoccupied output states there will be if the function is random, instead of being a random permutation.  If there are N output states and M input states, and the output state is drawn uniformly at random for each input state, then the probability that a given output state is drawn on a single drawing is 1/N.  As such, the probability of not being drawn is (N - 1)/N.  If the drawings are independent (that is, we don't take earlier drawings into account, as we would have to for a permutation but not for a function), then the probability of never being drawn when there are M inputs is [(N - 1)/N]^M.  The average number of undrawn output states is hence N [(N - 1)/N]^M.
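As a quick sanity check of this formula, the following sketch (with arbitrary simulation parameters) draws many random functions from M inputs to N outputs and compares the average number of unreached output states with N [(N - 1)/N]^M.

    import random

    def avg_unreached(N, M, trials=20000, seed=42):
        """Average number of output states never hit by a random function
        from M inputs to N outputs, estimated by simulation."""
        rng = random.Random(seed)
        total = 0
        for _ in range(trials):
            hit = {rng.randrange(N) for _ in range(M)}
            total += N - len(hit)
        return total / trials

    N, M = 16, 16
    print(avg_unreached(N, M))      # simulation: about 5.7
    print(N * ((N - 1) / N) ** M)   # formula: 16 * (15/16)^16 ≈ 5.71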

We consider here the case where N = M, both equal to 16 (there are 16 possible output states with the 4-bit output, and 16 input states).  We then have that the number of unoccupied states in the output will be N [(N - 1)/N]^N, which, for "large" N, tends to N/e.

For our 16 possibilities (N = M = 16), this turns out to be 16 (15/16)^16 ≈ 5.7.  So roughly, we expect only about 10 of the 16 states to be reached if we have only 16 inputs and we retain 4 bits in the output.

As such, there is an entropy loss of about -log2(1 - 1/e) ≈ 0.7 bits if we have "full entropy" in the input and a random function instead of a random permutation.

Note that from the moment that we have somewhat more input entropy than can fit in the output, this loss becomes negligible.  If we have 5 bits of entropy in the input and a 4-bit output, then the number of unreached output states is 16 (15/16)^32 ≈ 2: with just 1 extra bit of input entropy, 14 of the 16 possible output states will be reached.  This will cost us about 0.2 bits of entropy.  If we have 2 bits of extra entropy in the input, it becomes 16 (15/16)^64 ≈ 0.26 and we can essentially consider that we have full output entropy.

Also, consider that we have more bits in the output.  If we have 4 bits of entropy in the input and 5 bits of output, then we expect 32 (31/32)^16 - 16 ≈ 3.25 output states to be lost to collisions, and if we have 6 bits of output, we expect 64 (63/64)^16 - 48 ≈ 1.7 states to be lost to collisions.  As such, we can neglect the loss in entropy from the moment that we have 2 bits more possible outputs than needed for the required entropy to be conserved.
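The numbers quoted above all follow from the same formula; here is a short sketch evaluating each case (the entropy losses are the rough log2 estimates used in the text):

    import math

    def unreached(N, M):
        """Expected number of output states never reached by a random
        function from M equally likely inputs to N output states."""
        return N * ((N - 1) / N) ** M

    # Input entropy equal to output capacity (4 bits in, 4 bits out):
    print(unreached(16, 16))                         # ≈ 5.7 unreached states
    print(math.log2(16 / (16 - unreached(16, 16))))  # ≈ 0.7 bits of entropy lost

    # One or two extra bits of input entropy (5 or 6 bits in, 4 bits out):
    print(unreached(16, 32))                         # ≈ 2.0
    print(math.log2(16 / (16 - unreached(16, 32))))  # ≈ 0.2 bits lost
    print(unreached(16, 64))                         # ≈ 0.26 -> essentially full output entropy

    # More output bits than input entropy (4 bits in, 5 or 6 bits out):
    print(unreached(32, 16) - (32 - 16))             # ≈ 3.25 output states lost to collisions
    print(unreached(64, 16) - (64 - 16))             # ≈ 1.7 output states lost to collisions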

It is only when input entropy matches the output capacity that we have a loss of about 0.7 bits of entropy when it is a random function, and not a random permutation.

In other words, to a very good approximation we have:

  • inasmuch as we have a random permutation, the output entropy of a 1-1 function is equal to the input entropy;
  • inasmuch as we have a random function, and there is more entropy in the input than can fit in the output, we have "full output entropy";
  • inasmuch as we have a random function, and there is less entropy in the input than can fit in the output, we have "full input entropy";
  • inasmuch as we have a random function, and the input entropy is about equal to the output capacity, we lose about 0.7 bits of entropy (a small computational sketch of these rules follows below).
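Here is a small sketch that packages these rules into a single rough estimate: it treats the output entropy as log2 of the expected number of reached output states, assuming a uniform input distribution.  This is the same approximation as above, not an exact result.

    import math

    def estimated_output_entropy(input_entropy_bits, output_bits):
        """Very rough output-entropy estimate for an ideal random function,
        assuming the input is uniform over 2^input_entropy_bits values."""
        N = 2.0 ** output_bits          # possible output states
        M = 2.0 ** input_entropy_bits   # equally likely input states (idealization)
        unreached_fraction = math.exp(M * math.log1p(-1.0 / N))
        reached = N * (1.0 - unreached_fraction)
        return math.log2(reached)

    print(estimated_output_entropy(16, 16))    # ≈ 15.3: about 0.7 bits lost
    print(estimated_output_entropy(10, 6))     # ≈ 6.0: full output entropy
    print(estimated_output_entropy(4, 6))      # ≈ 3.8: essentially full input entropy
    print(estimated_output_entropy(130, 128))  # ≈ 128: the pass-phrase case further below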

The Keccak 1-1 function is constructed to be reversible.  As such, this function is a genuine permutation.  However, from the moment that one retains only part of the output, in order to obtain a genuine compression function, it can of course no longer be a permutation, and hence one obtains at best a random-looking function.

One can argue whether making the 1-1 function reversible, in order to obtain a genuine permutation (and hence avoid all collisions), is a sufficient advantage compared to the inherent cryptographic weakness one introduces by making the function invertible.

Genuine compression functions behave as random functions (in the ideal case).  As such, our conclusions apply to them too.

An important aspect of compression functions is that they can "concentrate entropy"; in other words, they can "whiten" non-perfect entropy sources, at the price of losing some entropy in the process.

Indeed, suppose that we have an entropy source which is far from white, that is, a source that generates a lot of correlated bits.  Suppose that it has a genuine entropy of 10 bits per word, but issues 64-bit words.  If we used the 64-bit words as such, we would have highly correlated bits, which might not be acceptable for the application at hand.  But if we pass these 64-bit words through a compression function, we may end up with our desired property, namely words which are "uniformly random".  Suppose that we pass these 64-bit words through a compression function which outputs 6 bits.  Then, according to our reasoning, these 6-bit outputs will contain essentially 6 bits of entropy.  From our "diluted" 10 bits in 64, we obtained 6 bits of "concentrated" entropy (but we lost 4 bits of entropy in the process).  If we compressed our 64 bits into 10 bits instead, we would, again according to our previous reasoning, obtain about 9.3 bits of entropy.  We would only have lost 0.7 bits of entropy, but our output words would still have some correlations left in them.
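As an illustration of this concentration effect, here is a sketch under simple, invented assumptions: a toy source with exactly 10 bits of entropy spread over 64 highly correlated bits, compressed to 6 bits with SHAKE-128 standing in for an ideal compression function (the figures 10, 64 and 6 are the ones from the text; the encoding of the source is just one arbitrary way of producing correlated bits).

    import hashlib
    import math

    def shannon_entropy(counts):
        """Shannon entropy (in bits) of a distribution given as counts."""
        total = sum(counts)
        return -sum(c / total * math.log2(c / total) for c in counts if c)

    # Toy source: 1024 equally likely symbols (10 bits of entropy), each encoded
    # as a 64-bit word whose bits are strongly correlated (the 10-bit value is
    # simply repeated four times) -- "diluted" entropy.
    def source_word(symbol):
        return symbol.to_bytes(2, "big") * 4     # 8 bytes = 64 correlated bits

    # Compress each 64-bit word to 6 bits, with SHAKE-128 as a stand-in for an
    # ideal compression function.
    def compress_to_6_bits(word):
        return hashlib.shake_128(word).digest(1)[0] & 0x3F

    output_counts = [0] * 64
    for symbol in range(1024):                   # uniform over 1024 inputs = 10 bits in
        output_counts[compress_to_6_bits(source_word(symbol))] += 1

    print(shannon_entropy(output_counts))        # close to 6 bits of "concentrated" entropy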

A typical application is the generation of high-entropy passwords from "pass phrases".  Pass phrases are made up of character strings which form words.  If we code one character on 8 bits, then a pass phrase with 10 words of 5 characters on average plus a blank will contain 60 characters, or 480 bits, but of course the actual entropy of such a pass phrase is much, much lower.  It is just that, for humans, it is much easier to remember a 60-character pass phrase than a smaller number of totally arbitrary characters.  But one might need "white" bits from that pass phrase.  Suppose that the words are picked from a dictionary of about 8000 words.  That means that each word has in fact an entropy of about 13 bits, but is coded on 48 bits.  Our pass phrase will contain roughly 130 bits of entropy but is coded on 480 bits.  With a hash function that maps 480 bits onto 128 bits, we are able to construct essentially a 128-bit word with 128 bits of entropy (a white word, to say it differently).  Roughly every bit in the 128-bit compressed string is random and uncorrelated with every other bit (which was absolutely not the case in the original 480-bit string).  We lost 2 bits of entropy in the process, which is sufficient to "pay for" the 0.7 bits of statistical collisions.
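A minimal sketch of that application using Python's hashlib (SHAKE-128 is just one convenient choice of hash with a 128-bit output; the pass phrase is of course only an example):

    import hashlib

    # A long, memorable pass phrase whose entropy is far lower than its coded length.
    passphrase = "correct horse battery staple mountain coffee window guitar planet river"

    # Compress the ~480 coded bits down to a 128-bit "white" key.
    key = hashlib.shake_128(passphrase.encode("utf-8")).digest(16)   # 16 bytes = 128 bits
    print(key.hex())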

Another application is the generation of white noise and of "truly random numbers".

Entropy flow in block cyphers

In a certain way, we can look upon a block cypher as a compression function which compresses key plus clear text into cypher text.  The reversibility of a block cypher means that, for a given key, the relationship between clear text and cypher text is a permutation.  As such, we can say that, for a given key, the entropy of the clear text and the cypher text is identical.  Indeed, for a given key, a block cypher behaves as a 1-1 compression function which is a permutation.
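To make "for a given key, the relationship between clear text and cypher text is a permutation" concrete, here is a toy 16-bit Feistel cipher (an invented example, far too small and weak for real use) together with a check that, for a fixed key, every one of the 2^16 possible clear-text blocks maps to a distinct cypher-text block:

    def toy_encrypt(block16, key, rounds=4):
        """Toy 4-round Feistel cipher on 16-bit blocks (illustration only)."""
        left, right = block16 >> 8, block16 & 0xFF
        for r in range(rounds):
            round_key = (key >> (8 * (r % 4))) & 0xFF
            # Arbitrary round function: the Feistel structure is reversible whatever it is.
            f = ((right * 167 + round_key * 91 + r) ** 2 >> 3) & 0xFF
            left, right = right, left ^ f
        return (left << 8) | right

    key = 0xC0FFEE42
    cipher_texts = {toy_encrypt(p, key) for p in range(2**16)}
    print(len(cipher_texts))   # 65536: for a fixed key, encryption is a bijection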

But the interesting part of a block cypher is of course the entropy as seen by an attacker, who doesn't know the key.  As such, the entropy of the cypher text is roughly:

  • equal to the maximum possible entropy (= number of clear text bits) if the key entropy plus the clear text entropy is larger than the clear text size
  • equal to the key entropy plus the clear text entropy if that sum is smaller than the number of clear text bits

However, the above statement is only correct if the key is used only once!

If (as is of course the idea) the key is re-used, then the above statement still holds, except that we have to take "clear text" to mean ALL the clear text that is encrypted with the same key.  We are then almost surely in the second case.

Let us take an example: a block cypher with data blocks of 512 bits and a key of 256 bits.

Suppose that the key has a genuine entropy of 100 bits, and suppose that we encrypt a data block with 450 genuine bits of entropy (very rare!).  As 100 + 450 is larger than 512, the entropy of the cypher text is 512 bits, and it is perfectly "white".  However, if instead we have a data block which has only 50 bits of entropy (typical for a block of ASCII text), the cypher text will contain only 150 bits of entropy.  This, in the case where the key is used only once.

Now, suppose that we encrypt 1000 blocks that way (each with 450 bits of entropy).  The total entropy of the clear text is now 450 000 bits.  The (single) key contributes 100 bits.  So the sum is 450 100 bits, while the total *possible* amount of entropy in the output is 512 000 bits (1000 blocks of 512 bits of cypher text).  So our encrypted stream will have an entropy of 450 100 bits.  It is as if the key added only 0.1 bit of entropy to each data block!  However, one shouldn't see it like that.  One should rather see it as follows: the encryption has added the key entropy of 100 bits to the entire data stream, and that's exactly what happened: there are 2^100 different ways of encrypting this data stream, one for each possible key (given that the key had an entropy of 100 bits).  The essential point is that the 100 bits of key entropy remain entangled with all of the cypher text.
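The arithmetic of this example, spelled out as a small sketch (all figures are the assumed ones from the text):

    BLOCK_BITS      = 512    # cypher text bits per block
    KEY_ENTROPY     = 100    # genuine entropy of the (single, re-used) key
    CLEAR_PER_BLOCK = 450    # genuine entropy of each clear-text block
    N_BLOCKS        = 1000

    max_output_entropy = BLOCK_BITS * N_BLOCKS                       # 512 000 bits
    input_entropy      = KEY_ENTROPY + CLEAR_PER_BLOCK * N_BLOCKS    # 450 100 bits

    stream_entropy = min(max_output_entropy, input_entropy)
    print(stream_entropy)                                            # 450 100 bits
    print((stream_entropy - CLEAR_PER_BLOCK * N_BLOCKS) / N_BLOCKS)  # 0.1 bit "per block"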