Hash function primitives

Introduction

Hash functions map an essentially undetermined amount of data onto a finite-length digest. Hash functions are deterministic, publicly known mappings of any amount of data onto a data word of given, finite, length. One can see hash functions as surjections of all the natural numbers onto the set of natural numbers smaller than N. As such, many different data sets will map onto the same number of course: that's unavoidable. Also, it is desirable that all the numbers smaller than N can be reached for some data set. This is why we say that a hash function is a surjection from the natural numbers into the set {0, 1, 2, ... N-1}. A trivial such hash function is f(x) = x mod N.

A cryptographic hash function is a hash function that has the following properties:

pre-image resistance
second pre-image resistance
collision resistance

The first property, pre-image resistance, means that, given a hash, it is computationally infeasible to invent a message that has the given hash (even though an infinite number of such messages exist). The second property means that given a message and its hash, it is computationally infeasible to invent another message that has the given hash. The difference between 1 and 2 is that in 2, one has access to a message that has produced the hash, while in 1, one doesn't have such a message. In other words, 2, is a stronger requirement than 1., because if one can do 1, then of course one can do 2, but if one can do 2, that doesn't mean that one can do 1. In all these cases, the hash is a given.

The third property means that it is computationally infeasible to invent 2 different messages such that they have the same hash. The difference with 1 and 2 is that here, one is free to invent the first message, and hence, the hash is not a given. 3. is the most severe requirement for a hash function.

In fact, if 3 holds, then 1 and 2 hold. If 3 doesn't hold, it may be that 2 and 1 are still valid. And it is possible that 3 and 2 don't hold, but that 1 holds. Depending on the application, a hash function needs to satisfy 1, 2 or 3.

In this section, we are not going to discuss the hash function proper, but we are going to discuss the core element that allows one to build hash functions: the cryptographic compression function. One can compare the cryptographic compression function to the block cypher; it is a core element which one can use to build symmetric cypher algorithms, but many other things too. One needs to distinguish between the algorithm of the core element, and the algorithm of the cryptographic application (such as a hash function or a symmetric cypher) that uses the core element.

A block cypher and a cryptographic compression function are very similar, and in fact, for some cryptographic applications (such as hash functions !) one can use both. But there are also fundamental differences. Let us remember what a block cypher did:

It was a function that took a block of n bits (the clear text), and a key of k bits, and it produced a block of n bits (the cypher text): X = f(C,K)
There was an inverse function that could produce the clear text from the cypher text and the key: C = g(X,K)
It was computationally not feasible to find the key from a pair of cypher text and clear text K = ?(C,X)

In contrast, a cryptographic compression function is the following:

It is a function that takes a block of n bits and produces a block of m < n bits: H = c(B)
It is computationally not feasible to find a block B', if given a block H, such that c(B') = H (pre-image resistance)
It is computationally not feasible to find a block B', if given a block B such that c(B) = c(B') = H (second pre-image resistance)
It is computationally not feasible to find two blocks, B and B', that differ by at least one bit, such that one has c(B) = c(B') (collision resistance)

The way to make a compression function c out of a block cypher f goes as follows:

Consider B the juxtaposition of a piece of n bits (C) and a piece of k bits (K), so B = {C,K}. Then any block cypher X = f(C,K) is also a compression function X = c(B).

Indeed, it is normally impossible to find (any) C' such that X = f(C',K') if given C, K and X (second pre-image resistance) because if it were, it would imply that the key and the clear text are not sufficiently irreversibly mixed. But note that this is not a fundamental requirement of the block cypher ; it is only a method (the irreversible mixing) to avoid key recovery, and not a requirement in itself. Also, the fact that a block cypher has an inverse function (to decrypt) is a potential weakness that a compression function doesn't need to have.

So although it is perfectly possible to use a block cypher as a compression function, and even though it is often used that way, it is maybe not the most secure thing to do. This is why there are compression function algorithms that are exactly that, as primitive building block. This section is about those compression functions.

In fact, there is no real need for a genuine compression function to make a hash function, although it is the classical way of doing so. A "compression function" with just as much bits output as input can also be used to construct a hash function, with a technique called a "sponge construction". We will illustrate this with keccak. But in fact, it shouldn't surprise us that we don't need a genuine compression function, because any 1-1 "mixing" function can serve as a compression function, by only retaining part of the output. If we have a 1-1 function that transforms a block of, say, 512 bits into another block of 512 bits, then we can limit ourselves to, say, the first 160 bits of the output block, and call this a compression function, after all. In as much as statistically, every input bit has just as much influence on every output bit, there's nothing special about the first 160 bits of the output. Such an approach has the advantage over a block-cypher-like approach, that all input bits are on equal footing, which is not the case for a block cypher. In a block cypher, the key bits are different from the data bits (on purpose), and in a block cypher like approach of a compression function, this difference remains. In a 1 to 1 "compression" function, this is not necessary, which is cryptographically superior.

SHA-1

As mentioned before, we will discuss the compression function of SHA-1. However, SHA-1 is the definition of a hash function, so in the specifications of SHA-1 is also mentioned the way the compression function should be used in order to build a hash function. The SHA-1 specification includes the use of the Davis-Meyer variant of the Merkle-Damgard construction from a compression function with different inputs into a general hash function, but we will only look at the compression engine. The compression engine has a "data block input" of 512 bits, and a "hash input" of 160 bits, and puts out a hash of 160 bits. So the function looks like:

H = f(B = {C,G} ) where H and G are 160 bit, and C is 512 bit.

In other words, f compresses a 512 + 160 = 672 bit block into a 160 bit block, but not all input bits are "equal": they are split in a 512 bit block and a 160 bit block. This looks a lot like a block cypher, where the "key" is 512 bits, and the data block is 160 bits. In fact, the SHA-1 compression engine looks too much like a block cypher, and has shown weaknesses for that reason.

The message block has the function of "key" and there is a message schedule, just as there is a key schedule in a block cypher. From the 512 bit block, one derives 80 32-bit words (so in total 1600 bits) as follows:

The first 16 words W₀ to W₁₅ are just taken from the message itself
The remaining 64 words are derived with the following iteration formula: W_j = shift left one bit of XOR of (W_j-16 , W_j-14, W_j-8 , W_j-3 )

The hash transformation of a block of 160 bits goes as follows:

The 160 bit word is split in 5 words of 32 bit: A, B, C, D and E
20 rounds are applied in stage 1, using W₀,... W₁₉, K₁ and f₁
20 rounds are applied in stage 2, using W₂₀ ... W₃₉, K₂ and f₂
20 rounds are applied in stage 3, using W₄₀ ... W₅₉, K₃ and f₃
20 rounds are applied in stage 4, using W₆₀ ... W_79,K₄ and f₄
to the resulting A', B',C',D',E', one adds respectively the A, B, C, D, E of step 1, modulo 2³²

This gives the output block of 160 bits.

Each stage round is similar, except for the different entries as specified above. The rounds have a Feistel-like structure, in the sense that only a part of the word is transformed per round. Different operations are specified on each of the 5 32-bit words:

A becomes E + f_i(B,C,D) + shift-left-5-bits(A) + W_j + K_i(addition modulo 2^₃₂₎
B becomes A
C becomes shift-left-30-bits of B
D becomes C
E becomes D

In the first stage, K₁ = 0x5A827999, in the second stage, K₂ = 0x6ED9EBA1, in the third stage K₃= 0x8F1BBCDC and in the fourth stage K₄ = 0xCA62C1D6.

In the first stage, f₁(B,C,D) is (B and C) or (B and D) ; in the second stage, f₂(B,C,D) is the XOR of B, C and D ; in the third stage f₃(B,C,D) is (B and C) or (B and D) or (C and D) and in the fourth stage f₄(B,C,D) is the XOR of B, C and D again. The operations are to be understood bit-wise.

The f-functions are the non-linear parts of the compression function.

Although no practical attacks on SHA-1 have been publicly demonstrated, there are enough theoretical attacks known for the security of SHA-1 on the long term to be doubted.

SHA-2 family

In order to remedy against the theoretical weaknesses perceived of SHA-1, a new family of hash functions was proposed. It is a family of functions, because several data block lengths, and hash lengths are proposed, but the actual functions are very similar in their algorithmic description. It is somewhat surprising that the overall structure of SHA-2 is in fact very similar to the structure of SHA-1, and has also a block-cypher-like approach with a Feistel-like round succession. Nevertheless, the amount of mixing of the bits is much larger in SHA-2, and the theoretical attacks on SHA-1 have difficulties being extended to SHA-2.

Again, SHA-2 is a family of hashing functions, namely SHA-224, SHA-256, SHA-384 and SHA-512, while in this contribution we are only interested in the compression function. We will take SHA-256 as example.

The SHA-256 compression function takes a data block of 512 bit, and an input word (input hash) of 256 bits, and produces a new 256-bit output word (output hash). The data block is similar to the key in a block cypher, and the data block will undergo a "message schedule". The core of the compression function is 32-bit word oriented: operations happen on 32-bit words which are part of bigger blocks.

The compression function goes as follows:

Take the input 256 input word (the "input hash") and put its 8 32-bit pieces into 8 32-bit variables A, B, ... H.
Apply 64 rounds (index j = 0 to 63) of modifications to these 8 32-bit variables, with a "data key" W_j and a round constant K_j
Put the 8 words together into the output word of 256 bit

Each of the 64 rounds goes like this, and is very Feistel-like:

H becomes G
G becomes F
F becomes E
E becomes D + H + t(E) + c(E,F,G) + K_j + W_j
D becomes C
C becomes B
B becomes A
A becomes H + t(E) + c(E,F,G) + K_j + W_j + s(A) + m(A,B,C)

The + are 32-bit addition modulo 2³², and t(), c(), s() and m() are 32-bit functions of 32 bit words.

The functions are defined as follows:

c(X,Y,Z) = (X and Y) XOR (NOT X and Z)

m(X,Y,Z) = (X and Y) XOR (X and Z) XOR (Y and Z)

s(X) = rotate-right-2-bit(X) XOR rotate-right-13-bit(X) XOR rotate-right-22-bit(X)

t(X) = rotate-right-6-bit(X) XOR rotate-right-11-bit(X) XOR rotate-right-25-bit(X)

We now have to define the "data schedule". Out of the data block of 512 bits, one has to calculate 64 32-bit words K_j.

This goes as follows:

The first 16 words (for i = 0 to 15) are simply the 32-bit words of the data block
The next 48 words (for i = 16 to 63) are given by an iteration formula.

That formula is:

W_j = u(W_j-2) + W_j-7 + v(W_j-15) + W_j-16

with u and v two functions:

u(X) = rotate-right-17-bit(X) XOR rotate-right-19-bit(X) XOR shift-right-10-bit(X)

v(X) = rotate-right-7-bit(X) XOR rotate-right-18-bit(X) XOR shift-right-3-bit(X)

Finally, the 64 constants K_j are given in hexadecimal notation:

428a2f98 71374491 b5c0fbcf e9b5dba5 3956c25b 59f111f1 923f82a4 ab1c5ed5 
d807aa98 12835b01 243185be 550c7dc3 72be5d74 80deb1fe 9bdc06a7 c19bf174 
e49b69c1 efbe4786 0fc19dc6 240ca1cc 2de92c6f 4a7484aa 5cb0a9dc 76f988da 
983e5152 a831c66d b00327c8 bf597fc7 c6e00bf3 d5a79147 06ca6351 14292967 
27b70a85 2e1b2138 4d2c6dfc 53380d13 650a7354 766a0abb 81c2c92e 92722c85 
a2bfe8a1 a81a664b c24b8b70 c76c51a3 d192e819 d6990624 f40e3585 106aa070 
19a4c116 1e376c08 2748774c 34b0bcb5 391c0cb3 4ed8aa4a 5b9cca4f 682e6ff3 
748f82ee 78a5636f 84c87814 8cc70208 90befffa a4506ceb bef9a3f7 c67178f2

They are listed from K₀ to K₆₃. These constants are the fractional parts of the third roots of the first 64 prime numbers.

Note that SHA-2 is not very different in principle from SHA-1. There are more "arbitrary" constants, and the substitution functions are more involved, but the block cypher-like structure remains: on one hand a schedule with data blocks, and on the other hand an "encryption" of the hash block.

The SHA-512 compression function is not very different. Essentially, one obtains the SHA-512 compression function if one modifies the following points in the SHA-256 compression function we just discussed:

The data block is now 1024 bits wide, and the words A...H are each 64 bit wide
There are 80 rounds instead of 64
The message schedule will produce 80 64-bit words Wj instead of 64 32-bit words
The round constants are now 64-bit versions of the fractional part of cube roots of the 80 first primes
The shift and round number of bits are different in the functions s, t, u and v

s(X) = rotate-right-28-bit(X) XOR rotate-right-34-bit(X) XOR rotate-right-39-bit(X)

t(X) = rotate-right-14-bit(X) XOR rotate-right-18-bit(X) XOR rotate-right-41-bit(X)

u(X) = rotate-right-19-bit(X) XOR rotate-right-61-bit(X) XOR shift-right-6-bit(X)

v(X) = rotate-right-1-bit(X) XOR rotate-right-8-bit(X) XOR shift-right-7-bit(X)

We recall that the actual definitions of the SHA-2 family contain more than just the description of the compression function, but in this section, we limit ourselves to those compression functions.

SHA-3 - Keccak

SHA-1 and SHA-2 have been designed by the NSA of the USA. Even though none of them have been cracked publicly in a practical setup, theoretical weaknesses have been demonstrated in SHA-1, and even though SHA-2 is not exposed to exactly the same attacks, the similarity in their design called for another hashing standard, that could be used as a drop-in if ever SHA-2 would prove publicly vulnerable. Another issue is of course the fact that the design was by the NSA, who has as complementary task to crack cryptography, and hence any cryptographic standard issued by such an intelligence agency is born with the (probably unfounded) suspicion that there's a kind of back door in it, especially because the NSA has already issued standards with back doors in it. As such, SHA-3 has been, like AES, an open competition. It is remarkable that the winner of AES (namely Rijndael) and the winner of SHA-3, namely Keccak, have a common designer, a Belgian cryptographer, Joan Daemen.

In this section we will discuss the core "compression" function of Keccak, and not the entire protocol. However, Keccak is of a totally different type of hash function than SHA-1 and SHA-2, in that there is absolutely no resemblance to a block cypher, and that all bits are treated equally. Also, the compression factor of the Keccak core function is 1 - 1, because it is part of a sponge construction, and not of a Merkle-Damgard construction which invites a block cypher like setup (although this is not necessary).

The "compression function" of Keccak is simply an "irreversible bit mixer". The block of which the bits are mixed take on the form of a 5 x 5 matrix of words of 2^l bits, and for the SHA-3 version of Keccak, l = 6, so we have a block of 5 x 5 x 64 bits = 1600 bits. However, the Keccak definition is broader, and can take on blocks with just any l. We will call the state "a".

The bit mixer function has 12 + 2 l rounds. For l = 6, this amounts to 24 rounds in the SHA-3 version.

Each round consists of 5 sub-transformations which are given the names of Greek letters: theta, rho, pi, Xi and iota.

The bits are numbered i: 0-4, j: 0-4 and k: 0 - 2^l-1, where i is the row index, j is the column index and k is the bit index. When the 64-bit words are interpreted as unsigned integers, the little-endian convention applies.

Theta-step:

The 5 2^l(= 320 if l = 6) parity bits are calculated over the rows: p(j,k) = parity bit of the 5 bits a [ 0..4, j, k ]. The theta step consists of making the new a[ i, j, k] := a [ i, j, k ] XOR p(j - 1, k) XOR p(j + 1, k - 1) where we take j - 1, j + 1 and k - 1modulo 5.

rho-step:

Each of the 25 words is rotated over a triangular number of bits: a[ i, j, k ] := a [ i, j, (k - (t+1)(t+2)/2 ]

t is the solution of a matrix equation:

$\text{[math]}$

When t runs from 0 to 24, all index pairs (modulo 5) are generated.

pi-step:

Permuting the 25 words themselves under the relationship: a [ j, 2i + 3j , k ] := a[ i, j, k]

Xi-step:

This is the non-linear operation of Keccak:

a[ i, j, k ] := a[ i, j, k ] XOR (NOT a[ i, j+1, k ] AND a[ i, j+2, k ] )

iota-step:

A few bits of the first word a [ 0 , 0 , k ] are changed: in round n, the l+1 bits with m = 0 ... l: a[ 0, 0, 2^m-1 ] are changed. In round n, are XOR-ed with the bit m + 7 n coming out of a specific linear feedback shift register of degree 8.

Essentially, this comes down to XORing the word a [ 0, 0, k] with the following round constants of 64 bit:

RC[ 0]	0x0000000000000001 	RC[12]	0x000000008000808B
RC[ 1]	0x0000000000008082 	RC[13]	0x800000000000008B
RC[ 2]	0x800000000000808A 	RC[14]	0x8000000000008089
RC[ 3]	0x8000000080008000 	RC[15]	0x8000000000008003
RC[ 4]	0x000000000000808B 	RC[16]	0x8000000000008002
RC[ 5]	0x0000000080000001 	RC[17]	0x8000000000000080
RC[ 6]	0x8000000080008081 	RC[18]	0x000000000000800A
RC[ 7]	0x8000000000008009 	RC[19]	0x800000008000000A
RC[ 8]	0x000000000000008A 	RC[20]	0x8000000080008081
RC[ 9]	0x0000000000000088 	RC[21]	0x8000000000008080
RC[10]	0x0000000080008009 	RC[22]	0x0000000080000001
RC[11]	0x000000008000000A 	RC[23]	0x8000000080008008

This terminates the description of the Keccak "bitmixer" function. It turns out that this function is, in fact, reversible ! In the sponge construction where it will be used, the irreversibility will come from the fact that parts of the internal state will be discarded, but one should keep in mind that the function itself has no pre-image resistance at all.

Conclusion

We studied three different, well-known hash function primitives, or compression functions. SHA-1 and SHA-2 are very similar, and look like block cyphers, with two kinds of input: the "hash" input which will be "the part of the input that has the same length as the output", and the "data" input, which looks like the key input of a block cypher, and corresponds to the bits that will not be in the output (hence compressing the input which is data plus hash, into the size of a hash). SHA-3 is of a totally different nature, and is a 1-1 compression function, where all input bits are treated the same way.

Hash function primitives are used to construct hash functions (and eventually, other cryptographic basis functions as we will see). We distinguished somewhat artificially the hash function primitives from the construction ; in most specifications, both are considered together. We did this because of our concept of layered cryptography, where we see this in a similar way as networking: a stack of protocols: http on top of tcp ; tcp on top of ip ; ip on top of, for instance, ethernet.

There are many more hash function primitives out there than the 3 we've discussed of course.