Block cyphers

Introduction

The one-time pad is the only known provable secure cryptographic system, but on the condition that the key is only used once. The main problems with the one-time pad are the length of the key, and the fact that it can indeed only be used once. It can be that these difficulties are offset by the certainty of the security of the system, but it is not practical. The two difficulties come from the fact that the key and the message, in the one time pad, are "mixed" with a very simple mixing function: XOR. In fact, message and key are interchangeable. As such, when one knows the key, one knows the message (that's the idea) ; but when one knows the message, one also knows the key ! This comes from the fact that XOR is a reversible operation (its reverse is even XOR itself): the computational effort to find the key if one knows the message is not much larger (is in fact identical) than the computational effort to find the message if one knows the key (which is the "normal use").

In any case, one needs an algorithm that mixes the key and the clear text to produce the cypher text. If one wants to have a system with a reusable key, then one needs to find a way to mix the key and the clear text message in such a way, that it is easy to recover the clear text from the key and the cypher text, but that it is essentially computationally impossible to recover the key if one has the cypher text and the clear text. In other words, the treatment of the key and the clear text in the mixer should be totally a-symmetric, with a technique to reverse in one direction (the decryption algorithm), and a computational impossibility to reverse in the other direction.

Of course, there is always a technique to reverse in the other direction: try out all the possible keys: the brute force attack. One is looking for algorithms where the brute force technique is the only known technique of reversal.

As such, we come to our definition of a block cypher: it is a couple of algorithms that implement mathematical functions f and g, such that:

we have arguments K (key) of k bits, C (clear text) of n bits and X (cypher text) of n bits
X = f(K,C)
C = g(K,X)
X' = f(K',C) is about 50% correlated with X, for about every key K' different from K, and about every clear text C. This means that about every bit of the resulting cypher text is strongly dependent on about every bit of the key.
there is no known algorithm that can implement the function h: K = h(C,X) better than by brute force ; even more, there is no known algorithm that can implement h_m: K = h_m(C₁,X₁,C₂,X₂,....C_m,X_m) for any m, better than by brute force.

The last condition may need some explanation. It is very well possible that with the knowledge of one cypher text and its associated clear text, one cannot recover the key, but it might be possible that if one knows several such pairs (m in our notation), this becomes possible. As we want to re-use the key, this attack shouldn't be possible ideally. Most known algorithms start failing on this condition when m is very large, but as long as m itself is sufficiently large, the block cypher can be used: it is sufficient to change keys before m is reached.

Note that condition 4 is also essential. Indeed, the trivial cypher C = f(K,C) would satisfy all other conditions, as the key is not used, and can hence not be recovered from any pair of clear texts and cypher texts !

The block cypher will turn out to be an essential building block for many cryptographic applications, not only for symmetric secret messaging. Its essential function is to mix irreversibly a key and a data block, such that one can reverse this mixing only back to the data block (with the key), but not recover the key. This is the fundamental property which is necessary for key-reuse.

It is fundamental that a block cypher is a non-linear function f(K,C). Indeed, if it were a linear function (such as the XOR of the one-time pad), then the linear system of equations in 4 can easily be solved using techniques of linear algebra. But non-linear equations can also sometimes be solved, so it is not sufficient that the function f(K,C) is non-linear. In fact, there are no known guaranteed ways to build a function f(K,C) that satisfies our conditions ; there are no known "impossible problems". As such, a block cypher is always a kind of gamble that it will not be reversible. On the other hand, most block cyphers are also not linked to a specific mathematical hard problem (as is the case for public key cryptography). Most block cyphers are just complicated ways of mixing bits with one another such that the end result is a very complicated function of the input key and input block, but such that there is a way to retrace back from the output block to the input block using the key.

The necessity of the inverse function g(K,X) is a central weakness in block cyphers, and in fact, this reversibility is not necessary for many applications. But by definition, block cyphers are reversible in the C = g(K,X) sense. We will come to this.

Many block cypher algorithms are structured grossly alike: they have a basic algorithm block that mixes an input data block with a key (just like the overall block cypher is supposed to do). This block is repeated (sometimes with variations) in N different "rounds" so that the data output of a round is the data input of the next round, and the keys for each round are derived from the main key. This derivation of N keys from the main key is called the "key schedule". It is important that the key schedule is not depending on the data, so that on the decryption side, the same key schedule can be derived (through knowledge of the main key). If each round can be reversed, the whole system can be reversed, but the relationship between the original clear text and the final cypher text misses all intermediate states to attack a single round.

A round has two functions: confusion, and diffusion. Confusion is a non-linear operation that tries to make it impossible to separate the key from the data (tries to implement requirement 5). Diffusion swaps and mixes bits so that the "influence" of a single bit (in the clear text, or in the key) propagates more or less evenly to all bits of the output. Diffusion can be linear and tries to implement requirement 4.

Next to theoretical considerations, it is often important that the algorithm implementations of f and of g are efficient. Block cyphers are often used to treat large volumes of data, or high fluxes of data. An example is disk encryption. One can judge its efficiency for hardware implementations, and one can judge them for software implementations. So some block cyphers are optimised for efficient hardware implementation, and others are more on the software efficiency side. The distinction between both is not always evident. As such, AES (one of the most-used and standardized block cyphers today) is now implemented in several intel processors with specific hardware. Software running on such a processor can use hardware instructions to use AES. Of course, before being implemented as hardware instructions in an intel processor, a block cypher has to have a certain user base. One can see this from two sides. On one hand, an "important" cypher will be studied a lot, and one might hope that evident weaknesses would be pointed out sooner or later by the research community. On the other hand, the larger the user basis, the higher the bonus for breaking a block cypher, and the higher the incentives to keep one's discovery secret in order to exploit it.

DES

DES is a block cypher with a data block length of 64 bits, and a key length of 56 bits. It is a very famous block cypher, used since the 1970-ies, and is now essentially replaced by AES. In fact, there's nothing wrong with the DES cypher except for its too short key length. The original cypher from which DES was derived was Lucifer, with a (good) key length of 128 bits, but the NSA involvement has led to a key length of 56 bits, which is considered too short as of today. A brute-force attack with today's computing power makes that a DES key can be recovered in less than 7 days for a budget of the order of $ 10 000,-, as demonstrated in 2006 by the COPACOBANA system, which had about that cost, and was built on purpose to crack DES by brute force. The weakness of DES comes from its short key length, and not from its algorithm, which has never actually been broken.

The overall structure of DES is this:

An initial permutation of the data bits
16 rounds, with 16 keys of 48 bits
A final permutation of the data bits

The initial and final permutations are each other's opposite.

The initial permutation is as follows:

58, 50, 42, 34, 26, 18, 10,  2, 60, 52, 44, 36, 28, 20, 12,  4, 62, 54, 46, 38, 30, 22, 14,  6, 64, 56, 48, 40, 32, 24, 16,  8,
57, 49, 41, 33, 25, 17,  9,  1, 59, 51, 43, 35, 27, 19, 11,  3, 61, 53, 45, 37, 29, 21, 13,  5, 63, 55, 47, 31, 31, 23, 15,  7

It should be read as follows: after the permutation, the first bit of the new data block is the old bit at position 58, the second bit in the new block is the old bit of position 50, and so on.

Each of the 16 rounds is as follows:

The input data block of 64 bits is split in a left block and a right block of each 32 bits. The right block will become the left block in the output.
The right block also enters the "f-function", together with the 48-bit round key (there are 16 round keys)
Out of this f-function comes a 32-bit word, which is XOR-ed with the left block. The result will become the right block in the output

The above structure is a general principle in the construction of block cyphers, and it is named after its inventor, Horst Feistel: it is called a Feistel network. The importance is that such a round is reversible, because as the right-most block which has been used as input to the f-function is still intact in the output, on the left side, and as the key schedule is not dependent on the data, to decrypt, we know the two inputs to the f-function, and can hence redo the computation. Applying the XOR function again to the right-most output word, we can recover the left-most half of the input word. By swapping right and left, we can alternatively obfuscate each half of the word so that no genuine data can leak through the system. The fact that alternatively the data mix with themselves provides non-linearity with respect to the data. The f-function will be highly non-linear,and will provide non-linearity with respect to the key. Of course, all the "ingenuity" resides in the f-function.

The f-function goes as follows:

The input data (32 bit) is expanded into a 48 bit word (with some bits repeated)
This 48 bit word is XOR-ed with the round key (also 48 bits)
The resulting 48-bit word is split in 8 words of 6 bits each
Each of these 8 6-bit words goes into a corresponding lookup table, called an S-box, and results in a 4-bit output each
The 8 outputs of 4 bits each are recombined into a 32-bit word: this is the word that will be XORed with the left block in the round

The expansion from 32 bit to 48 bits goes along the following table:

32, 1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 8, 9, 10, 11, 12, 13, 12, 13, 14, 15, 16, 17, 
18, 19, 20, 21, 20, 21, 22, 23, 24, 25, 24, 25, 26, 27, 28, 29, 28, 29, 30, 31, 32, 1

with the following meaning: in the 48-bit word, the first bit is the 32-th (the last) bit of the 32-bit word ; the second bit of the 48 bit word is the first bit of the 32-bit word, etc...

There are 8 different "S-boxes", a different one for each of the 8 6-bit words in step 4. These S-boxes are just listed, and there is no specific justification for them, but it turns out that they have been especially constructed to withstand an attack which is called "differential analysis". Each S-box is essentially a table with 64 entries and each entry is a 4-bit word.

The S-boxes are standard listed in a funny way: the 4 middle bits of the 6-bit word make up the column number (from 0 to 15); the first and the last bit make up together the row number (from 0 to 3).

S-box 1 is:

     14  4  13  1   2 15  11  8   3 10   6 12   5  9   0  7
      0 15   7  4  14  2  13  1  10  6  12 11   9  5   3  8
      4  1  14  8  13  6   2 11  15 12   9  7   3 10   5  0
     15 12   8  2   4  9   1  7   5 11   3 14  10  0   6 13

S-box 2 is:

     15  1   8 14   6 11   3  4   9  7   2 13  12  0   5 10
      3 13   4  7  15  2   8 14  12  0   1 10   6  9  11  5
      0 14   7 11  10  4  13  1   5  8  12  6   9  3   2 15
     13  8  10  1   3 15   4  2  11  6   7 12   0  5  14  9

S-box 3 is:

     10  0   9 14   6  3  15  5   1 13  12  7  11  4   2  8
     13  7   0  9   3  4   6 10   2  8   5 14  12 11  15  1
     13  6   4  9   8 15   3  0  11  1   2 12   5 10  14  7
      1 10  13  0   6  9   8  7   4 15  14  3  11  5   2 12

S-box 4 is:

      7 13  14  3   0  6   9 10   1  2   8  5  11 12   4 15
     13  8  11  5   6 15   0  3   4  7   2 12   1 10  14  9
     10  6   9  0  12 11   7 13  15  1   3 14   5  2   8  4
      3 15   0  6  10  1  13  8   9  4   5 11  12  7   2 14

S-box 5 is:

      2 12   4  1   7 10  11  6   8  5   3 15  13  0  14  9
     14 11   2 12   4  7  13  1   5  0  15 10   3  9   8  6
      4  2   1 11  10 13   7  8  15  9  12  5   6  3   0 14
     11  8  12  7   1 14   2 13   6 15   0  9  10  4   5  3

S-box 6 is:

     12  1  10 15   9  2   6  8   0 13   3  4  14  7   5 11
     10 15   4  2   7 12   9  5   6  1  13 14   0 11   3  8
      9 14  15  5   2  8  12  3   7  0   4 10   1 13  11  6
      4  3   2 12   9  5  15 10  11 14   1  7   6  0   8 13

S-box 7 is:

      4 11   2 14  15  0   8 13   3 12   9  7   5 10   6  1
     13  0  11  7   4  9   1 10  14  3   5 12   2 15   8  6
      1  4  11 13  12  3   7 14  10 15   6  8   0  5   9  2
      6 11  13  8   1  4  10  7   9  5   0 15  14  2   3 12

S-box 8 is:

     13  2   8  4   6 15  11  1  10  9   3 14   5  0  12  7
      1 15  13  8  10  3   7  4  12  5   6 11   0 14   9  2
      7 11   4  1   9 12  14  2   0  6  10 13  15  3   5  8
      2  1  14  7   4 10   8 13  15 12   9  0   3  5   6 11

To illustrate how these S-boxes are used, suppose that the 5-th 6-bit word is 100110. The two extreme bits are 1 and 0, and form hence the number 2. This is hence the third row (because they are numbered starting from 0) of the 5-th S-box that we have to look at. The middle 4 bits of the word are 0011, which is 3. Hence the 4-th column has to be read: that is 11. In binary form, this is 1011. So the output of the 5-th S-box on the input 100110 is 1011.

This completes the description of the data flow in the DES algorithm. We still have to describe the key schedule: how the 16 round keys of 48 bits are derived from the 56-bit master key.

Although DES is a 56-bit key block cypher, in fact it is usually stated with a 64-bit key, where every 8-th bit is a parity check bit of the 7 previous ones. The parity bits themselves do not take part in the actual key manipulation, but one should imagine the original "key word" to be made up of 64 bits (with only 56 "free" bits, because every 8-th bit is dependent on the 7 previous ones as a parity check).

From this initial 64-bit word, one takes a permutation of the "valid" bits, which is called PC-1:

57, 49, 41, 33, 25, 17,  9,  1, 58, 50, 42, 34, 26, 18, 10,  2, 59, 51, 43, 35, 27, 19, 11,  3, 60, 52, 44, 36,
63, 55, 47, 39, 31, 23, 15,  7, 62, 54, 46, 38, 30, 22, 14,  6, 61, 53, 45, 37, 29, 21, 13,  5, 28, 20, 12,  4

Next, this 56-bit word is split in two words of 28 bits, and these two words will evolve independently through the 16 stages for making the 16 round keys. Each stage goes as follows:

If we are in rounds 1, 2, 9 and 16: each 28-bit word is rotated left by one bit ; otherwise, each 28-bit word is rotated left by two bits
The resulting two words are passed to the next round
The resulting two words are also combined in a 56 bit word, and 48 bits are extracted from this word according to the table PC-2, to form the round key

The PC-2 table goes as follows:

14, 17, 11, 24,  1,  5,  3, 28, 15,  6, 21, 10, 23, 19, 12,  4, 26,  8, 16,  7, 27, 20, 13,  2, 
41, 52, 31, 37, 47, 55, 30, 40, 51, 45, 33, 48, 44, 49, 39, 56, 34, 53, 46, 42, 50, 36, 29, 32

In other words, each round key is essentially a selection of 48 bits out of the 56 of the master key. There are no logical or arithmetic operations, only a selection. This is of course extremely fast in hardware, as it is just "a set of wires". The "left rotation" is just a way of describing the bit selection, and doesn't need to be an actual operation. As such, the 16 round keys are essentially "available immediately" when the key is given.

Given the Feistel network structure, the decryption works with the same machinery, in reverse, as the encryption, as sketched earlier.

3-DES variants

DES is in fact a very good block cypher, were it not for its too short key. There are however, simple ways to turn a "short key" block cypher into a much longer key block cypher. These techniques are quite general, but they are especially useful for DES, where the only problem is the key length, and the algorithm is very secure (after 40 years of successfully withstanding attacks).

There are good reasons why doubling a block cypher doesn't work very well: there is a general attack on double encryption which is called "rainbow tables" which renders in fact any double encryption almost identical to single encryption. This is why it is necessary to consider at least triple encryption.

The first variant has a key [ k0, k1, k2 ] of length 3 x 56 = 168 bit. It is just the concatenation of 3 DES encryptions:

X = DES (k3, DES (k2, DES (k1, C) ) )

As a whole, this is essentially a block cypher with 64-bit data blocks and a 168 bit key.

The second variant looks a lot like the first one, except that we use the middle DES in decryption mode:

X = DES (k3, DES^-1 (k2, DES(k1, C) ) )

Computationally, both are almost identical, but this has the funny advantage when k3 = k2 = k1 = k, that this reduces to simple DES. This has the slight advantage of implementing at the same time simple DES and triple DES, which can be interesting when legacy cryptography has to be supported. One has to be careful however, with supporting legacy cryptography, as very often, this can lead to a security issue if this mode of operation can be enforced by an attacker.

The disadvantage of the previous two forms of triple DES is that one has to do 3 DES calculations. DES being extremely efficient, this is often not a problem. But there is a much simpler way of building a secure extended-key DES version:

X = DES(k1, C XOR ka) XOR kb

Here, the total key is [ k1, ka, kb ] where ka and kb are 64 bits each, so we have a block cypher with data length 64 bit, and key length 184 bit.

In fact, triple encryption suffers from a similar kind of attack as double encryption, namely a rainbow attack, but triple encryption can "save" two of the three keys. So the actual security of triple DES is rather 112 instead of 168 bits. The third variant is in fact more robust (and faster). It only starts to suffer from analytical attacks when mindbogglingly large amounts of pairs (cleartext, cyphertext) are collected.

AES (or Rijndael)

AES is the standard replacement for DES. Rijndael was the winner of an open contest organized by NIST between 1997 and 2001, to find a replacement for DES. There are other finalists in the competition that are of equal quality, such as Mars, Serpent and Twofish. The NIST prescription was for a block cypher with a 128 bit data block, and key lengths of 128, 192, and 256 bits. The original Rijndael proposition has also data blocks of lengths 128, 192 and 256 bit, but as the NIST contest only considered 128 bit blocks, the other two options have been left out of the AES standard. So AES has 3 versions, corresponding to the 3 possible key lengths.

AES is also a succession of "rounds", each with a round key. The number of rounds depends on the key length: for the 128 bit key, there are 10 rounds, for the 192 key, there are 12 rounds, and for the 256 key, there are 14 rounds. However, there are respectively 11, 13 and 15 round keys of each 128 bit, because before starting, there is a "pre-whitening" key that is XOR-ed with the clear text.

AES does not use a Feistel structure. Each round has the general structure as follows:

The 128 bit data block is split in 16 bytes (in fact, one can say that AES is rather a byte oriented instead of bit-oriented)
The (unique) S-box is applied to each of the bytes, and this results in again, a byte. The S-box is a byte-to-byte mapping
The 16 output bytes are next permuted in what is called the Shift Row operation
The resulting 16 output bytes are grouped 4 by 4, and each group of 4 bytes is multiplied with a 4x4 matrix, resulting in a new group of 4 bytes. This operation is called the Mix Column operation
The groups of 4 bytes are put together to from a new word of 128 bits (or 16 bytes). This word is XORed with the round key. This is the output data block of this round.

The last round doesn't have step 4 (the Mix Column operation).

In as much as DES was quite ad hoc (or at least, the system behind it has never been revealed or discovered), there is a lot of mathematics behind the AES system. In fact, many operations in AES are based on operations in the Galois field GF(2⁸) - this explains its byte-oriented structure. The representation of this Galois field is based upon the irreducible polynomial:

P(x) = x⁸ + x⁴ + x³ + x + 1

In other words, a byte is represented by the coefficients in the binary field of polynomials of the type a₇ x⁷ + a₆x⁶ + ... + a₁ x + a₀ and the operations are formal polynomial operations, modulo P(x).

The S-box of AES consists of the following operation:

if x is different from 0: calculate u = 1/x in the previously mentioned GF(2⁸)
if x is 0: take u = 0

Next, apply in GF(2), the affine function y = A u + B

where A is an 8x8 bit matrix, and B is an 8-bit vector. Note that the affine algebra is simply in GF(2). The matrix takes on the form

1 0 0 0 1 1 1 1 
1 1 0 0 0 1 1 1
1 1 1 0 0 0 1 1 
1 1 1 1 0 0 0 1
1 1 1 1 1 0 0 0
0 1 1 1 1 1 0 0
0 0 1 1 1 1 1 0 
0 0 0 1 1 1 1 1

(where one can recognize the systematics...) and the vector is:

The affine transformation is absolutely necessary to avoid the mathematical operation of the S-box to be too simple, and especially to avoid that 0 is always mapped onto 0.

The shift rows operation is simply a permutation of the 16 bytes of the data block with the following permutation table (numbering from 0 to 15):

0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11

In other words, the byte number 0 in the output data word is the byte number 0 of the input data word ; the byte number 1 in the output data word is byte number 5 of the input word, etc...

Finally, the MixColumn operation goes as follows. The 16 bytes at the output of the shift row operation are taken in 4 groups of 4 bytes. To each of this group of 4 bytes, the following matrix multiplication is applied:

It has to be noted that this multiplication takes place in GF(2⁸). In polynomial notation, the matrix with which is multiplied takes on the next form:

 x   x+1   1   1
 1    x   x+1  1
 1    1    x  x+1
x+1   1    1   x

The multiplication (and addition) is to be seen modulo the irreducible polynomial we've mentioned earlier to form the Galois field GF(2⁸).

This completes the data manipulation flow from clear text to cypher text. We now discuss the key schedule. In fact, the key schedules are somewhat different for the 128 bit, the 192 and the 256 bit keys. We will only discuss the 128-bit key schedule. 11 round keys of 128 bit have to be generated from the original 128 bit key.

The first round key, which is used directly on the input clear text, is in fact the master key itself. Next follow 10 key schedule rounds, which transform this key. They all have the same structure:

The previous round key is split in 4 words of 32 bits.
The last 32-bit word is sent through a "g-function" and the resulting word of 32 bit is XOR-ed with the first word
The resulting 32-bit word is the first part of the new round key
That same word is XOR-ed with the second word of the previous round, to form the second word of the new round key.
That new second word is XOR-ed with the the third word of the previous round to form the third word of the new round key.
That new third word is XOR-ed with the fourth word of the previous round (that was also used in step 2) to form the fourth word of the new round key.

The g-function itself is dependent on the round number, but has the following structure:

The incoming 32-bit word is split into 4 bytes
The 4 bytes are rotated, so that, if the original incoming word was B₀ B₁ B₂ B₃, then the result is B₁ B₂ B₃ B₄
Each of these bytes is then passed through the same S-box as in the data transformation part
The first byte of the transformed word is XOR-ed with the round coefficient, which is dependent on the round number.
The resulting 32-bit word is the output of the g-function (that is used in step 2 of the key schedule round description above)

The 10 round coefficients are to be seen in GF(2⁸) as x⁰, x¹, x², ... x⁹ respectively (hence taken modulo P(x), the defining irreducible polynomial of our Galois field).

In other words, the first round coefficient is 0000 0001, the second round coefficient is 0000 0010, etc. But the last one is for instance: 0011 0110 because the modulo operation is not trivial any more for x⁹.

Many choices in Rijndael have a mathematical background, as one can see, and this is sometimes seen as a potential weakness of the cypher, as it is possible that these mathematical properties can be exploited. On the other hand, clearly stated mathematical choices are more "open" than hand-crafted tables of which one doesn't know why they were chosen this way and not that way (which was one of the unfounded fears people had with the S-boxes of DES: do they contain an unknown back door ?). The mathematics also show immediately how to do the inverse operations, because all operations on the data are reversible.

SERPENT

Serpent was the second finalist, behind Rijndael, of the AES competition. It is generally accepted that Serpent is potentially more secure than Rijndael, although it lost the competition because its implementation was somewhat slower, and the security argument is only intuitive and not substantiated. The essential critique on Rijndael is its too simple mathematical structure although for the moment, no exploit of this structure is in fact publicly known ; it is only that its simplicity gives rise to the maybe ungrounded fear that one day, it will be the basis of an attack. It is generally accepted that Serpent scores better on this intuitive feeling.

Serpent, being a submission to the AES competition, has a block size of 128 bits, and a key length of 256 bits. If one wants to use smaller key lengths, a general padding scheme is proposed: adding a "1" to the left, and padding with enough "0" to the right to obtain 256 bits. As such, the requirements for the AES competition, namely keys of 128, 192 and 256 bits, are met, but one can in fact use any key length below 256 with this scheme. We will consider that one has a 256 bit key here.

The overall structure is:

an initial permutation of the data block
32 successive rounds
a final permutation

Each round consists of:

A key mixing stage
A non-linear substitution layer
A linear mixing layer (this layer is absent in the last round and replaced with an extra key mixing stage)

The key mixing stage is standard: the input data block of 128 bits is XOR-ed with the 128-bit round key.

The substitution layer is a 32-fold application of 4-bit to 4-bit S-boxes. All the S-boxes (the 32 of them) in a given round are identical, but there are 8 different S-boxes: S₀, S₁, ... S₇. If we number the rounds from 0 to 31, then round 0 will use S-box 0, round 1 will use S-box 1, ..., round 7 will use S-box 7, and the cycle starts over again: round 8 will use S-box 0, round 9 will use S-box 1, etc... So round number i will use S-box number i mod 8, and each S-box will appear in 4 rounds this way.

The S-boxes are derived from the DES S-boxes by an algorithm that has been published by the authors of Serpent. They wanted to make sure that nobody got the idea that they had constructed S-boxes with a back door in it, as had been the (erroneous) suspicion on the DES S-boxes for years. In the end, their algorithm yielded 8 S-boxes as follows:

S0:   3  8 15  1 10  6  5 11 14 13  4  2  7  0  9 12
S1:  15 12  2  7  9  0  5 10  1 11 14  8  6 13  3  4
S2:   8  6  7  9  3 12 10 15 13  1 14  4  0 11  5  2
S3:   0 15 11  8 12  9  6  3 13  1  2  4 10  7  5 14
S4:   1 15  8  3 12  0 11  6  2  5  4 10  9 14  7 13
S5:  15  5  2 11  4 10  9 12  0  3 14  8 13  6  7  1
S6:   7  2 12  5  8  4  6 11 14  9  1 15 13  3 10  0
S7:   1 13 15  0 14  8  2 11  7  4 12 10  9  3  5  6

The linear mixing layer goes as follows. The 128 bit data block is split in 4 words of 32 bits. We will write the transformation in a kind of pseudo-code. Let us consider the 4 words to be put in 4 variables: X₀, X₁, X₂, X₃.

The following sequential program is then operated on these 4 variables:

X₀ is rotated left by 13 bits
X₂ is rotated left by 3 bits
X₁ becomes the XOR of X₁, X₀ and X₂
X₃ becomes the XOR of X₃, X₂ and (X₀ rotated left by 3 bits)
X₁ is rotated one bit to the left
X₃ is rotated 7 bits to the left
X₀ is the XOR of X₀, X₁ and X₃
X₂ becomes the XOR of X₂, X₃ and (X₁ rotated left by 7 bits)
X₀ is rotated 5 bits to the left
X₂ is rotated 22 bits to the left

The output word is of course the reconstruction by the resulting contents of X₀, X₁, X₂ and X₃. The linear transformation is constructed such, that every single bit of the data word will influence all bits after 3 rounds.

Finally, the initial permutation is:

0  32  64  96   1  33  65  97   2  34  66  98   3  35  67  99
4  36  68 100   5  37  69 101   6  38  70 102   7  39  71 103
8  40  72 104   9  41  73 105  10  42  74 106  11  43  75 107
12  44  76 108  13  45  77 109  14  46  78 110  15  47  79 111
16  48  80 112  17  49  81 113  18  50  82 114  19  51  83 115
20  52  84 116  21  53  85 117  22  54  86 118  23  55  87 119
24  56  88 120  25  57  89 121  26  58  90 122  27  59  91 123
28  60  92 124  29  61  93 125  30  62  94 126  31  63  95 127

and the final permutation is:

0   4   8  12  16  20  24  28  32  36  40  44  48  52  56  60
64  68  72  76  80  84  88  92  96 100 104 108 112 116 120 124
1   5   9  13  17  21  25  29  33  37  41  45  49  53  57  61
65  69  73  77  81  85  89  93  97 101 105 109 113 117 121 125
2   6  10  14  18  22  26  30  34  38  42  46  50  54  58  62
66  70  74  78  82  86  90  94  98 102 106 110 114 118 122 126
3   7  11  15  19  23  27  31  35  39  43  47  51  55  59  63
67  71  75  79  83  87  91  95  99 103 107 111 115 119 123 127

This completes the description of the SERPENT data flow. We now come to the key schedule. The key schedule needs to produce 33 128-bit round keys (there are 32 rounds, but the last round needs 2 keys).

The original 256-bit key is split in 8 32-bit words, which are called w_-8, w_-7 ... w_-1.

An iteration formula will produce 132 words of 32 bit:

w_i := (w_i-8 XOR w_i-5 XOR w_i-3 XOR w_i-1 XOR φ XOR i) rotate left by 11 bits

Here, φ is a constant word which is, in hexadecimal notation: 0x9E3779B9 and is the fractional development of the golden ratio, and i is the round number in 32 bit unsigned format.

Once we have 132 words w_i with i positive, we combine them by 4 in 128-bit words, and we obtain 33 128-bit words. To these words, we apply the S-boxes. We apply the same S-box to all 4-bit pieces of a single word, which means that we have to apply that S-box 32 times to that word as the S-box treats 4 bits, and the word is 128 bit, just as was the case in the data rounds.

The order in which the S-boxes are associated with the words is somewhat unexpected: S-box S₃ is applied to the first word, made up of w₀, w₁, w₂ and w₃. S₂ is applied to the second word (w₄, w₅, w₆, and w₇). S₁ to the third word, and S₀ to the fourth. Next, S₇ is applied to the fifth word, S₆ to the sixth, and so on.

This completes the description of the SERPENT block cypher.

Twofish

Twofish is a variant of Blowfish, and also an AES finalist. Twofish is a very conservative approach, and uses a Feistel network.

Its general structure is as follows:

The data block is always considered as composed of 4 32-bit words, and the first 4 32-bit keys from the key schedule are used to pre-whiten the clear text.
16 Feistel rounds are applied
A last swap of the two first, and the two last 32-bit words undoes the standard Feistel flip of the 16th and last round
The 4 32-bit keys from the key schedule, 5, 6, 7 and 8 are used to post-whiten the cypher text

A round goes as follows:

As it is a Feistel network, the first two 32-bit words become the third and the fourth output 32-bit word.
The first 32-bit word enters the g function
The second 32-bit word, rotated left by 8 bits, enters the g function
The outputs of steps 2 and 3 undergo a Pseudo Hadamard Transform (PHT)
Two 32-bit round keys are respectively added (32-bit addition, not an XOR !) to the outputs of this PHT
The first output is XOR-ed with the third 32-bit input word, and the result is rotated right by 1 bit
The second output is XOR-ed with the fourth 32-bit input word, which has first been rotated left by 1 bit
The outputs of steps 6 and 7 become the first and second 32-bit output word.

As such, the key schedule needs 8 32-bit keys for the pre-and post whitening, and then two 32-bit keys per round, so in total 8 + 32 = 40 32-bit keys.

The PHT goes as follows:

If A and B are two 32-bit input words, they are transformed into A' and B' as follows:

A' = A + B mod 2³²

B' = A + 2B mod 2³²

The g-function transforms a 32-bit word. It goes as follows:

The 32-bit word is split in 4 bytes
A different S-box is applied to each of the bytes (so 4 S-boxes). The S-boxes are 8-bit to 8-bit mappings
To the resulting set of 4 bytes, one applies the MDS transformation

The MDS transformation is a matrix multiplication (4 x 4 matrix) in GF(2⁸) in a very similar way as was done in Rijndael. The bytes are seen as elements of GF(2⁸) as polynomial coefficients. The difference with Rijndael is that a different irreducible polynomial is chosen for the representation of GF(2⁸).

P(x) = x⁸ + x⁶ + x⁵ + x³ + 1.

The transformation matrix (to be applied to the column vector of 4 bytes) is chosen to be:

01  EF  5B  5B
5B  EF  EF  01
EF  5B  01  EF
EF  01  EF  5B

One should interpret these byte notations in hexadecimal as polynomials. For instance, 5B becomes x⁶ + x⁴ + x³ + x + 1 (because this is 0101 1011).

This completes more or less the data flow structure of Twofish, except that we still have to describe the key schedule and the S-boxes. The specificity of Twofish is that the S-boxes are key-dependent, and are hence part of the key schedule. The fact of having different S-boxes for each key implies that no particular property of fixed S-boxes can be exploited, which is believed to enhance significantly the resistance of Twofish against certain attacks (higher order differential attacks).

The key schedule of Twofish is quite complicated and involved. In fact, the idea is that the key schedule has only to be executed once per new key, while the data encryption has to be executed for each clear text data block, so it pays to use a rather simple and efficient data flow, if a complicated key schedule can increase security.

The key schedules are slightly different (although very similar) for lengths of 128, 192 and 256 bits, but we will discuss the 256 schedule here.

The 256-bit key (consisting of 8 32-bit words) is first transformed into 3 32-bit word vectors: M₀ contains the even 32-bit sub words of the master key, M₁ contains the uneven 32-bit sub words of the master key. These two word vectors are each 128 bits long. M₀ and M₁will serve in the generation of the round keys.

The constitution of the third 128-bit word (in fact, a set of 4 32-bit words) is more involved, and will serve the construction of the key-dependent S-boxes. The master key is this time split in 4 blocks of 64 bits (or 8 bytes). To each block of 8 bytes one applies the following matrix:

 
01   A4   55   87   5A   58   DB   9E
A4   56   82   F3   1E   C6   68   E5
02   A1   FC   C1   47   AE   3D   19
A4   55   87   5A   58   DB   9E   03

This matrix multiplication is in the Galois field GF(2⁸) with irreducible polynomial:

P(x) = x⁸ + x⁶ + x³ + x² + 1

Note that this irreducible polynomial is different from the irreducible polynomial to represent the same field in the MDS transformation ! The bytes in the 64-bit blocks have hence to be interpreted as coefficients of polynomials in the given Galois field, and the entries of the matrix, too.

As the matrix is not square, but rectangular (and derived from a Reed-Solomon code), the result of the application of the matrix to each 64 bit word is 4 bytes (interpreted as coefficients of polynomials). Putting these 4 bytes together, we obtain a 32-bit word. We call this word S₀, S₁, S₂ and S₃ where 0, 1, 2 and 3 is the number of the 64-bit block in the master key, because this is applied 4 times, to the 4 different 64-bit blocks of the master key.

The 4 key-dependent S-boxes are constructed implicitly as follows. A function is defined that maps a 32-bit word X onto a 32-bit word Y, but such, that this is done byte-wise. That is, the first byte of X is mapped on the first byte of Y, and this mapping is not influenced by the 3 other bytes of X. The second byte of X is mapped on the second byte of Y, and this is not influenced by the 3 other bytes of X. So although we will define a single 32-bit on 32-bit function, this is in fact 4 functions of 8-bit on 8-bit: these 4 functions are the key-dependent S-boxes used in the encryption rounds and dependent on the key ; in fact, dependent on the "S-word" which we just constructed. The 32-bit mapping from X to Y that defines the 4 S-boxes is called the "h-function".

The h-function goes as follows:

X is split in 4 bytes.
To these 4 bytes are applied respectively the q-boxes: q₁, q₀, q₀, and q₁.
The 4 bytes together are XOR-ed with S₀
These 4 bytes are mapped by the q-boxes respectively in the order q₀, q₀, q₁, q₁
The 4 resulting bytes are XOR-ed with S₁
The 4 resulting bytes are mapped by the q-boxes q₁, q₀, q₁, q₀
The resulting bytes are XOR-ed with S₂
The 4 resulting bytes are mapped with the q-boxes q₁, q₁, q₀, q₀
The resulting bytes are XOR-ed with S₃
The 4 resulting bytes are mapped with the q-boxes q₀,q₁,q₀,q₁
These 4 bytes form the result Y.

We see that there is no swapping of the bytes, and that the 32-bit h-function is in fact a juxtaposition of 4 8-bit functions.

The two q-boxes q₀ and q₁ are quite involved. They are byte to byte mappings, which are nibble-oriented (a nibble is a 4-bit word). They will use t-boxes, which are nibble-to-nibble mappings. q₀ and q₁ are in fact identical in structure, except that they use different t-boxes.

So q₀ and q₁ go as follows:

split the input byte in two nibbles, and put them in nibble variables a and b.
a becomes a XOR b, and b becomes the XOR of (former) a, b rotated one bit to the right, and 8a mod 16
a becomes the t-box t₀ applied to a
b becomes the t-box t₁ applied to b
a becomes a XOR b, and b becomes the XOR of (former) a, b rotated one bit to the right, and 8a mod 16
a becomes the t-box t₂ applied to a
b becomes the t-box t₃ applied to b
a and b are put together in the output byte

For q₀, the t-boxes are:

t0 = [  8  1  7  D 6 F 3  2  0  B  5  9 E C A  4  ]
t1 = [ E C B  8  1  2  3  5  F  4  A 6  7  0  9  D ]
t2 = [ B A  5  E 6 D 9  0  C  8  F 3  2  4  7  1  ]
t3 = [ D  7  F  4  1  2  6 E  9  B  3  0  8  5  C A ]

For q₁, the t-boxes are:

t0 = [  2  8  B D F  7  6  E 3  1  9  4  0  A C  5  ]
t1 = [  1  E  2  B  4  C  3  7  6 D A  5  F  9  0  8  ]
t2 = [  4  C  7  5  1  6  9  A 0 E D  8  2  B  3  F ]
t3 = [ B  9  5  1  C  3  D E 6  4  7  F  2  0  8  A ]

Both are nibble-to-nibble mappings.

This completes the description of the key-dependent S-boxes, which was one of the particularities of the Twofish cypher.

Finally, we have to describe the "classical" key schedule, to derive the 40 32-bit keys from the master key using M₀ and M₁.

The keys are generated in pairs. We need hence 20 pairs. Let i be the pair number, from 0 to 19. Pair number i is generated as follows:

We repeat the byte-representation of the number (2i) 4 times. For instance, if 2i is 12 (because i is 6), which is represented by the byte 0x0c, then we make the 32-bit word 0x0c0c0c0c.
This 32-bit word is entered as X in the h-function from the S-box generation, except that instead of (S₀, S₁, S₂, S₃), one uses the 4 32-bit words of M₀, in reverse order. That is, instead of using S₀, one uses the last 32-bit word of M₀, instead of using S₁, one uses the third 32-bit word of M₀, etc.... The result of this modified h function is again a 32-bit word, which we will call A.
We repeat the byte-representation of the number (2i + 1) 4 times. In our example, 2i+1 is now 13, which is represented by the byte 0x0d, so the 32-bit word is 0x0d0d0d0d.
We use this word as X in the modified h-function, but this time with the 128-bit word M₁ to take 4 32-bit words from (also in opposite order: instead of S₀ we use the fourth 32-bit word of M₁, instead of S₁, we use the third 32-bit word of M₁ etc...). We call the result of the h-function thus modified, the 32-bit word B.
We apply a Pseudo Hadamard Transform to the couple (A, B) from steps 2 and 4. This results in A', and B'.
The first key of our key pair is A'. The second key is B', rotated left by 9 bits.

This completes the description of the mechanisms in Twofish.

PRESENT

A very "Light-weight" cypher, optimised for hardware in very small electronics devices is the PRESENT block cypher. It uses 64 bit data blocks, and an 80 bit key.

It goes as follows:

An initial whitening by XORing the data block with the first round key
31 rounds

Each round consists of 3 operations on the data block:

An S-box application
a permutation layer
XOR-ing with a round key

The S-box is applied on each nibble of the data. This is the S-box:

C 5 6 B 9 0 A D 3 E F 8 4 7 1 2

The permutation is the following:

P(i) = i . 16 mod 63 if i < 63

P(63) = 63

The key schedule goes as follows. The 80 bit master key is copied in a register, and this register will undergo transformations. After each round, the 64 leftmost bits (MSB) are taken as a round key. The first round key is in fact the 64 leftmost bits of the master key.

Next comes the round transformation of the key register:

Rotate right by 19 bits
Apply one single S-box to the 4 leftmost bits of the register
Apply the XOR operation with the round number (5 bits, because 31 rounds) on the bits 19 downto 15.

Note that it is difficult to do simpler !

Conclusion

We have discussed a few "modern" block cyphers. There are several other block cyphers in existence. Even though the different block cyphers we've discussed are quite different from one another, they all use the same kind of techniques:

permutations of bits, nibbles, bytes (this is a linear operation)
application of XOR between key material and data (this is also a linear operation)
mapping (substitution) of nibbles, bytes or other amounts of key material or data onto other values (this is potentially non-linear)

By alternating and repeating these operations, one tries to obtain that every bit of the clear text block and every bit of the key has statistically an equal influence on each bit in the cypher text. The point is that there are cryptanalytic attacks that try to exploit any statistical difference of the influence of each clear text bit, and especially, of each key bit. These techniques are called differential cryptanalysis. Many block cyphers have been cracked by differential analysis. As such, even though many operations in the described cyphers seem totally arbitrary, in fact they are very carefully chosen in order to equalize most statistical influences of each bit of key and clear text on the output. This is why the design of a good cypher is difficult even though it seems that if one does enough complicated operations, that will do.

On the other hand, several schemes want to avoid the suspicion that there is some secret back door in the cypher. Indeed, it might very well be that there are totally unexpected high-order correlations that give free the key. Without knowledge of these high-order correlations, it is almost impossible to detect such a scheme. If it was put in the cypher on purpose, then this could be considered a back door and no amount of a posteriori analysis would reveal it if it is sufficiently sophisticated relationship if one doesn't know what to look for. For a long time, there had been a suspicion on DES that there was a back door because the S-boxes had no explanation. This suspicion has proven wrong, but once the box of Pandora is open, every future cypher had to find a way to prove that there was no hidden trick in it. The simplest way to do so, was to use rather straightforward mathematics, or by using techniques that can clearly not be "designed on purpose with a back door", like, say, using the digits of the number pi or so. However, this poses another problem. In as much as totally ad hoc systems can indeed be intractable to reverse, if there is more mathematical structure to the system, this may give a lever to find the "forbidden" reverse function. As such, the appearance of a lot of mathematics (like using the Galois field GF(2⁸) is a double-edged sword: on one hand, this "proves" that no hidden intentions can be present that aren't potentially discoverable by mathematical analysis, but on the other hand, this gives potentially a lot of levers (also by mathematical analysis) to find the reverse functions that aren't supposed to be found. It is a critique that applies particularly to Rijndael (the winner of the AES contest), because of its rather simple mathematical structure. As of now, no exploit of this structure is known, but this is why it can be interesting to use other block cyphers than the AES winner. But one should refrain from "inventing one's own block cypher". Too many good-looking block cyphers have been cracked with sophisticated cryptanalysis techniques.

Finally, a consideration is efficiency. One could think that the more "rounds" a cypher has, the more it will be intractable. In fact, this is only partially true. If one has a "bad" cypher round, but one repeats it an incredible number ot times, this will probably start to become a secure cypher, however, it will be a very inefficient cypher. In fact, once the statistical influence of each input bit is sufficiently uniform in the output, and if one thinks that the mixing is sufficiently irreversible, there's no point in adding more rounds. If a cypher is bad enough that the last rounds "give in" the key in the output, then it doesn't matter how many previous rounds there are: the key will leak. So a good cypher is efficient: it obtains irreversible mixing, and statistical uniformity in a relatively small computational effort. But of course, a good cypher is more secure if there is some "margin" and there are somewhat more rounds than strictly necessary: as such, some imperfections can be forgiven. SERPENT took a conservative approach here, as compared to Rijndael, and lost, because the AES committee attached more importance to efficiency than to a safety margin. Twofish uses a quite classical and even rather simple encoding, but on the other hand, has a very intensive and complex key schedule, with key dependent S-boxes (making every statistical analysis of non-uniformity of the S-boxes much, much harder).

If one needs long term security from powerful adversaries, one may want to use several block cyphers with different keys in succession. It is of utmost importance that the keys are different of course (resulting in longer effective keys). It is a good idea to use Rijndael in the mix, but one should also include other cyphers, preferentially based upon different technology.

However, if one needs only short term protection from not so powerful adversaries on high data volumes or with very limited resources, one may be inclined to use very efficient light weight cyphers such as PRESENT. It can also be a good idea to improve the security of a "standard", to add a lightweight cypher to it, if one can permit the extra overhead.