Introduction

Block cyphers and compression functions are building blocks which can be used to build cryptographic applications.  We are going to explore a few of these applications.  We recall that essentially, a block cypher is an algorithmic implementation of a function f, such that X = f(C,K), with X and C numbers from the same subset of natural numbers (namely, the set {0, 1, ... 2N-1}, where N is the number of bits in the clear text and cypher text), K a number in the subset of natural numbers {0, ... 2M - 1} where M is the number of bits in the key, in such a way that the inverse function g exists, with C = g(X,K), but that it is not possible, in practice, to find K if one has even many (C,X) pairs.  A compression function is essentially the same, but we call {C,K} simply B of length N+M bits, and there's not necessarily an inverse function g.

Let us look at some applications.

Symmetric secret messaging

Symmetric secret messaging is probably the original application of cryptography, and cryptography is still very much associated with symmetric secret messaging.  The application is the following: two "friends" A and B want to exchange messages they don't want to be understood by "enemies" E over an insecure channel, that is a channel that will get the message written by A to B, but can be potentially read (and/or modified) by E.  The most obvious way to imagine this is that E is the messenger, and A and B don't want the messenger to read what A wants to tell B.  We've seen that in cryptography, what distinguishes "friends" from "enemies" is the possession of knowledge.  In our case, it is the possession of the secret key K: A and B have to be able to share a secret key K that E doesn't know.  A and B have to generate K together, or one of them (say, A) has to generate K and has to transport it to B, or a "trusted third party" has to generate K, and has to transport it both to A and to B.  It is obvious that this transport has to be secure.  If the transport of the key is insecure, then the former "enemy" E can become cryptographically a friend, and there are no secrets to be had any more.

Now, this begs the question of course: if there's a secure channel that can transport the key, then why not use that channel to transport the message in the first place ?

It is only when this question can be resolved in a meaningful way that symmetric secret messaging has any sense.  The answer to this is usually that the parties A and B can meet physically at a certain point in time, but will then get separated by a distance where they have to use insecure means of communication.   We will see that there is another answer to this question: it is possible that one can set up a secure channel, but which is very computing-intensive.  It can then pay to send only the key through this secure channel, to send the (much larger) message afterwards using symmetric encryption.   But, in every case, this question has to be considered.

It looks like block cyphers were exactly designed to do that, and that is indeed the case, but we will see that the simple usage of block cyphers can be problematic, and that one actually doesn't need the reversibility of a block cypher in all cases.

Securing data storage

A special case of "secret messaging" is when the sender and the receiver are the same person or entity.  Now, this can sound strange at first sight: who needs to send a message to one-self over an insecure channel ?  However, this is exactly the situation of the storage of sensitive information.  The information carrier (a hard disk, a USB key, a mobile phone) can be considered insecure if it can be read by others, copied by others, accessed by others or stolen.  As such, it can be interesting to keep the data on the carrier in the form of a cypher text, and when one needs to use the data, one can provide the key that decrypts the data to be able to access the original, clear text data.  Of course, in the same way that hanging the key of the front door on a string next to the door is a bad idea, it is a bad idea to keep the encryption key with the data storage that is encrypted.  Also, when the key is applied to the storage, it is "open" and hence, at that moment, vulnerable to access, theft or copy.   But encrypting data to secure a storage is a very important application of symmetric cryptography, that is gaining more and more importance.   The way it is done has a lot of similitude to symmetric secret messaging, but one has to take into account the specific aspects of this application.

The question, similar to the one that is central to symmetric secret messaging, is: if there's a secure way to store the key, then why not use it to store the data securely ?

Pseudo Random number generation

The ability to share a series of random numbers amongst friends can have several applications.  It allows friends to have perfectly correlated, and yet seemingly random, data streams of infinite length, based upon just a shared key, when there is no means of communication.  This can allow for correlated action between friends which cannot be imitated by enemies.   An application is "spread spectrum" radio communication: both the emitter and the receiver have their synchronized random number generators running, and switch frequencies of emission according to the random numbers.   Because both generators are synchronized and give out the same numbers, emitter and receiver both know at all times what frequency is used ; but to an enemy, this jumping around of the base frequency just looks like noise.  It is the perfect correlation between the two streams of random numbers that allows the meaningful reception of the signal.

Another application resides in the proof of identity often used by banks when allowing internet access to their customers.  At a certain point, the customer needs to be physically present with the banker, so that the bank can have confidence that the customer has its legitimate identity.  The bank then shares a secret key with the customer (sometimes not even visible to the customer, but included in an application or a hardware device).  Based upon this key, a random block can be generated after n iterations, and both the bank and the customer can do this without further communication.  The number of iterations is based upon the time (say that n increases by 1 every 30 seconds or so).  When the customer wants to prove his identity to the bank at a later instance, his application or his hardware device calculates the n-th random block based upon the present time and sends it to the bank.  The bank can find out that this random block corresponds indeed to the n-th block of the customer.

Even if this number is intercepted by anyone, 30 seconds later, this number becomes meaningless to the bank. 

There can be many other applications of the generation of pseudo random numbers.  It has utility each time that there is something to be won by having perfectly correlated, seemingly random behaviour by two friends sharing the key.

Another application of pseudo random number generation by cryptographic techniques is the verifiable fairness of randomness.   Consider a lottery.  The basis of a lottery is that the drawing is "fair", that is to say, that nobody can determine the outcome of the drawing ; or that nobody can know the drawing in advance.  These two aspects: "not being able to pick the outcome while telling everybody that it is random" and "nobody knowing the outcome in advance" are two criteria of fairness of a lottery.  Which one of the two is most important depends on the kind of lottery.  If the participants cannot pick their lot at random, then the first aspect is most important.   Suppose that one has the following lottery: a day of the year is drawn (one out of 365 say), and the person who has his birthday closest to the drawn day is the winner.  Your birthday is known, and you cannot pick it.  One simply needs to find out that the result of the drawing couldn't be determined by someone and was "fairly random".  Whether the result could be known in advance or not doesn't really matter.  If Joe cannot pick the outcome to be his birthday (or the birthday of a friend), that's fair enough.  But it has to be verifiable that this is "random".

The second aspect (not being able to know in advance) is important if the participants have some choice in their lot.  It must be essentially impossible for those setting up the lottery drawing, to determine in advance what will be the outcome of the drawing.  

Hashing and MACs

Generating a small and finite digest of a large set of data is a "portable proof of authenticity".   There are two kinds of digests: those that can be generated by anyone, and those that can only be generated by friends.   The first kind of digest is called a hash, the second: a MAC (Message Authentification Code).

Most cryptographic applications consist in using the digest to verify the authenticity of the original data.  The data themselves are not "secret", one just wants to verify that it are the unmodified, original data.  If this verification has to be public, then one needs to use a hash ; if this verification only matters within a set of friends that share a key, then MACs can be used. 

A typical application of a hash is the verification of the correctness of a distributed piece of software.  Indeed, when downloading software, one always takes the risk of downloading corrupted software, where a malicious person has introduced virusses, spyware or other malicious code.    If one can obtain the hash of the software in a secure manner, then after downloading, it is easy to calculate the hash of the downloaded software, and to compare it to the obtained hash.  If it fits, then there is a quite high level of confidence that the downloaded software is the original one of which the hash was calculated, and has not been tampered with afterwards.

However, this begs the question again: if there's a secure way to obtain the hash, then why not use that same secure way to download the software itself ?  Indeed, the enemy can just as well calculate a hash of his tampered version as the original author.  If that enemy can make you believe that his calculated hash is the "right" one, then you will happily download the tampered software, download the corresponding hash from your enemy, and you will be able to verify that both are in agreement.  So you need to be sure that the hash you've obtained is the correct one.

However, the hash being much smaller than the software package, this is maybe much easier to do.  The software package can reside on a potentially insecure server, but the hash can be copied from a much more secure web page or another source.

This difficulty of the enemy being able to provide with a correct digest of a tampered-with data block is essentially removed when using a MAC.  However, the price to pay is that a MAC has only a meaning between people sharing a common secret key, which brings us back to the original question of how the original secret key can securely be distributed.

But a MAC can only be generated, and verified, by someone possessing the common secret key.  Essentially, a MAC is a hash, encrypted  with the secret key.  As such, there's no danger for an enemy, not in possession of the secret key, to be able to generate a valid MAC.  But in the same way, there's no possibility for anyone else but those possessing the secret key, to verify the authenticity of the MAC.  So the nice thing about a MAC is that it can be distributed on an insecure channel too.  It can even be part of the data set itself, from the same insecure source.  If you have the key and the MAC verifies, that means that the data correspond to the MAC, which can only be generated by someone else possessing the secret key.

If friends share a secret key, they could of course also just distribute the encrypted data.  But in as much as this distributed data are no secret and can be public, and the only thing these people want to achieve is the certainty that the data has not been altered, then a MAC can be used instead of encrypting the whole data set.

Proof of knowledge

Sometimes, it can be important for two people to establish whether they are friends or not, which means, in the cryptographic sense, whether they both possess the secret key.  This can be in one direction, or in both.  For instance, you may have shared a secret key with your brother which is on a trip, and you get a phone call by someone claiming to be your brother.  How do you know ?  If you simply show (or ask for) the password/secret key, then of course, the other party now knows it.  If you don't know whether the other party is to be trusted or not, you cannot do that.  More complicated: suppose that the channel is insecure.

A way to do so is to provide the other party with some data, and ask for a MAC of that data.  Only people knowing the key can generate, and can verify, that MAC.  This can be done in both ways, and then both parties know that they are friends, without compromising the key.  

Proof of work

It can sometimes be necessary to "slow down" certain things sufficiently because if someone can do them too quickly, the system fails or becomes too unfair.   For instance, one may require that one doesn't access thousands of times a certain server per minute (for different reasons).  Maybe you don't want people to download systematically all files on your server, but want them to pick only a few ; maybe you simply want to avoid your server to be overloaded.  There can be many reasons to require an entity to "slow down".  On the internet, just checking some or other "identity" is difficult because identities can easily be spoofed: the same person can have thousands of IP addresses, thousands of e-mail accounts, ....  One evident way to limit access is by making it expensive in money.  You may simply sell the stuff.  If people pay, they get it.  But there can be circumstances where this is not desirable, and where you want to put something up for "free" but in such a way that people cannot access it too many times.

It can be useful to render a service expensive in another way than with money ; expensive in computer time.  You may give a puzzle to solve to people of which it is known that it will take a large amount of computing time.  Only when they come up with the solution, they can access.  For instance, if you ask to solve a puzzle that will take about 5 minutes on an ordinary PC, then most people with an ordinary PC will be able to solve one such puzzle every 5 minutes.  If they possess 5 PC, they can access every minute.  But if they want to access 10 times a second, they need to possess 3000 PC.  The number of people able to pull that off is not very large, and hence your protection works.

The classical way of doing proof of work is by requiring to produce a hash of a piece of data of which the first part is imposed, and the second part is modifiable by the party proving the work, in such a way that the resulting hash belongs to a subset of all possible hashes.   The only practical way for the counterparty to do that, is by trial and error, and the ratio of the size of the set of hashes, over the size of the allowed subset gives an indication of how many trials, on average, the counter party will have executed before having a successful attempt.

Modes of operation of block cyphers for clear text encryption

Suppose that we want to have a cypher function F such that XX = F(CC,K).  This resembles the block cypher function X = f(C,K), except that we are now talking about a much larger data set CC of clear text, and hence a much larger cypher text XX than the C and the X of the block cypher function f.  We want to build F using f.  We will start by cutting CC into n blocks of length 2N, which we call Ci.  Similarly, we will have the cypher text composed of blocks Xi.  The question is: how do we use f ?  Each way to do so is called a block cypher mode of operation.

Electronic Codebook mode (ECB)

The most straightforward way of doing so is of course:

Xi = f(Ci , K)

The corresponding decryption goes like Ci = g(Xi , K)

Now, simple and straightforward as this may seem, there are several problems with it.  The most obvious problem is that, if the clear text repeats several times the same block, then the cypher text will also repeat itself several times ; hence this information of the fact that the clear text has identical blocks is leaked in the cypher text.   When one encrypts images, this has a spectacular effect, and reveals for instance the contours of zones of uniform colour.

An aspect that can be seen as an advantage or a disadvantage, depending on the viewpoint, is that one can replace parts of the cypher text, without affecting the rest.  This is a disadvantage when considering that the transmission channel can actively modify the message (for instance, duplicating blocs, or leaving out blocks, inserting blocks from another cypher text with the same key...).  But it is an advantage if one considers data storage.  Indeed, if ever part of the stored data is damaged, the rest can still be decrypted without problems.   The electronic codebook mode is also interesting because it can be easily executed in parallel.  The encryptions of the different blocks are independent and can hence be done on parallel hardware.

Cypher block chaining mode (CBC)

In order to solve the weakness in ECB that same clear data is encrypted to the same cypher data block, a random initialisation block of the same length as C is chosen by the party calculating the cypher text.  This random block is transmitted in the clear with the cypher text.

The encryption is now:

X1 = f(C1 xor I , K)

Xi = f(Ci xor Xi-1 , K)

Note that even if all the C are the same, because of the differently chosen I, X1 will be different, and hence all the successive X will be different.

The decryption goes as follows:

C1 = g(X1 , K) xor I

Cj = g( Xj , K) xor Xj-1

This technique has the further advantage that, if one of the cypher blocks is damaged during transmission, this will only render unreadable two clear texts.  The CBC method gains its security from the fact that I is different for each new encryption.  If one re-uses I, one starts to suffer from similar weaknesses as in ECB mode.  CBC is not entirely free from a substitution attack.  Some cypher blocks can be substituted and the rest of the encrypted text still makes sense.  The decryption of these substituted blocks will most probably look random when decrypted and the clear text cannot be imposed by the attacker.  The CBC mode cannot be executed in parallel, as the input of one function application depends on an unpredictable way on the output of the previous function application.

Output feedback mode (OFB)

In the output feedback mode, we do not need the reversibility of the block cypher function (we do not need g).  As such, the whole difficulty and also probably the vulnerability of a traditional block cypher of having to be able to build a reverse function g is done away with.   In essence, the cypher feedback mode uses the block cypher as a pseudo random number generator, and uses these random numbers as the "key" in a kind of one-time pad to encrypt the clear text (and also to decrypt the cypher text).  It is of course important to have a random starting block I which needs to be transferred in the clear with the cypher text.

We have:

R1 = f(I , K)

Ri = f(Ri-1 , K)

as a "one time pad key" generator, both for the encryption and the decryption.

Encryption: Xi = Ci xor Ri

Decryption: Ci = Xi xor Ri

The output feedback mode cannot be executed in parallel, but the R blocks can be generated in advance (by both parties if the initial block is shared).  A property of this mode is that if part of the cypher text is damaged, this only has implications for the clear text on this particular zone.  The rest can still be decrypted without problems.

Counter mode (CTR)

The counter  mode is very similar to the OFB mode, but allows one to process in parallel.  Instead of chaining the computation of the R blocks, one proposes rather input blocks which are composed partly of a random initialisation, and partly of a counter:

Ii = composition of I and i

Ri = f( Ii , K)

The encryption and decryption are the same as in OFB.

Cypher feedback mode (CFB)

The cypher feedback mode has some similarities to the OFB.

Encryption:

X1 = f (I , K) xor C1

Xi = f(Xi-1 , K) xor Ci

Decryption:

C1 = f(I, K) xor X1

Ci = f(Xi-1 , K) xor Xi

This cypher cannot be executed in parallel nor can anything be calculated in advance.

Modes of operation for clear text encryption and authentication

The previously discussed modes allowed one to extend the concept of block encryption to the encryption of long clear texts.  However, several schemes were vulnerable to modification of the cypher text over an insecure channel.  Although certain modes were more indicative of such a problem than others (they would generate at least one block of rubble), it is now considered that the functions of encryption (confidentiality) and of authentication are two different cryptographic goals and that the authentication should be secured in another way than "detecting rubble" during decryption.  The simplest way of securing the second function would be to add a MAC to the cypher text.  However, calculating independently a MAC is usually almost as computing-intensive as producing a cypher text.  Hence, there have been developed modes which are doing both simultaneously in order to make more efficient use of computing resources: they generate both the cypher text and a kind of MAC during the process.

We will only discuss the most famous of these modes, which is the Galois counter mode (GCM)

The encryption of the Galois counter mode is the same as the encryption with the counter mode, except that we start "after the first count".

That is, we have a "starting value" for the counter I0 and the encryption starts with I1 in the counter mode, so we have the encrypted blocks Xi (i starting at 1).

But we also compute a MAC, in the following way.  The user building a cypher text has to provide an authentication text, which will be transferred in clear text and which includes information about the sender.  We call this text block AAD.

We calculate H = f ( 0, K).

With the AAD block and the H block one calculates g0 = AAD × H, where this multiplication is a multiplication in a Galois field of the size of the block.  This Galois field is part of the encryption specification.    A standard Galois field is used for blocks of size 128 bits in the field of size 2128 with the irreducible polynomial P(x) = x128 + x7 + + x2 + x + 1.  But this can be just any Galois field, adapted to the case at hand.

We now compute the MAC recursively from the cypher blocks:

gi = ( gi-1 xor Ci ) × H

Finally, the last gn is used to compute the MAC: MAC = (gn × H) xor f (I0 , K)

The Galois multiplication being computationally much lighter than a full encryption of a block, this calculation of a MAC adds little overhead to the encryption procedure.  The price to pay is of course that it cannot be parallelized, but that's unavoidable with a MAC.

For decryption, as the encryption is a normal CTR mode encryption, the decryption of the cypher text is as for a CRT mode ; the calculation of the MAC is identical on the receiving end, and the received MAC is compared to the locally calculated one from the cypher text.  If both are in agreement, this means that the cypher text has not been tampered with during transmission ; given that it is based upon the AAD, this AAD cannot be tampered with either (for instance, a network address).

Message authentication without encryption using block cyphers

There is a special application of the CBC mode, which is the generation of a MAC without encryption.  If one simply wants to generate a MAC of a clear text, then encrypting the clear text using the CBC mode, but only considering the last cypher block, consists in a MAC.   One transmits the clear text and the last cypher block as a MAC.  This is called CBC-MAC.

The Galois counter mode can also be used to generate only a MAC.  This is called GMAC.  Again, the cypher text is not transmitted, and only serves to calculate the MAC.

Random numbers from block cyphers

The OFB and the CTR modes of operation already illustrated the use of block cyphers to generate a deterministic cryptographically secure pseudo random stream.

Hash functions from block cyphers and compression functions

The standard compression function primitive of a hash application is such that we have a function H = h(B) where B is taken in a larger set than H.  If B has N bits, and H has M bits, and N > M, we have a genuine compression function and the resulting hash will be of M bits.  In the case that we use a block cypher, we have H = h(C,K), where B is seen as the composition of C and K.

Merkle-Damgård construction

If we want to calculate the M-bit hash of a large data set (which is the essential application of a hash function: calculating a finite length digest (here, of M bits) of a potentially very large data set), then the standard construction is as follows:

The large data set is split in blocks of size (N - M): x1 x2 x3 ... xn

We pick an "initialisation" I of size M, which can be a constant, specified in the standard of the hash function, or can be an implementation-dependent constant.  In any case, this constant has to be publicly known.

H1 = h( I // x1)

Hi = h(Hi-1 // xi )

Hn is the hash result of the data set.

Davies-Meyer construction

The Davies-Meyer construction is well-suited to the use of block cyphers in a hash application, because by design, block cyphers have an input of the same size as their output: the clear text input.  The other input, the "secret key", has potentially a different length. 

As such, the data set is split in blocks of size K, where K is the key length:x1 x2 x3 ... xn

In the same way as with the Merkle-Damgard construction, a publicly known initialisation block I is needed, of size M.

We now have:

H1 = I xor f(x1 , I)

Hi = Hi-1 xor f(xi , Hi-1)

Again, the last Hn will be the resulting hash of the large data set.

Matyas-Meyer-Oseas construction

This construction doesn't take an advantage of the fact that with a block cypher, there's an input with the same length as the output.  The data set is split in blocks of length M (of the same size as the final hash).  We now need a function u(H) that modifies the length of a block of length M into one of length K (whether longer or shorter).  There are no special requirements on this "compression" or "expansion" function.

We have again an initialisation block I of length M.

H1 = f(x1 , u(I) ) xor x1

Hi = f(xi , u(Hi-1) ) xor xi

The difference between the Matyas-Meyer-Oseas construction and the Davies-Meyer construction is that in the former case, the data to be hashes enters via the "clear text" input, and in the latter case, the data enters via the "key" input of the block cypher, while the output feedback enters in in the "key" input (and has hence to be modified in length in certain cases) in the former case, and the output feedback enters in the "clear text" input (and is of the right length) in the latter case.

There are still other constructions, like the Miyaguchi-Preneel and the Hirose construction.

A typical property of these recursive compression function applications is that, if one appends data to the data set, one can start with the hash of the original data set and only needs to iterate over the new blocks.  Indeed, all one needs for all these schemes to calculate Hn+1 is the knowledge of Hn and the new data block xn+1.   For the standard requirements of hashing applications, this is not a problem, but when one wants to use hash functions in other applications, one has to take this into account.

The sponge construction

A different type of construction to provide for a hash application is the sponge construction.  The sponge construction consists of a "register" S of s bits, of which the first r (r < s) are "input" or "output".

There is a 1-1 "compression" function which can map a block of s bits onto another block of s bits: T = f(S).

The input data is split in blocks of length r: x1 x2 x3 ... xn

Initially, S is put to 0 (all bits are put to 0).

The following algorithm is repeated n times (the number of blocks of input data):

  1. R, which are the first r bits of S, are XORed with xi and the result is placed back into R
  2. S is replaced by f(S) (containing the piece R).

When one wants to obtain output of a length k r, then one repeats k times the following algorithm:

  1. R is read out, providing r bits of output
  2. S is replaced by f(S)

Normally, S is much larger than R.  The part of S that is not R is the "entropy reservoir" which is "filled" during the input cycle, and which is "poured out" during the readout cycle.

It is interesting to see that the sponge construction doesn't allow one to calculate the hash of the original data set with appended data by just continuing

MAC from hash functions

There are some naive constructions using hash functions to produce MAC which are vulnerable if the hash function can be "appended" (which is not the case with a sponge construction).  Indeed, if Hn+1 can be calculated from Hn, then it is possible to add text to the original text, and from the original MAC, produce a new, valid MAC without possessing the key.

The most obvious MAC construction is to take a hash of the data, and to encrypt that hash with a block cypher.  However, one has to be very careful with that construction, which is not automatically secure.  For instance, if one uses the CTR mode encryption of the hash, as one knows the plain text as well as the encrypted hash, which is symply XORed with the encrypted initialisation vector.  This encrypted initialisation vector can be found by XORing the MAC with the hash of the clear text (both are known to the attacker).  He can now totally modify the clear text, calculate a new hash, XOR it with the encrypted initialisation vector he just found, and obtain a perfectly valid new MAC for the modified text.  There are ways to encrypt a hash securely but one has to rely on properties which are not imposed on

We see again some examples of the bad idea it can be to "invent cryptography in one's basement".  There is a provably secure way to generate a MAC using a hash function only.  It is called HMAC.

HMAC

If one has a secret key K (of length one block of the hash function) then one makes two auxiliary keys Ki and Ko as follows:

Ki = K xor (a succession of "00110110")

Ko = K xor (a succession of "01011100")

If C is the (long) clear text, and h is the hash application that can work on any arbitrary length data, then:

HMAC(C, K)  = h( Ko // h( Ki // C) )

(where // means: append)

The HMAC construction has very little overhead as compared to the simple calculation of a hash of the text: the second hash call has only to hash a rather small text.

Padding

A practical issue of all block-based cryptography is that the clear text needs to be split in a number n of blocks of length M, which supposes that the length of the clear text is a multiple of M.  In practice of course, this is not the case, so one has to "complete" the actual clear text with enough bits so that one reaches such a multiple.  The procedure to do so is called "padding".

Most cryptographic standards specify exactly how that padding should be done.  The most trivial way to do padding is by adding zero-bits until the next multiple of M is reached.  This has however the disadvantage that if the real message ends in several zero bits, one doesn't know what zeros are part of the original message, and what zeros are part of the padding.  This is why a popular way of padding consists of adding one "one" bit, followed by enough "zero" bits.  As such, one knows that the padding started at the last "one" bit of the text, and that all preceding bits are part of the actual message.  If necessary, that is, when the actual message had a length which was a multiple of M, one adds a whole new block consisting of one "one" and M-1 zeros.

The SHA-1 protocol actually modifies the above padding somewhat, by adding, after the "1000...0" padding, a 64 bit word which encodes the length of the message as an unsigned integer.

The SHA-3 protocol (Keccak) has again a slight modification of said protocol: the padding consists of adding "01", then a 1-bit, the right number of zero bits, another 1-bit.  The length of the message is not included any more.