Principles of asymmetric cryptography

Symmetric cryptography is based upon a division of people in "friends" and "enemies", friends being people who share a secret key.  Symmetric cryptography applications are limited by two essential points: the fact that friends have to share a common secret key (which begs the question of the secure channel over which they will share this key) and the fact that once friends share this key, they can all do the same things, and hence are not distinguishable amongst them.  In other words, if a MAC is generated with a secret key, then any of the friends could have generated that MAC.  As we have seen, there are different applications possible with symmetric cryptography, but there are many applications which are impossible with such a scheme.

These two difficulties are exactly what asymmetric cryptography tackles.  The core idea of asymmetric cryptography is that a secret key is in fact never shared, but kept private by the person or entity that generated it.  As such, the problem of having to communicate a secret key over a secure channel disappears, because it never has to be communicated.  And second, given that every entity (person) has his own private secret key, there's no confusion between entities, such as was the case between friends in symmetric cryptography.   In other words, the secret key becomes also a kind of "proof of identity".  In asymmetric cryptography, one can in fact see the world as a single "friend", and the rest of the world as "enemy" for each secret key.

In order to achieve this, asymmetric cryptography is based upon the construction of a secret key and a public key.  There is a bijective relationship between the secret key and the public key, but in such a way, that one can calculate the public key from the secret key, but that it is essentially infeasible to calculate the secret key from the public key (even though that unique relation exists of course).  The secret key is, as the name says, kept secret and never shared with anybody (not even a "friend").  The public key, as the name says, can be made public and doesn't need to be protected or kept secret.

It is often stated that what makes asymmetric cryptography possible, is what's called a "trapdoor function": a function that can be calculated easily in one way, but impossible to calculate in the other direction.  That is, v = f(u) is easily calculable, but u = g(v) is essentially algorithmically impossible to do in any practical way.  This statement by itself is wrong, because symmetric cryptography also has several trapdoor functions.  A block cypher X = f(C,K) can be inverted like C = g(X,K), but it is by construction impossible or extremely difficult to find K = k(C,X).  A hash function from a piece of data D with the same number of bits as the hash H, H = h(D) is easily calculated, and D = u(H) is essentially impossible to calculate.  So symmetric cryptography is already full of "trapdoor functions".

What makes asymmetric cryptography possible, is the fact that there is a trapdoor function "public key" P = T("secret key" S), but that it is moreover possible to "transfer" this trapdoor property to another function: X = f(C,P) but there is no g, such that C = g(X,P).  However, there is a g', such that C = g'(X,S).

In other words, using the public key, f is a trapdoor function, which cannot be inverted (with the sole knowledge of P).  But f has an inverse function, g' if one has the knowledge of S.  This construction, of a "trapdoor function, if one only knows P, but invertible function, if one knows S", is what is the core of asymmetric cryptography.  One could say that the key to asymmetric cryptography, is a "trapdoor function with a back door" where that back door is the secret key.  The trapdoor functions of symmetric cryptography do not have a back door.  There is no "extra knowledge" that allows one to write the inverse function of a hash function. 

Mathematical basis for asymmetric cryptography

The technological basis for asymmetric cryptography is a "trapdoor with a back door".  Mathematically, it means that we have to find a problem, such that we can find a bijection T that maps secret keys on public keys, of which the inverse problem is known to be hard and nobody knows a solution to it, and an associated "application" that comes down to being able to calculate relatively easily X = f(C,P) and C = g'(X,S), but that there's no easy way to find a calculable function C = g(X,P). 

There are essentially two well-known classes of problems that are used for asymmetric cryptography, and there are a few others, which have, at this moment, only marginal success in usage (which doesn't mean that they lack any potential, it is just that they aren't much used, studied, and applied.

These two problem classes are:

  • the factorisation of a product of two large primes, and Euler's theorem
  • the discrete logarithm in a finite cyclic group and the fact that (ai)j = (aj)i = ai.j

How one can transform these settings into an actual trapdoor function with a back door, is the whole art of designing an asymmetric crypto system, of course, but at the core of about all modern, much used asymmetric crypto system, one of both problems (or both) are used.  The discrete logarithm problem has in fact a very large scope because there is a large class of cyclic groups that can be used.  One class of cyclic groups is the multiplicative group of a prime modular field.  Another class of cyclic groups is based upon elliptic curves.   In fact, this last class is now the most promising asymmetric system, because it allows higher security with shorter key lengths.

The danger with an asymmetric cryptographic system is that mathematicians make strong progress on the mathematically difficult problem that is used to implement the relationship between the public key and the private key.  The whole idea is that it is easy to find the public key from the private key, but that it is not feasible to find the private key from the public key, even though this mathematical function exists.  If people make progress in calculating this function, this weakens seriously the security of the crypto system that is based on it.   As some progress has been made on classical problems, this explains why the key lengths in this kind of system are always very long (one shouldn't be surprised to find keys of thousands of bits long).

Prime factorisation: basic idea

The crypto system that is based upon the prime factorisation problem, is the RSA crypto system.  We will describe it in more detail, but we illustrate the core mathematical idea here.

The idea is to have two large prime numbers p and q, which are "random" and hence secret.  Their product, n = p.q , is public.  The whole idea is that the Euler Phi function of n can easily be computed if we know p and q: Phi(n) = (p-1).(q-1).  But that for someone only knowing n, this is an impossible task.

If we render as public key, n, and a number e, which is relative prime to Phi(n), so the public key is (n, e), then a message can be encrypted as follows:

X = Ce mod n

using the public key (n, e).

In order to decrypt it, we need the private key, which consists of (p,q,e).  Indeed, we can calculate relatively easily the multiplicative inverse of e in the modular ring of modulus Phi(n) using Euclid's extended algorithm.  As such, we have:

d.e = 1 mod Phi(n)

Decryption is then:

C = Xd mod n

Indeed, Xd = (Ce)d mod n = Cd.e mod n = C1 + k Phi(n) mod n = C using Euler's theorem.

This is the core of the RSA asymmetric crypto system.  The whole point is that if we get n, we don't know how to factor this into its two large prime factors p and q, although this is mathematically well-defined.  We simple don't know how to solve that problem.  As such, Phi(n) cannot be calculated, and hence the calculation of d from e cannot be done as we don't know in what ring the inverse must be taken.

We see here that there are two "trapdoors" in this scheme.  The first one is taking the product of two primes: n = p.q.  If we know p and q, it is easy to find n through a multiplication.  But if we know n, we don't know how to get back to p and q although this is well-defined.   This is the key trapdoor function T.  The second trapdoor function is "taking the power e".   Taking the power e of C renders X, but even if we know e and X, we don't know how to "take the e-th root" ; even though this function is, again, well defined.  But there is a back door in this trap door function: if we know d, then we DO know how to "take the e-th root": it is simply yet another exponentiation.  The exponentiation to the  power of d renders the exponentiation to the power of e "undone" and is "taking the e-th root".  But without the knowledge of d, we don't know how to do so, computationally.  The back door in the "irreversible" exponentiation is the knowledge of d (which, in its turn, is easily found if we know Phi(n), which, in its turn is easily known if we know p and q).

Discrete logarithm problem: basic idea

Using the discrete logarithm problem to construct directly a "trap door with a back door" is not as straightforward as with the RSA scheme.  In fact, there's one thing that is easily done using the discrete logarithm problem, and that is the establishment of a common secret (ephemeral) key.   It is the use of this common key that is then the basis of the trap door with a back door, using, amongst others, symmetric cryptography.   The establishment of a common secret key is the core of what's called the Diffie-Hellman Key exchange (KHKE) protocol.  Again, we will describe it in much more detail later, but we simply establish its basic idea here.

The idea is that there is a publicly known cyclic group G with a generator a, in which nobody knows how to solve easily the discrete logarithm problem.

The amazingly simple idea is now this: an entity A picks a random element in G: SA, the secret key of A.  Entity A also calculates his public key: PA = aSA.

The trapdoor function of the "public key" is simply exponentiation in the group.  As the inverse problem, taking the logarithm, is supposed to be a hard problem, this illustrates the trapdoor aspect of the function.

How is it used ?

An entity B does the same thing: with SB the secret key of B, and aSB = PB being its public key.

Now, A and B can establish a common key between them, in the following, just as amazingly simple, way:

A knows his own secret key, SA and knows the public key of B: PB . If B takes the public key of B to the exponent of his own secret key, then we have:

KA = (PB)SA = (aSB)SA = aSB.SA

B doing the same, will find KB = (PA)SB = aSB.SA = KA

As such, A and B can establish a common secret key between them, namely K = KA = KB without having the need of a secure channel.

In the Diffie-Hellman Key exchange protocol, N entities have their "fixed" secret keys, and have published their "public keys" openly.  Doing so establishes then N(N-1)/2 secret keys between all pairs of entities, allowing them to communicate secretly between them, establish MACs between them, without any need for sharing secret keys over secure channels.  What is further done with this key is a matter of the application that one wants to obtain.  Of course, the most straightforward application is to use this key as the secret key in a symmetric cryptographic protocol, like AES.  

But one can also use the above method to establish an "ephemeral session key".  This is the basis of the so-called Elgamal encryption scheme.  It is very close to the Diffie-Hellman Key exchange idea, but this time, the sender doesn't use his public key, but just establishes an ephemeral key for the sake of this single communication.

If entity A has published his public key, and entity B wants to send a message to A, then the only thing that B needs, is the public key of A.   B now generates a random secret key S just for the sake of this single communication, and calculates both P = aS and K = (PA)S.  Once this is done, B can forget S, and only keep P and K. 

B encrypts a message C (that has to fit in the group) with X = (C . K) in the group and sends (P, X) to A.

A now receives (P, X).  From P, A can calculate K in the same way as with the Diffie-Hellman key exchange.  A now calculates K-1, the inverse of K in the group (this is part of the group definition, to find an inverse).  Then A can decrypt the message using C = (X . K-1).

If one wants to send several data blocks (because the clear text is larger than the group can fit), then one has to generate a new ephemeral key for each block. 

Digital signatures

An important application domain which is opened up by asymmetric cryptography is the possibility of having digital signatures, who identify an individual entity as being the signer of a document, and which can be verified by anyone.  With symmetric cryptography, this is not possible, as in order to be able to verify the signature, one has to be in possession of the key, and hence, one is also able to produce the signature.  If more than one entity is capable of producing the signature, the signature doesn't prove any more that it has been produced by a single entity.  The property that only one entity can produce the signature, but that many entities can verify this signature, can only be established with techniques of asymmetric cryptography.

We will now show the basic idea behind the digital signature schemes, based upon the two classes of problems that are at the basis of asymmetric cryptography: prime factorisation on one hand, and discrete logarithms in cyclic groups on the other.

Prime factorisation based digital signature

This scheme is also called the RSA digital signature.  It is in fact pretty straightforward. We are in the same settings as with the RSA crypto system. A message C is signed as follows:

s = Cd mod n

The signature is given by (e,s) and the public key by n.  As d is only known to the owner of the private key (p,q) as we saw earlier, only the owner of the private key can produce such a signature.  Without the knowledge of d, it is impossible in practice, to calculate Cd.

However, this signature can easily be verified as follows, with the knowledge of the public key n:

t = se mod n

t should be equal to C mod n for the same reasons as with the decryption.

Discrete logarithm based digital signature

In as much as the RSA digital signature principle was almost trivially simple, the digital signature technique with a discrete logarithm is somewhat more involved.

We will start with a cyclic group G with a generator A, of prime order q.  However, we also need an injective mapping over that group G into the integers, which we call f.  Most of the time, this is not a problem, as the group elements G will have some or other numerical representation, and this representation gives us a natural map from G into the integers. 

We consider that we have a secret key d, which is an integer number.  The public key B, on the other hand, is a group element:

B = Ad

The whole idea is that one cannot calculate d from B, because this is the hard discrete logarithm problem.

We now come to the signature itself, of an arbitrary set of data C.

First of all, an ephemeral signature key K is generated randomly: it is a random number between 1 and q-1.  The group element R = AK is calculated.  Using the map from the group to the integers, we obtain also r = f(R).

Next, we calculate another number, s.

s = (h(C) + d . r) . K-1 mod q

Here, K-1 is the multiplicative inverse of the number K in the modular field q, and h(C) is a given hash function.

The signature of the data set C by the key holder of key d is now (r,s).

The verification by an owner of the public key B is as follows:

  • w = s-1 mod q  (operation in the modular field of modulus q)
  • u = w . h(C) mod q (operation in the modular field of modulus q)
  • v = w . r (operation in the modular field of modulus q)
  • P = Au . Bv (operation in the cyclic group, where we use the signer's public key)
  • we calculate f(P).  If it equals r, then the signature is valid, otherwise, it isn't valid.

The reason is that P should in fact be R.  Indeed, P = Au.Bv = A1/s . h(C). (Ad)1/s . r = A1/s . (h + d.r) = AK = R  when we look at what s is.

One can wonder why this scheme has to be so complicated.  The reason is that we need to obtain several properties.  The first property is of course that the signature can only be generated by the owner of the secret key d.  As such, d has to appear somewhere.  Of course, a second property is that a digest of the data set C has to be secured, so h(C) appearing in there is normal too.  However, the danger is now that we expose the private key.  If we have a few different signatures with the same private key, it could become possible to solve the signatures for the private key.  This is why the private key has to be "protected" with an ephemeral extra key, here K, which should be secret of course.  If K would be known, then it won't protect anything.  There is in fact a potential vulnerability in this scheme, which has happened several times in practice, and which is the re-use or the lack of entropy of the ephemeral key K.  If this happens, then two signatures with the same K are sufficient to expose the private key d.  Indeed, suppose that texts C1 and C2 are both signed using the same ephemeral private key K (which means that r is the same too).

In that case, we have;

s1 = (h(C1) + d . r ) K-1 mod q

s2 = (h(C2) + d . r ) K-1 mod q

In these two equations, there are two unknowns: K and d, and these two equations are not difficult to solve in the modular ring with modulus q, because they are essentially linear in d.K-1  and K-1 !   So two signatures with the same ephemeral key expose the secret key of the signer.  The scheme is secure, however, if we take a new, entropy-rich K for every new signature, but this should really be very carefully taken care of.

Depending on the group G at hand, we call this scheme:

  • the Elgamal digital signature if G is the multiplicative group of a modular field
  • The DSA algorithm if G is a cyclic subgroup of the multiplicative group of a modular
  • The Elliptic Curve DSA algorithm if G is a cyclic subgroup on an elliptic curve

We have of course only outlined the fundamental principle, and the genuine signature protocol is somewhat more involved.

Outlook

Asymmetric cryptography has opened a lot of new applications which are not possible with symmetric cryptography, and we've only touched upon the principles of the most obvious ones.  There are things like ring signatures, and zero knowledge proofs, we didn't even mention here.  However, this doesn't mean that asymmetric cryptography can replace all of symmetric key cryptography.   There are essentially two problems with asymmetric key cryptography which make us still prefer symmetric key cryptography whenever we can use it:

  • the key lengths in asymmetric cryptography have to be way, way bigger to obtain similar levels of security than in symmetric cryptography
  • asymmetric cryptography is extremely computing-intensive for small volumes of "encryption".  We're talking about several orders of magnitude difference in the computational effort needed.

This is why asymmetric cryptography is used only for those essential applications where there's no good solution with symmetric cryptography, and switch to symmetric techniques from the moment that this is possible.