I wondered whether this article belongs in the cryptography section or in the security section, but in the end, it is a security issue that can be treated by cryptography to a certain extent.  The problem of the Man in the Middle attack is essentially unsolvable when its extent is too large, which is why one shouldn't go to extreme measures to avoid it: in the end, it cannot be avoided entirely.

Principle of the Man in the Middle attack

The principle of a Man in the Middle attack (MITMA) is simply the following: suppose that Alice has a communication channel to Bob, or at least that's what she thinks.  In order to have this communication channel, she's using physical and virtual communication infrastructure over which she mostly doesn't have any direct control, nor does Bob.  Her control is limited from her computer to, say, her ADSL modem in which "things happen", and to the piece of telephone wire that leaves her house.  From there on, the communications links are not under her control any more. 

Let us first consider the simplest of situations: a direct cable from Alice's house to Bob's house, but not in her possession, nor in Bob's possession.  When Alice sends messages to Bob, she's pretty sure that what she sends, arrives at Bob's place.  Of course, people can eavesdrop on the cable, but they cannot stop the messages from reaching Bob, can they ?  Well, they could cut the cable, but then, Bob would notice.  Or would he ?

Suppose that halfway along the cable, someone has indeed cut it, but he receives Alice's messages on the part of the cable that comes from Alice's house, and he sends messages onto the part that goes to Bob's house.  Suppose this person installs a computer there that receives Alice's messages and copies them over to Bob's wire, and vice versa.  Neither Alice nor Bob would notice !  In fact, this is simply what the telephone company does when it installs a repeater !

But suppose now that the "repeater" doesn't copy exactly Alice's messages over to Bob, and doesn't copy exactly Bob's messages over to Alice, but modifies them.  We have now potentially a MITMA.

Consider now a more realistic setup: there's no direct wire from Alice's house to Bob's house.  There's a wire from Alice's house to her phone company C.  Then there's a wire from phone company C to backbone operator D.  Next there's a wire within the backbone company from site E to site F.  Next there's a wire from site F of company D to telephone company G.  Finally, there's a wire from telephone company G to Bob's house.  At each of these points, there's a machine receiving the messages from the previous machine on the line, and assembling messages for the next hop on the line.  These devices are called routers.  Each of these devices decides where it sends the data, and how to modify these data.  Each of these devices can hence, as part of its normal function, do a MITMA.

For instance, the router at E could decide to send all messages it receives that come from Alice and are heading for Bob, not to Bob, but to Mallory, and to forward messages that come from Mallory to Bob, as if they came from Alice and went to Bob.  The same point, or another one, say G, could decide to send all messages that it receives from Bob for Alice, not to Alice, but to Mallory, and to send messages that come from Mallory to Alice, as if they came from Bob.  This is the router equivalent of our cut wire, with the difference that there's no wire to be cut, and that the infrastructure to do so is already in place, and part of the normal functioning of the internet.

In other words, if we send out a data packet with a destination address (an IP address), we accept the fact that this data packet is received and read by others, and we take it for granted that this packet will be sent with its data contents unmodified to the destination, but every router on the way could decide otherwise.  In the same way, if we receive a data packet from some sender address, we accept the fact that this packet has been received and sent by others, and we take it for granted that this packet came from this very sender with the data unmodified, but any router along the way could have decided otherwise.
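The hop-by-hop forwarding described above can be sketched as a small simulation.  This is only an illustration under stated assumptions: the `Packet` type, the router functions and the message text are hypothetical names invented for the example, and real routers obviously work on IP packets, not on Python objects.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: str


# A hop receives a packet and returns the packet it forwards.
Hop = Callable[[Packet], Packet]


def honest_router(packet: Packet) -> Packet:
    """Forward the packet unchanged, which is what we take for granted."""
    return packet


def tampering_router(packet: Packet) -> Packet:
    """Hypothetical router E: rewrites traffic going from Alice to Bob."""
    if packet.src == "alice" and packet.dst == "bob":
        return replace(packet, payload=packet.payload.replace("noon", "midnight"))
    return packet


def send(packet: Packet, path: List[Hop]) -> Packet:
    """Pass the packet through every hop in order; return what arrives."""
    for hop in path:
        packet = hop(packet)
    return packet


sent = Packet("alice", "bob", "meet at noon")

# Five honest hops: Bob receives exactly what Alice sent.
clean = send(sent, [honest_router] * 5)

# Same path, but the third hop decides otherwise.
altered = send(
    sent,
    [honest_router, honest_router, tampering_router, honest_router, honest_router],
)
```

The addresses on `altered` are untouched, so nothing in the packet itself tells Bob that one hop on the way changed the payload.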

Man in The Middle and Brain in A Box

Consider now the extreme case, where your telephone wire has been cut by your neighbour and fed into his data centre, so that everything you ever send on the internet is in fact simply received by him, and everything you receive is simply sent by him.  Suppose your neighbour has a huge data centre, and has set up copies of about everything that's on the internet out there.  When you're contacting "Google", in fact, you're simply contacting the copy of Google's web site in his data centre ; when you're writing on "Facebook", you're in fact simply writing on your account on the copy of Facebook in his data centre.  When you are downloading Firefox, you're in fact simply downloading the version of Firefox in his data centre.  And so on.

This is an extreme example, which would require huge resources on the part of your neighbour, and you could wonder how he got a copy of Google and so on... of course, he could contact the real Google in your name for a lot of things and only modify those things for you he's interested in manipulating, but you cannot know how far this goes, as long as your only connection with the world is through your internet connection.

This is somewhat the internet equivalent of the philosophical problem of a "Brain in a box", where an advanced medical society is capable of removing the brain of a person from his body, and putting that brain alive in a box, connecting all nerves coming out of the brain to electronic devices that act as if they were the rest of the body.  The brain cannot know it is in a box: it will think that it has a body, and it will have perceptions that are designed by those who put the brain in the box in the first place.

In the same way, this massive form of MITMA cannot be detected or mitigated as long as one only uses one's internet connection.  This is why fundamentally, the MITMA cannot be provably mitigated in an absolute sense if one has only an internet link.  If someone with enough resources has decided that the first router on your way out is going to send everything that comes from your place to site X, and that everything that is supposed to go to your place, first goes to place X, where X can be any other network place in the world, you won't notice, as long as site X copies everything back to you and back to the outside world, because place X cannot be distinguished from a normal routing operation. X will only modify those things it is interested in doing, and for the rest, act as a normal router. If one thinks that this is science-fiction, this is a practice that certain state-sponsored agencies do put in action.  However, one doesn't necessarily need to be a state-sponsored agency to do such things: a simple router can do so too.

How to mitigate

As we saw, there's nothing that can be done against a "full" internet connection MITMA, as long as one relies only on that internet connection.  But we could use several internet connections.  Of course, if all these connections are also under the control of the same attacker, these multiple connections won't help us.  We can also use non-internet connections.  It becomes harder for an attacker to control, at the same time, communication channels of different kinds.  This is the basis of 2-factor authentication. However, most implementations of 2-factor authentication are seriously flawed and wouldn't mitigate a simple MITMA ; at most, they protect against the theft of a single set of credentials.

We can also rely on a web of trust, or an anchor of trust, but in all these cases, we suppose that the MITMA doesn't involve all of our communication ; as we indicated, a MITMA that involves all of our communication cannot be mitigated.  So the idea is to detect inconsistencies that can only be avoided by the attacker by making the MITMA  so massive that it becomes impractical or too costly.  In other words, mitigations against the MITMA consist in enlarging the necessary scope of the MITMA in order to succeed, and hope that at a certain point, the scope of the attack goes beyond the price the attacker is willing or able to pay.

MITMA and cryptography

The interplay between MITMA and cryptography is subtle.  Given that there is no general mitigation possible against a massive MITMA, one could wonder to what extent cryptography can solve an unsolvable problem.  It can't.  But it can help detect inconsistencies in a not-too-massive MITMA, that is, it can help against the low-resources "amateur" MITMA.  This is what TLS and its most well-known application, HTTPS, are about, for instance.  But it is important to realize the limitations and the scope of these mitigations.

In the application of cryptography to MITMA, one needs to distinguish two different types of attacks:

  1. Eavesdropping.  Cryptography is very good at mitigating eavesdropping.
  2. MITMA.  Cryptography can only partially help in mitigating this problem, and might give you a false sense of security.

MITMA and symmetric cryptography

Strange as it may seem, symmetric cryptography entirely solves any problem of MITMA if it is applied correctly.  How can this be, when we just claimed that nothing can mitigate a massive enough MITMA ?  The reason is that symmetric cryptography contains a contradictory requirement in its very application: Alice and Bob have to agree on a symmetric encryption key over a secure channel that can be neither eavesdropped nor faked.  The question then is: why not use this secure channel to transmit the actual message instead of just the secret common key ?  The answer is that this channel may perform less well (small bandwidth), or can be established at an earlier moment in time.  For instance, Alice and Bob may meet in person at a parking lot, and Alice may give a sealed envelope to Bob, containing the secret key.  Later, Bob can open this envelope and use the key to encrypt messages to Alice ; Alice can also use this key to encrypt messages to Bob.  Bob and Alice are the only two people in the world able to encrypt, and to decrypt, messages with this key, so when they read these messages, they know:

  1. that this message comes from Alice or Bob: nobody else is able to encrypt it this way
  2. that this message has not been tampered with
  3. that this message has not been read as long as Alice and Bob were sufficiently vigilant with the message content and the key

In other words, well-working symmetric key cryptography, where the keys have been distributed through secure channels, is a perfect protection against any MITMA.  The price to pay is that one needs a secure channel that not only has not been tampered with, but also has not been eavesdropped.
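The first two guarantees in the list above (origin and absence of tampering) can be illustrated with a pre-shared key.  Note the substitution: this sketch uses a keyed hash (HMAC-SHA256 from Python's standard library) rather than encryption, so it only demonstrates authenticity and integrity, not secrecy; an authenticated cipher would be needed for the third guarantee.  The key, the messages and the function names are assumptions made for the example.

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag that only key holders can produce."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_authentic(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check the tag in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


# The key in the sealed envelope, exchanged at the parking lot.
shared_key = secrets.token_bytes(32)

# Alice sends a message together with its tag.
message = b"meet at noon"
tag = make_tag(shared_key, message)

# Mallory, in the middle, changes the message.  She can either keep
# Alice's tag or compute a new one with a key of her own.
forged = b"meet at midnight"
mallory_key = secrets.token_bytes(32)
mallory_tag = make_tag(mallory_key, forged)

untouched_ok = is_authentic(shared_key, message, tag)
forged_with_old_tag_ok = is_authentic(shared_key, forged, tag)
forged_with_new_tag_ok = is_authentic(shared_key, forged, mallory_tag)
```

Without the key from the envelope, Mallory has no way to produce a tag that Bob will accept, however much of the network she controls.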

MITMA and asymmetric cryptography

Asymmetric cryptography has been hailed as the solution to the "ridiculous" problem of symmetric cryptography, which was that in order to communicate securely over an insecure channel, one needed to communicate the key over a secure channel.  The great advantage of having a key pair, a secret key and a public key, was that the secret key never needed to be communicated, and that the public key could be communicated over an insecure channel.

This solves the problem of communicating the key over a channel that is insecure in the sense that it could be eavesdropped ; but it doesn't solve the problem of a channel that could be suffering from a MITMA.

So let us consider again the scenario where Alice wants to communicate securely with Bob.  Bob and Alice decide that they want to be able to communicate securely in the future.  With symmetric cryptography, they had to meet on a parking lot, and they had to make sure that the key in the envelope would never be revealed ; this is why Alice put it in an envelope in the first place.  With asymmetric cryptography, Alice generates a public/secret key pair at home, and Bob does so too.  Now, they meet at the railway station, and they exchange their public keys in the open.  Even if someone is looking at these keys, no harm is done.  If Alice wants to write a message to Bob, she has to do two things:

  1. she encrypts the message with Bob's public key she got from him at the railway station.  However, an eavesdropper at the station could do this too (he has a copy of Bob's public key).
  2. she signs the message with her secret key.  This is something that the eavesdropper cannot do.

When Bob receives Alice's message, he is the only one that can decrypt it, because he needs his secret key to do so and this key never left his home.  He can also be sure that this message was signed by Alice, because he uses Alice's public key to check the signature and he knows that this is Alice's public key.

As such, asymmetric cryptography has allowed Alice and Bob to communicate securely, because Bob was sure he had Alice's public key, even if these public keys were eavesdropped ; simply because Bob got the key from Alice directly.

However, where did things go wrong with asymmetric cryptography ?

Suppose that Alice and Bob don't want to go through the hassle of meeting at the railway station (suppose that Alice lives in Australia, and Bob lives in Sweden).  Instead of meeting in person, they decide to send their public keys by paper mail.  Alice puts her print-out of her public key in an envelope, puts a stamp on it, and sends it by mail to Bob.  Bob does the same for Alice.

Now, suppose that someone in the post office near Bob can open the envelopes.  We call her Mallory.  When the envelope from Australia arrives at the post office, she opens the envelope, replaces Alice's public key by the public key Mallory_1, and brings it to Bob.  Bob will think that this is Alice's key.  When Bob sends off his letter with his public key, Mallory opens the envelope, replaces the key with the public key Mallory_2, and sends it off to Alice.

This time, when Alice wants to send a message to Bob, she encrypts it with Mallory_2, thinking this is Bob's public key.  If ever this message arrives directly at Bob's, Bob will notice: he cannot decrypt it !  However, if Mallory does a MITMA, she will receive Alice's message, and she can decrypt it, because Mallory has the private key that goes with Mallory_2.  She can now read it, and even modify it.  Next, she can encrypt it with Bob's public key, and she will sign with the secret key that goes with Mallory_1.  She sends the modified, re-encrypted, and re-signed message to Bob.

When Bob receives this modified message, he will be able to decrypt it, because it has been encrypted with his public key.  When he checks the signature with the public key Mallory_1, which he thinks is Alice's, the signature will be OK.  Bob has hence received a modified message, which he thinks has not been read and comes from Alice, while nothing could be further from the truth.  When Bob replies to Alice, exactly the same thing will happen in the other direction.  Worse: suppose that Bob detects Mallory's actions, and can make her stop.  If he tries to explain this to Alice, Alice will not believe him, because Bob doesn't possess the private key that goes with the public key Alice thinks is his (it is in fact Mallory_2), so he cannot produce a signature she will accept.  If Bob encrypts the message to Alice with Mallory_1 (which he thinks is Alice's public key) and signs it with his true private key, Alice will receive a message she cannot decrypt with her true private key, and the signature will not verify.  So Alice will think this true message from Bob came from an imposter.  Bob can never again communicate with Alice until they meet in person.

As such, public key cryptography doesn't protect against a MITMA if the public key exchange can be victim of the same MITMA.  Worse, once a public key exchange has been victim of such a MITMA, a legitimate  communication will be seen as fraudulent.
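The key substitution scenario above can be replayed with toy numbers.  The sketch below uses textbook RSA built from four small, well-known Mersenne primes, with no padding, which is completely insecure and serves only to make the steps concrete.  All function names, the message text and the choice of primes are assumptions for the illustration.

```python
import hashlib
from typing import NamedTuple, Tuple

PublicKey = Tuple[int, int]  # (modulus n, exponent e)


class KeyPair(NamedTuple):
    n: int  # public modulus
    e: int  # public exponent
    d: int  # private exponent, never leaves its owner


def toy_keypair(p: int, q: int, e: int = 65537) -> KeyPair:
    """Textbook RSA from two known primes. Insecure: illustration only."""
    return KeyPair(p * q, e, pow(e, -1, (p - 1) * (q - 1)))  # Python 3.8+


def pub(k: KeyPair) -> PublicKey:
    return (k.n, k.e)


def to_int(text: str) -> int:
    return int.from_bytes(text.encode(), "big")


def to_text(m: int) -> str:
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode()


def _digest(m: int, n: int) -> int:
    return int.from_bytes(hashlib.sha256(str(m).encode()).digest(), "big") % n


def encrypt(key: PublicKey, m: int) -> int:
    return pow(m, key[1], key[0])


def decrypt(k: KeyPair, c: int) -> int:
    return pow(c, k.d, k.n)


def sign(k: KeyPair, m: int) -> int:
    return pow(_digest(m, k.n), k.d, k.n)


def verify(key: PublicKey, m: int, sig: int) -> bool:
    return pow(sig, key[1], key[0]) == _digest(m, key[0])


# Four Mersenne primes, shared between keys only to keep the toy short.
M61, M89, M107, M127 = (2**k - 1 for k in (61, 89, 107, 127))
alice, bob = toy_keypair(M61, M89), toy_keypair(M107, M127)
mallory_1 = toy_keypair(M61, M107)  # handed to Bob as "Alice's key"
mallory_2 = toy_keypair(M89, M127)  # handed to Alice as "Bob's key"

# What each victim believes after the tampered mail exchange.
alice_thinks_bob_is, bob_thinks_alice_is = pub(mallory_2), pub(mallory_1)

# Alice writes to "Bob": encrypts with Mallory_2, signs with her own key.
original = to_int("pay 100")
c1, s1 = encrypt(alice_thinks_bob_is, original), sign(alice, original)

# Mallory decrypts, modifies, re-encrypts for Bob, re-signs as Mallory_1.
read_by_mallory = to_text(decrypt(mallory_2, c1))
modified = to_int(read_by_mallory.replace("100", "900"))
c2, s2 = encrypt(pub(bob), modified), sign(mallory_1, modified)

# Bob decrypts and checks the signature against "Alice's" key.
received = to_text(decrypt(bob, c2))
signature_ok = verify(bob_thinks_alice_is, modified, s2)
```

In this run Bob reads the modified text and sees a valid signature, while Alice's genuine signature `s1` would fail the very same check: the mathematics works perfectly, it is simply applied to the wrong keys.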

Public key cryptography has displaced the problem of eavesdropping on a communication channel, into a problem of the authenticity of public keys.

If all the public keys you possess of other entities, come from untrustworthy sources, then you are entirely vulnerable to your attacker - worse, in those circumstances where you are not a victim of a MITMA, you will think you are !  However, this time you don't care at all about any form of eavesdropping.

In a certain way, this is unavoidable.  Given that public keys can be communicated over an internet connection, and given that we know that a fully massive MITMA cannot be mitigated, it is normal that public key cryptography cannot mitigate an attack that cannot be mitigated, as this would be a logical contradiction. 

Partial mitigations: web of trust

As we've seen, there's no full cryptographic solution to the problem of a massive full MITMA.  However, as pointed out earlier, one can do something about "small-scale" MITMA.  In all these cases, the idea is that some communication escapes the MITMA, and that one can use this in order to detect inconsistencies that reveal a partial MITMA.

We saw how Alice and Bob could solve their problem in a satisfactory way: meet in person, and exchange their public keys.  This meeting was an anchor of trust.  Bob knows now for sure that Alice's key is really Alice's.  Alice knows now for sure that Bob's key is really Bob's.  Of course, if they don't know one another physically, Bob may have met Mallory at the railway station, thinking it was Alice, and Alice may have met Joe, Mallory's boyfriend, thinking it was Bob, but this is logistically harder to put in place.  This simply illustrates the general thesis that if the MITMA includes the key exchange, nothing can be done against it.  We need a certain "anchor of trust" where we assume that we were actually in contact with the right person, and got the right public key.

Suppose now that Alice also knows Christine and David, and that Bob knows Elisabeth and Fred.  Suppose that Alice has personally exchanged keys with Christine and David, and that Bob has exchanged personally keys with Elisabeth and Fred.  Is there a way in which Alice could, say, obtain in a secure way, Fred's public key ?  One way would be that she asks Bob to send her Fred's key in an encrypted and signed way.  That would be a way.  If Christine would now need Fred's key, she would need to obtain that key from Alice.  This is possible, but very cumbersome.  Would there be another way ?

One way would be that Bob signs Fred's key, and makes this signature public.  If Alice signs Bob's key, and makes this signature public too, then Christine can verify the authenticity of Fred's key in the following way:

  1. Christine trusts the public key she has from Alice, because she got it directly from her: it is her anchor of trust
  2. Christine can now also trust that Bob's key is really Bob's, because Christine can verify that Alice genuinely signed it.  She can verify this signature with Alice's public key. 
  3. Christine can now also trust that Fred's key is really Fred's, because Bob signed it, and she can now also trust Bob's public key.

Christine doesn't have to bother Alice or Bob once they have published their signatures of the keys.  But can this go wrong ?  Yes.  Suppose that Bob, who knows Fred very well, has signed a public key he thinks belongs to Fred, but that actually belongs to an imposter.  Christine will now think that she has Fred's correct public key, and the imposter can pretend to be Fred ; the real Fred would even be unable to inform Christine, because she wouldn't accept his true signature.  However, if several people that Christine can trust did sign Fred's right key, it would become difficult for the imposter to get as many signatures on his fake public key for Fred.  If Christine can "reach" a given public key through different ways, then this gives her more trust that this key truly belongs to this person.

This approach is called a web of trust.  PGP keys for e-mail are used this way.  It is important to remember to only sign a key if you've verified directly that it does belong to the person it claims to belong to.

There can be public repositories of public keys.  The fact that a key is on such a repository is, in itself, not a guarantee that the key is belonging to whom it claims to belong.  However, if these keys are also signed by others, they start to form a web of trust.  If your anchor of trust belongs to the web of trust, you can trust all those entries that can be reached from that point, and the more diverse the pathways are that can reach the same point, the higher the trust is (the less you can accept a fraudulent key signed by inadvertence or mischief).
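The idea of "reaching" a key from an anchor of trust through several pathways can be sketched as a walk over a graph of signatures.  The model below is deliberately simplified: it records only who signed whose key and does not verify any actual signature.  The extra signatures by Elisabeth, and the entry `Fred-imposter`, are hypothetical additions made for the example.

```python
from typing import Dict, List, Set

# signer -> set of key owners whose keys this signer has signed
Signatures = Dict[str, Set[str]]


def trust_paths(signed_by: Signatures, anchor: str, target: str) -> List[List[str]]:
    """List every loop-free chain of signatures from anchor to target."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == target:
            paths.append(path)
            return
        for signed in sorted(signed_by.get(node, ())):
            if signed not in path:  # never revisit a key on the same chain
                walk(signed, path + [signed])

    walk(anchor, [anchor])
    return paths


web: Signatures = {
    "Christine": {"Alice"},              # her anchor: key obtained in person
    "Alice": {"Bob", "Elisabeth"},       # Elisabeth's entry is hypothetical
    "Bob": {"Fred", "Fred-imposter"},    # Bob was fooled once
    "Elisabeth": {"Fred"},
}

real_fred = trust_paths(web, "Christine", "Fred")
fake_fred = trust_paths(web, "Christine", "Fred-imposter")
```

Here the real key is reached through two chains and the fake one through a single chain.  Both chains to Fred still pass through Alice, so counting paths is only a rough indicator; a more careful measure would count chains that share no intermediate signer.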

Partial mitigation: certificates

Another approach to the mitigation of MITMA is the use of certificates, and certificate authorities.  This is the approach taken by TLS (Transport Layer Security), which underlies most of the "secure" protocols on the internet, such as HTTPS.

The idea is to have a few (not too many) central figures that take on the burden of verifying the correctness of each of the public keys they sign.  In a certain way, they are the Alices and the Bobs that "go out and meet the people at the railway station in person" to get their public keys.  Suppose Alice goes out and meets everyone in Australia who wants to have a secure public key, and Bob goes out and meets everyone in Sweden who wants to have a secure public key.  After verifying their identity, they sign off these people's public keys.  The combination of such a public key, a claim of identity, and Alice's or Bob's signature is called a certificate.  Now all the people in the world only need to be sure to have Alice's correct public key and Bob's correct public key, and they can verify everybody's certificate in Sweden, and everybody's certificate in Australia.  In fact, the authors of important internet communication software can even include Alice's and Bob's keys in the software itself.  If ever someone presents a certificate with the claim that it was issued by Bob, the software will verify whether the signature agrees with Bob's key (included in the software).  Commercial entities that offer the service of verifying identities and signing off certificates are called Certificate Authorities, and they of course want their keys included in the important internet communication software (read: browsers).  They usually charge a fee for signing off a key.

The question is of course: how much can we trust these entities ?  If they sign off certificates for imposters, then everybody accepting their authority will have to accept the "validity" of these false certificates. 
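The certificate check, and the way it fails when one authority is careless, can be sketched with the same kind of toy textbook RSA (small known primes, no padding, insecure by design).  Everything here is an assumption for illustration: the `Certificate` layout is not the X.509 format that TLS uses, and the scenario in which Bob signs an imposter's key is invented to mirror the web-of-trust example.

```python
import hashlib
from typing import Dict, NamedTuple, Tuple

PublicKey = Tuple[int, int]  # (modulus n, exponent e)


class KeyPair(NamedTuple):
    n: int
    e: int
    d: int  # private exponent


class Certificate(NamedTuple):
    identity: str           # claim of identity
    subject_key: PublicKey  # the public key being vouched for
    issuer: str             # name of the signing authority
    signature: int


def toy_keypair(p: int, q: int, e: int = 65537) -> KeyPair:
    """Textbook RSA from two known primes. Insecure: illustration only."""
    return KeyPair(p * q, e, pow(e, -1, (p - 1) * (q - 1)))  # Python 3.8+


def pub(k: KeyPair) -> PublicKey:
    return (k.n, k.e)


def _digest(identity: str, key: PublicKey, n: int) -> int:
    data = f"{identity}|{key[0]}|{key[1]}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def issue(ca_name: str, ca: KeyPair, identity: str, key: PublicKey) -> Certificate:
    """The authority signs the pair (identity, public key)."""
    return Certificate(identity, key, ca_name, pow(_digest(identity, key, ca.n), ca.d, ca.n))


def is_valid(cert: Certificate, trusted: Dict[str, PublicKey]) -> bool:
    """Accept only certificates signed by an authority built into the software."""
    root = trusted.get(cert.issuer)
    if root is None:
        return False
    n, e = root
    return pow(cert.signature, e, n) == _digest(cert.identity, cert.subject_key, n)


M61, M89, M107, M127 = (2**k - 1 for k in (61, 89, 107, 127))
alice_ca, bob_ca = toy_keypair(M61, M89), toy_keypair(M107, M127)
fred, imposter = toy_keypair(M61, M107), toy_keypair(M89, M127)

# Keys shipped inside the (hypothetical) browser.
trusted = {"Alice": pub(alice_ca), "Bob": pub(bob_ca)}

genuine = issue("Alice", alice_ca, "Fred", pub(fred))
careless = issue("Bob", bob_ca, "Fred", pub(imposter))      # Bob was fooled
self_made = issue("Imposter", imposter, "Fred", pub(imposter))
tampered = genuine._replace(subject_key=pub(imposter))
```

The self-made and the tampered certificates are rejected, but the one Bob signed carelessly validates exactly like the genuine one: the software cannot tell a diligent authority from a sloppy one.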

2-factor and MITMA

As we saw, the only hope to defend against a MITMA is to have a channel at one's disposal that the attacker doesn't control.  With the advent of smartphones, this second channel obviously became the phone.  Some people have objected that the wireless phone also has its weaknesses, but nevertheless, needing a MITMA on the internet link and on the smartphone at once seriously enlarges the scope of the attack needed to be successful, which is exactly what we aimed for in a defence against a MITMA.

However, the usual way 2-factor authentication is implemented, nullifies the potential defence against a MITMA.  Indeed, suppose that Mallory is happily installed on the internet link between Alice and Bob, like we described before. Suppose that Mallory doesn't intercept Bob or Alice's phone, she's an internet-only MITM attacker.  This is the kind of attack that 2-factor authentication should catch. Now, the two classical ways in which 2-factor authentication is implemented, are as follows:

  1. Bob sends a code to Alice's phone over SMS
  2. Bob asks Alice to send him a one-time password (for instance, with Google Authenticator)

In both cases, however, Alice will send the "proof of her challenge" over her compromised internet connection !  As Mallory can read Alice's answer to Bob's challenge, she can transmit it to Bob.  Mallory's attack goes undetected !
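The relay problem can be made concrete with a small simulation of the SMS variant.  It is a sketch under assumptions: the `Bob` and `Mallory` classes, the 6-digit code and the single-use rule are invented for the example and do not describe any particular site's implementation.

```python
import hmac
import secrets
from typing import List, Optional


class Bob:
    """Hypothetical site that sends a one-time code by SMS."""

    def __init__(self) -> None:
        self._pending: Optional[str] = None

    def send_sms_code(self) -> str:
        # The return value stands for the SMS, i.e. the phone channel.
        self._pending = f"{secrets.randbelow(10**6):06d}"
        return self._pending

    def check_login(self, code: str) -> bool:
        ok = self._pending is not None and hmac.compare_digest(code, self._pending)
        self._pending = None  # each code can be used only once
        return ok


class Mallory:
    """Sits on the internet link only; she never touches the phone."""

    def __init__(self) -> None:
        self.seen: List[str] = []

    def relay(self, data: str) -> str:
        self.seen.append(data)  # she reads everything that passes
        return data             # and forwards it in her own session


def login_through(mallory: Mallory, bob: Bob) -> bool:
    sms = bob.send_sms_code()          # phone channel: Mallory is absent
    typed = sms                        # Alice reads her phone, types the code
    forwarded = mallory.relay(typed)   # internet channel: Mallory is present
    return bob.check_login(forwarded)


mallory, bob = Mallory(), Bob()
accepted = login_through(mallory, bob)
```

Bob accepts the login although Mallory sits in the middle: the second channel was used to deliver a secret, which then travelled straight back over the compromised one.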

The only way to use the second channel usefully is to use it to check the validity of the public keys.  Ideally, Bob would encrypt a random number with what he thinks is Alice's public key, send it over the phone to Alice, and ask Alice to decrypt it, add another random number, and encrypt the result with what she thinks is Bob's public key to send it back to him over the phone.  However, this is impractical because these messages are too long to go over SMS.  Apart from such a scheme, the difficulty resides in the multiple demands that 2-factor authentication has to satisfy simultaneously:

  1. Verifying identity. 
  2. Verifying the validity of public keys.
  3. Short message (SMS)

In its simplest format, Bob could send a digest of what he thinks is Alice's public key.  In the case of a MITMA, Bob will send the digest of Mallory_1, and Alice will notice that this is NOT the digest of her public key: the MITMA is exposed !  However, how to tell it to Bob ?  Bob could send the following message to Alice, by SMS:

"Verify if xxx.xxx is the digest of your public key.  If it is, send back yyy.yyy over the internet, if it isn't, send back zzz.zzz over the internet.",

where yyy.yyy is a random number, and zzz.zzz is another random number.  In the case of a MITMA, Alice will now type in: zzz.zzz.  Mallory, being in the middle, can see this of course, but what can she do ?  In order to convince Bob, she has to send yyy.yyy, but she doesn't know yyy.yyy, because it only travelled over SMS.  So Mallory can only send zzz.zzz, send something else, or send nothing.  In any case, Bob will never receive yyy.yyy and hence knows, like Alice, that there's a MITMA going on, or that Alice is a joker, or that Alice's secret key has been compromised.  In all these cases, he can ditch the public key (Mallory_1) he thought belonged to Alice.
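Bob's SMS challenge can be sketched as follows.  The sketch assumes that the SMS itself is not intercepted and that Bob has Alice's correct phone number; the key bytes are placeholders, and the truncated SHA-256 digest and 8-character tokens are arbitrary choices made to keep the message short.

```python
import hashlib
import hmac
import secrets
from typing import NamedTuple


class Challenge(NamedTuple):
    digest: str       # xxx.xxx: digest of the key Bob believes is Alice's
    if_match: str     # yyy.yyy: reply meaning "yes, that is my key"
    if_mismatch: str  # zzz.zzz: reply meaning "no, that is not my key"


def key_digest(public_key: bytes) -> str:
    return hashlib.sha256(public_key).hexdigest()[:16]


def bob_make_challenge(believed_alice_key: bytes) -> Challenge:
    """Build the SMS text; it travels over the phone channel only."""
    yes = secrets.token_hex(4)
    no = secrets.token_hex(4)
    while no == yes:  # the two tokens must differ
        no = secrets.token_hex(4)
    return Challenge(key_digest(believed_alice_key), yes, no)


def alice_answer(challenge: Challenge, my_real_key: bytes) -> str:
    """Alice compares the digest with her own key; the reply goes over the internet."""
    if challenge.digest == key_digest(my_real_key):
        return challenge.if_match
    return challenge.if_mismatch


def bob_accepts(challenge: Challenge, reply: str) -> bool:
    return hmac.compare_digest(reply, challenge.if_match)


ALICE_KEY = b"placeholder bytes for Alice's real public key"
MALLORY_1_KEY = b"placeholder bytes for the key Mallory_1"

# No attack: Bob holds Alice's real key.
honest = bob_make_challenge(ALICE_KEY)
honest_ok = bob_accepts(honest, alice_answer(honest, ALICE_KEY))

# MITMA: Bob holds Mallory_1.  Mallory sees Alice's reply on the internet
# link, but forwarding it does not help her, and she never saw the SMS.
attacked = bob_make_challenge(MALLORY_1_KEY)
reply_seen_by_mallory = alice_answer(attacked, ALICE_KEY)
attacked_ok = bob_accepts(attacked, reply_seen_by_mallory)
```

The difference with the relayed login code is that the secret Bob waits for is only released by Alice after she has checked the digest herself, so relaying her reply gains Mallory nothing.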

One could make that symmetrical: Alice could now send a similar message to Bob:

"Verify if XXX.XXX is the digest of your public key.  If it is, send back YYY.YYY over the internet, if it isn't, send back ZZZ.ZZZ over the internet".

Note that in this case, there's no need for any certification authority or signatures of the public key.  But now, the question is: how to know the right phone number... It is obvious that if the phone number was obtained by the same way as the public key, all this 2-factor authentication is moot, because the phone number will be Mallory's.  In fact, 2-factor authentication has displaced the problem from "knowing the owner of the public key" into "knowing the owner of the public key and the phone number".  After all, how does Alice even know that she activated 2-factor authentication for real with Bob's site ?  If she enabled 2-factor authentication while being a victim of a MITMA, she sent her phone number and her request to Mallory.  Mallory may just as well not have activated this feature on Bob's web site, and it might very well be Mallory who has sent this message to Alice, with this time, of course, xxx.xxx being the true digest of Alice's public key.  Alice has sent the answer to Mallory, who didn't do anything with it, but it gave a false sense of security to Alice.  Even in the case where Bob's site requires 2-factor authentication, Mallory can send her phone number to Bob's place, so she, and not Alice, will receive the SMS, and she will be able to reply yyy.yyy to Bob, giving Bob also a false sense of security.  We see again that if the MITMA covers the setting up of the "secure identity verification", it goes undetected, even if Mallory doesn't have to hack Bob or Alice's phone: they simply have the wrong phone numbers, like they have the wrong public keys.

Even if Alice looks for Bob's phone numbers on the internet (for instance, on Bob's web site), Mallory can modify them and display her own phone numbers on "Bob's web site" as she reconstructs them, for Alice. 

Conclusion

There can only be partial mitigation against a limited MITMA.  The essence of these mitigations is to obtain an "anchor of trust", that is to say, to have relative certainty about the correct ownership of certain public keys, and to build upon these anchors, trusting that their owners only signed other public keys after verifying the ownership of those keys themselves.  If this is done in a decentralized way (like with PGP), then one builds a "web of trust".  The more entangled and diverse this web is, the more one can trust it (or the bigger the MITMA/conspiracy has to be to fake it).  If this is done in a more centralized way, one agrees to trust a set of "Certificate Authorities" which issue certificates (signed public keys with their ownership claim).  A set of Certificate Authorities can only be trusted as much as the least trustworthy of them.

With 2-factor authentication, there is a way to detect a MITMA if the second channel is not compromised by the same attacker, but most implementations of 2-factor authentication do it wrong.  The difficulty still resides in knowing that one has the right phone numbers.