Arithmetic revisited

Introduction

Positive integer numbers can represent all data, all information a human can possess, and we've seen that positive integers can be manipulated, which will be very important for applications like cryptography. We need finite sets of integer numbers if we want to be able to deal with finite amounts of information and we've seen that it is possible to construct rings and even fields with a finite number of integer numbers. At the core lie operations we're familiar with: addition, subtraction, multiplication, long division. Most of us have learned to do these manipulations by hand when in primary school, but 'professionally' we use of course a computer or a calculator. Addition, multiplication, .... we don't think about that, it is 'available' by the machine. It is often even available on hardware level: even in assembler, we don't have to spell out to the processor how to do a multiplication of two numbers, it is just one single instruction.

And now comes the difficulty and maybe the surprise: the hardware has been made for a finite set of integers which is way too small for our applications. On all machines, 32-bit integers are present in hardware, and in many cases, there is hardware, or at least system software support, for 64-bit integers. Now, a 64 bit integer (usually a signed integer) has a positive range of 9 223 372 036 854 775 807. For most applications where integers are used to count things, that's large enough, and people won't notice the fact that the computer actually implements a finite ring instead of the infinite ring of integers Z. 32-bit integers have a smaller range, namely about up to two billion, and this poses a problem for the bank account of very rich people, but 64-bit integers are more than OK, even for the likes of Bill Gates. This is why the implemented finite ring of integers with 64 bits is more than good enough for most standard applications where people need to do arithmetic. No school teacher ever got it in his mind to torture school kids with manual operations where there were more digits than in the above mentioned number (I hope).

However, for the finite fields and rings needed when integers represent data, this scope is much, much too small. Several cryptographic systems need numbers with of the order of 3000 bits. In fact, there are no generally used cryptographic systems which can deal with numbers of only 64 bits, apart from one, which is simple DES, and which is, for exactly that reason, not secure any more. When we are going to use numbers to represent data, instead of counting stuff, in fact we go to very, very big numbers. Most finite rings we will be using will have more elements in them, than that there are elementary particles estimated in the visible universe. This is why these rings are usually not needed when simply counting stuff: we rarely need to count all the elementary particles in the whole universe.

So we will have to implement big numbers "by hand" in software. Of course there are very good libraries out there, but in order to master a subject, it is always a good idea to try an implementation one-self. This is the aim of this contribution: to implement arbitrary-length arithmetic in finite fields and rings on a computer.

The code written is functional, but has only as an aim to illustrate the algorithms, so we avoided as much as we could, all computer science matters that make the code more useful but the algorithms less visible. This is why we limited ourselves to simple C code. Of course, there are some quirks due to C (such as the fact that we have to pass parameters by reference as pointers), but we tried to limit that to the strict necessity to make the code run.

Number representation and numeral systems.

Numbers are abstract mathematical concepts, and if we want to write them down, use them in a computer, or manipulate them in any way, we need to represent them. As we have limited ourselves to finite rings and fields, at first sight, it is simple: we can represent each element of the finite field with a different symbol. However, that becomes very impractical if we have sets with several thousand or more entities, as the convention of these symbols would start to overwhelm us. As the rings we're interested in can easily contain more elements than there are elementary particles in the universe, we wouldn't be able to write down a dictionary of our symbols, even if every single entry in the dictionary only needed one single elementary particle and we were able to put all the matter of all galaxies into the dictionary. So individual symbols are out of the question. So we need a smarter system.

Our day-to-day decimal system is based upon the Hindu-Arabic numeral system which can relatively easily represent such big numbers. Its principle is the repeated use of a base set of numbers which is relatively small, and make n-tuples of those symbols. Our decimal system uses a base set with 10 base numbers: { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. They have individual symbols and we call them digits. A number is then represented by an n-tuple of those elements: [ 7, 3, 0, 5], where each position has a weight which is equal to successive powers of the number of elements in the base set, in our decimal system, 10. So 5 has weight 1, 0 has weight 10, 3 has weight 100 and 7 has weight 1000. We represent the number 7305 with 4 digits. The power of the Hindu-Arabic system is given by the very fact that the scope of represented numbers rises exponentially with the number of places in the n-tuple, or to say it differently, that the complexity of the representation only rises logarithmically with the size of the numbers to be represented. The tradition wants us to write the least significant digit (lsd), that is, the one with the smallest weight - usually 1 - to the right and the most significant digit (msd), that is, the one with the largest weight, to the left.

When "counting upwards" in such a numeral system, we change the lsd into the next one in the base set (so if it is 6, we change it into 7). When we reach the largest one (9) we change it into the smallest one, and we carry over this event to the next higher digit (if that one was a 4, we change it into a 5) ; if that one in its turn is the highest one, we have to carry that event over again to the next higher one, until this process stops. It has to stop, because at a certain point, we will introduce a new element in the tuple: the n-tuple becomes an n+1 tuple.

If we have [9,8,8], then the next one is [9,8,9]. And the next one, because the lsd is 9 and has to change into 0, causes an increase in the next digit, which goes from 8 to 9: [9,9,0]. If we count further, we will arrive at [9,9,9], and the next one will change the lsd from 9 into 0, which carries over: the next 9 becomes 0 too, which carries over, and the third 9 becomes 0 too, and we need a new digit in the tuple: our 3-tuple has become a 4-tuple: [1,0,0,0], because the 4-th element was tacitly assumed to be 0 before "existing", and the carry-over changed it into 1. This, you knew since first grade of course. There are in fact many things from first grade which we will have to recall. The point is that if we continue counting in a Hindu-Arabic numeral system, the number of digits in the tuple, the size of the tuple, will increase indefinitely. Logarithmically, true, but still without bound. That's normal, because the numeral system is supposed to be able to represent potentially all natural numbers in N, with infinite scope.

However, our point is that we will work in (very large) cyclic rings or fields. As such, there is a specific number which is "set equivalent" to 0. We will call this number the modulus. As the modulus is a number like any other, we can represent it in the Hindu-Arabic system with a given n-tuple, say, [8,5,7], we have to change our rule somewhat when counting up: if we have [8,5,6] and we count up, we would reach [8,5,7], but we now replace it with [0,0,0]. This is how the modular ring or field is then represented. The advantage is that in a modular ring, we only need n-tuples of a given size (the size of the representation of the modulus).

We have used the classical decimal representation here, but the Hindu-Arabic system can be used with different base sets, as is well known. A base set with 10 elements gives rise to the standard decimal system. A base set with two elements {0,1} is the binary number system. A base set with 16 elements {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the hexadecimal system. Both are used a lot in computer science. We can use any base set we like.

We want to remind: Code examples

#include <stdio.h>

#define basis 10
#define nofdigits 4
#define printlevel 0

/* "number" is a structure with nofdigits digits, the normal number in the
   ring structure we consider ; number1 has one more digit to allow for the 
   overflow when multiplying a number with a single digit and other intermediate
   results ; bignum has twice as much digits, to allow for the product of two numbers
   before the modulo operation ; we also need bignum1 which has one single extra digit
   more than bignum for overflow results when working with the product */
typedef struct digits {unsigned int digit[nofdigits];} number;
typedef struct digitus {unsigned int digit[nofdigits + 1];} number1;
typedef struct digitis {unsigned int digit[nofdigits * 2];} bignum;
typedef struct digitas {unsigned int digit[nofdigits*2+1];} bignum1;

In this piece of C-code, we set up the n-tuples which will represent numbers in the Hindu-Arabic system, and the number of elements in the tuple is fixed to nofdigits which is 4 in this case. We also fix the basis. We put it to 10, but we can increase it to the machine limit, that is, we can use a base set with up to about 2 billion symbols (the symbols are themselves numbers in the machine number representation here, namely C-type integers).

A technicality is that we resorted to a structure which contains an array, in order for the array to be "inside" the number.

So if we will declare a number:

number mynumber;

we will access its digits as follows:

mynumber.digit[0]

which will be the lsd, or

mynumber.digit[3]

which will be the 4-th digit and in our example, the msd.

The "printlevel" is nothing else but a flag that tells the program how verbose it should be in doing its operations.

In the code comments, it is explained why we need larger tuples for intermediate results ; these give rise to the auxiliary number types with n+1, 2n, and 2n+1 elements in the tuple, respectively. We will see why we need those.

number modulo = {{7,5,8}};  // warning: lsd first !

This line defines the modulus which will be used for the cyclic ring. Note that the C language doesn't allow for the Arabic convention to write from right to left, which is why {7,5,8} represents the number 857 in standard notation. (in VHDL, this would have been possible).

We have two auxiliary functions which can read a number from standard input, and which can write a number to standard output:

/* ----------------------------------------------- */

int readnum(number* x)
  /* This function reads in the digits of a number from stdin */
  /* The reading is from lsb to msb                           */
  {
  printf("Read digits : \n");
  int i;
  int mydigit;
  
  for (i = 0; i < nofdigits ; i++)
    {
    scanf("%d",&mydigit);
    if (mydigit < 0)
      {
      printf("negative digit ! \n");
      return 1;
      }
    if (mydigit >= basis)
      {
      printf("digit out of range ! \n");
      return 1;
      }
    x->digit[i] = mydigit;
    }
  return 0;
  }  // end readnum

/* ----------------------------------------------- */

void printnum(number* x)
  /* this function prints out a number on stdout */
  {
  int i;
  for (i = nofdigits - 1 ; i >= 0 ; i--)
    {
    printf("%d ",x->digit[i]);
    }
  printf("\n");
  } // end printnum

/* ----------------------------------------------- */

After all these technicalities, we can attack the main course: the actual arithmetic algorithms.

Note that we will cheat somewhat: we will use the computer arithmetic to implement the single-digit arithmetic. As our individual digits are represented by computer integers (in C, we can suppose that these are 32 bit integers, although the standard guarantees only 16 bits, which is still large enough for most practical bases), we will use computer-implemented operations to find, say, the sum of digit a and digit b, by the operation a + b. We will then need to do extra operations to apply carry over operation to the next digit, but the "a+b" will be done by the hardware of the computer. In as much as this is considered cheating, one should in fact use lookup tables of size n x n which give, for each couple of digits, the digit that results in the sum, and the digit that has to be carried over.

One should then provide similar tables for subtraction (with "borrow") and multiplication (with carry). With every Hindu-Arabic system come these "lookup tables" which can sometimes be deduced from more elementary considerations (like counting), which are not practical to repeat each time we need them, which is why we learn the lookup tables.

In fact, learning your "tables of multiplication" in primary school is nothing else but getting those tables into our childs heads... So what we actually do when using the computer build-in operations for addition, subtraction and multiplication (and in one instance, division as we will see), is using them as a short-hand to the basis lookup tables which we could have constructed too.

Subtraction

It can seem strange, but subtraction is easier to implement as addition.

Here we go:

 void difnum(number* x, number* y, number* s)
  /* under the condition that x and y are numbers within
     the modulo ring, s is the difference of x - y in the
     modulo ring */
  {
  int i;
  int borrow;
  int dd;
  int ss;
  int carry;
  borrow = 0;
  // calculation of the difference using borrowing
  // but not taking into account the modular structure
  if (printlevel > 0)
    printf("Difference without modular structure: \n");
  for (i = 0 ; i < nofdigits ; i++)
    {
    dd = x->digit[i] - y->digit[i] - borrow;
    if (printlevel > 0)
      printf("i is %d and dd is %d ",i,dd);
    if (dd < 0)
      {
      dd += basis;
      borrow = 1;
      }
    else
      {
      borrow = 0;
      }
    s->digit[i] = dd;
    if (printlevel > 0)
      printf(" --> borrow is %d and finally digit is %d \n",borrow,dd);
    }
  // in the case of a "negative" result, 
  // we add one time the modulo size.
  if (borrow != 0)
    {
    if (printlevel > 0)
      printf("Negative ! \n");
    carry = 0;
    for (i = 0; i < nofdigits ; i++)
      {
      ss = s->digit[i] + modulo.digit[i] + carry;
      carry = ss / basis;
      s->digit[i] = ss % basis;
      if (printlevel > 0)
        printf("  -> ss is %d, carry is %d and digit is %d \n",ss,carry,s->digit[i]);
      }
    }
  }  // end difnum

There is a small technicality here: we have to pass parameters in C by reference, so we pass pointers to numbers. If we use pointers, we can dereference them to their content with an arrow instead of with a point. This is why we write

x->digit[i]

instead of

x.digit[i]

because x is here a pointer to a number structure and not a number itself (which would invite the notation with the dot).

The important mechanism is the subtraction, digit by digit. We use the fact that the computer representation allows us to represent negative numbers (extra symbols in fact) which are normally not present in the base table of digits. This signals that one has to "borrow" one unit from the next higher digit. One such borrowing is always sufficient to bring back the negative digit in the positive realm, so the borrowing is never more than one single unit.

We treat from the lsd to the msd. As such, when we treat a digit, the borrowing "from below" is already established and we have to take it into account. But even with subtraction digit by digit, and eventually one unit of borrowing, this can always be brought back to a normal digit by adding one basis size (which is what the borrowing does for the lower digit).

There is however one caveat: it might be that we have to borrow when we are already at the msd, and there are no digits left any more to borrow from. In the natural numbers, that would signify that we have a negative number. It happens when we subtract a bigger number from a smaller one. In our modular structure however, this simply means that we have to add one time the modulus value.

The addition will be the same as we will explain further on, except that, if we had a negative number to start with, the result of the addition will not overflow, which is the main problem with addition.

So, if we end up having a "borrow" left when arriving at the msd, we proceed now by holding this borrow, and add the modulus, using a carry this time (see next section on addition). At the end, there will be a carry at the msd that will compensate the borrow.

Addition

In order for addition to be implemented, we need to have a way to compare numbers. Indeed, addition can result in a number, larger than the modulus, but in order to find out, we have to be able to compare the result to the modulus.

/* ----------------------------------------------- */

int isbigger(number* x, number* y)
  /* indicates whether a number x is larger than the number 
     y in the modulo field (this is an Euclid function) */
  {
  int bibi;
  int i;
  bibi = 0;
  for (i = nofdigits - 1 ; i >= 0 ; i--)
    {
    if (x->digit[i] > y->digit[i])
      {
      bibi = 1;
      break;
      }
    else if (x->digit[i] < y->digit[i])
      {
      bibi = -1;
      break;
      }
    }
  return bibi;
  }  // end isbigger

/* ----------------------------------------------- */

int isbigger1(number1* x, number1* y)
  /* indicates whether a number x is larger than the number 
     y with extended length */
  {
  int bibi;
  int i;
  bibi = 0;
  for (i = nofdigits ; i >= 0 ; i--)
    {
    if (x->digit[i] > y->digit[i])
      {
      bibi = 1;
      break;
      }
    else if (x->digit[i] < y->digit[i])
      {
      bibi = -1;
      break;
      }
    }
  return bibi;
  }  // end isbigger

/* ----------------------------------------------- */

In the code we implemented in fact two versions of the comparison function, one for numbers with n digits, and one for numbers with n+1 digits which will be used in other functions. The function returns 1 if x is strictly larger than y, it returns 0 if x = y, and it returns -1 if x < y. With this comparison function, we can implement the addition:

/* ----------------------------------------------- */

void sumnum(number* x, number* y, number* s)
  /* under the assumption that x and y are within the modulo 
     ring, s is the sum x + y in this ring */
  {
  int i;
  int carry;
  int ss;
  int borrow;
  int dd;
  number sups;
  carry = 0;
  /* calculate sum without modulo structure */
  if (printlevel > 0)
    printf("Sum without modular structure \n");
  for (i = 0 ; i < nofdigits ; i++)
    {
    ss = x->digit[i] + y->digit[i] + carry;
    carry = ss / basis;
    sups.digit[i] = ss % basis;
    if (printlevel > 0)
      printf("i is %d, ss is %d, carry is %d and digit is %d \n",i,ss,carry,sups.digit[i]);
    }
  /* if "overflow" we subtract one time the modulo */
  if (carry != 0)
    {
    borrow = 0;
    if (printlevel > 0)
      printf("Overflow ! \n");
    for (i = 0 ; i < nofdigits ; i++)
      {
      dd = sups.digit[i] - modulo.digit[i] - borrow;
      if (printlevel > 0)
        printf("i is %d, dd is %d ",i,dd);
      if (dd < 0)
        {
        dd += basis;
        borrow = 1;
        }
      else
        {
        borrow = 0;
        }
      if (printlevel > 0)
        printf(" -> after correction: dd is %d and borrow is %d \n",dd,borrow);
      sups.digit[i] = dd;
      }
    if (borrow != carry)
      {
      printf("Internal inconsistency: Dikken ambras !! \n");
      }
    }
  /* if the number is bigger than the modulo we subtract modulo.
     note that the subtraction fills in the output ;
     if the subtraction is not needed, we need to copy 
     the result into the output */
  int big;
  big = isbigger(&sups,&modulo);
  if (big >= 0)
    {
    difnum(&sups,&modulo,s);
    }
  else
    {
    for (i = 0 ; i < nofdigits ; i++)
      {
      s->digit[i] = sups.digit[i];
      }
    }
  }  // end sumnum

/* ----------------------------------------------- */

The addition code does the following. In the first part, digit by digit addition is applied, and we take advantage of the fact that the computer representation of digits can handle "digits" outside of the basis range. The remainder by division by the basis is the digit that remains in the result, and the quotient of the long division is the carry that has to be added to the next higher digit. When we do this from the lowest significant digit to the highest digit, we have implemented the normal addition in the Hindu-Arabic representation.

There are two problems with this. The first one is that we can still have a carry when we have added the highest digits. This means that we should use a number with one more digit, and we don't want to. If that is the case, it means that we are in any case beyond the modulus, and that in the modular ring, we have to subtract once this modulus. This is implemented with an algorithm which is almost a literal copy of the subtraction algorithm discussed earlier. In the end, the "borrow" of this algorithm has to compensate the "carry" of the addition (in fact, this carry takes the function of the extra digit that we didn't want to implement explicitly).

However, it can be that the sum still fits within the number of digits, but that it is larger than the modulus. In that case, we have also to subtract the modulus, but this time, we can do this with the earlier defined function of subtraction, as there is no "hidden carry" that functions as an extra digit.

With a simple sum, at most one of both actions is possible, as the sum of two numbers in the ring can never be larger than twice the modulus.

Multiplication

In order to be able to apply multiplication in the cyclic ring, we have to have two extra tools, because we will have to reduce the normal multiplication to the ring by taking the remainder when dividing by the modulus.

The first tool we need is the multiplication of a number with a single digit without any modular structure.

/* ----------------------------------------------- */

void digittimesnumber(int dig,number* x,number1* s)
/* multiplication of a single digit with a number */
  {
  int i;
  int j;
  int carry;
  int ss;
  carry = 0;
  if (printlevel > 0)
    printf("Digit times number; digit is %d \n",dig);
  for (i = 0 ; i < nofdigits ; i++)
    {
    ss = dig * x->digit[i] + carry;
    s->digit[i] = ss % basis;
    carry = ss / basis;
    if (printlevel > 0)
      printf("i is %d, ss is %d, carry is %d and digit is %d \n",i,ss,carry,s->digit[i]);
    }
  s->digit[nofdigits] = carry;
  }

/* ----------------------------------------------- */

Note that the result is a number with one more digit than we are used to. This is the simple multiplication in the Hindu-Arabic system where we make use of the implementation of the individual digit multiplication and long division. The new digit will be the last carry.

Next we need the long division, or at least, the remainder, when dividing by the modulus. As we are going to apply this to a product, we take at the entry a number which has twice as much digits than original, as the normal product in the Hindu-Arabic system will occupy up to twice as much digits:

/* ----------------------------------------------- */

void bigmodulo(bignum* x,number* r)
  {
  int i;
  int k;
  int km;
  int ll;
  bignum1 rr;
  number1 high;
  number1 testhigh;
  /* We copy the dividend into a working number */
  for (i = 2*nofdigits - 1; i >= 0 ; i--)
    {
    rr.digit[i] = x->digit[i];
    }
  rr.digit[2*nofdigits] = 0;
  /* we will now apply long division, digit by digit */
  for (i = 2*nofdigits - 1 ; i >= nofdigits - 1 ; i--)
    {
    /* we estimate a lower boundary on the digit of the quotient */
    km = rr.digit[i+1]*basis + rr.digit[i];
    ll = km / (modulo.digit[nofdigits - 1] + 1);
    /* we copy the relevant part of the dividend */
    /* we have to copy nofdigits + 1 digits      */
    if (printlevel > 0)
      printf("i is %d . Estimated q: %d ; Relevant part of dividend : \n",i,ll);
    for (k = i + 1 ; k >= i - nofdigits + 1 ; k--)
      {
      high.digit[k - i + nofdigits - 1] = rr.digit[k];
      if (printlevel > 0)      
        printf("digit %d is %d \n",k,rr.digit[k]);
      }
    /* we have to find out how many times to subtract "modulo" 
       before this number is bigger than the relevant part of the 
       dividend */
    do
      {
      digittimesnumber(ll,&modulo,&testhigh);
      ll++;
      }
    while (isbigger1(&high,&testhigh) >= 0);
    ll--;
    ll--;
    if (printlevel > 0)
      printf("Quotient digit number %d is %d \n",i,ll);
    /* we will now do the subtraction */
    digittimesnumber(ll,&modulo,&testhigh);
    int borrow;
    int dd;
    borrow = 0;
    for (k = 0 ; k <= nofdigits ; k++)
      {
      dd = high.digit[k] - testhigh.digit[k] - borrow;
      if (dd < 0)
        {
        dd = dd + basis;
        borrow = 1;
        }
      else
        {
        borrow = 0;
        }
      high.digit[k] = dd;
      }
    if (high.digit[nofdigits] !=0 || borrow != 0)
      {
      printf("Internal inconsistency: Vried ambras ! \n");
      }
    /* we update the intermediate result */
    for (k = i + 1; k >= i - nofdigits + 1; k--)
      {
      rr.digit[k] = high.digit[k - i + nofdigits - 1];
      }
    }
  /* we copy the remainder into the result */
  for (i = nofdigits - 1 ; i >= 0 ; i--)
    {
    r->digit[i] = rr.digit[i];
    }
  }  // end bigmodulo

/* ----------------------------------------------- */

The general idea is the following. We copy the dividend into a working register which has 2n+1 digits, of which the highest digit is hence 0. This register will be reduced with multiples of the modulus, until only the remainder remains.

We apply the long division scheme from our school time: we look at the n+1 highest digits (the divider has n digits). It is because of this n+1 that we needed a dummy 0 extra digit. We take the highest digit of the divider, add 1, and see how many times it goes into the number made of the two highest digits. For instance, in the decimal system, if we have to divide something like 2398062 by 79, we are going to take the 3 highest digits: 239, and we are going to subtract a certain number of times 79 from it. We take the highest divider digit, here 7, add one, so 8, and we will see how many times 8 goes into 23, the number made of the two highest digits from 239. In this case it will be 2 times.

This number of times is a lower bound on the number of times we have to subtract the divider from this n+1 digit number. So we will count upward from this number and see how many times the divider is still smaller than the n+1 digit number.

In our example, we try 2*79 = 158. 158 is smaller than 239, so we try 3*79 = 237 is still smaller than 239, we try 4*79 = 316 is bigger. So we have to subtract 3 times 79.

We now subtract the highest possible number of times the modulus from our n+1 number and write the result back in our intermediate copy. Normally, the highest digit should become 0 now.

We shift everything one digit lower (as the highest digit is 0): we copy again n+1 digits (but shifted one downward), and repeat the procedure).

At each time, we have subtracted an integer times the modulus (times a power of the basis). When we arrive at the n+1 lowest digits of our working word, and we subtract the highest possible number of times the modulus, we will obtain the remainder we're after.

Finally, we can implement the modular multiplication:

/* ----------------------------------------------- */

void timesnum(number* x, number* y, number* p)
/* multiplication in the modular field */
  {
  int i;
  int j;
  bignum pp;
  int ss;
  for (i = 0 ; i < 2*nofdigits ; i++)
    {
    pp.digit[i] = 0;
    }
  /* rough multiplication, out of scope of basis per digit */
  for (i = 0 ; i < nofdigits ; i++)
    {
    for (j = 0 ; j < nofdigits ; j++)
      {
      pp.digit[i+j] += x->digit[i] * y->digit[j];
      }
    }
  /* limiting each digit to the basis, generalized carry */
  for (i = 0 ; i < 2 * nofdigits - 1 ; i++)
    {
    ss = pp.digit[i];
    pp.digit[i+1] += ss / basis;
    pp.digit[i] = ss % basis;
    }
  /* applying modulo */
  bigmodulo(&pp,p);
  } // end multiplynum

/* ----------------------------------------------- */

We use the fact that the computer implementation of the digits can go much further than the basis of the digits. As such, we first apply digit by digit multiplication where all possible combinations are run over, and the result is added to the right digit. Each digit will of course largely overflow. In order to correct this, a generalised carry is then applied at the end, where only the remainder by division by the basis remains in the digit itself, and the quotient is added to the higher-up digit. This is quite rough as a technique, but it has the advantage of having a simple implementation. Once the product is established on 2n digits, we apply modular long division to find the remainder, which is the result of the actual modular multiplication.

Long division

We implement normal long division. This is a somewhat strange result within a modular ring, as we explicitly don't use the modular structure of the ring. It is the standard implementation of long division, exactly as we did for the special case of division by the modulus ; only, now we limit ourselves to numbers of n digits.

/* ----------------------------------------------- */

void dividenum(number* x, number* y, number* q, number* r)
/* Euclidean division */
  {
  int i;
  int k;
  int km;
  int ll;
  int divlen;
  number zero;
  number1 rr;
  number1 high;
  number1 testhigh;
  /* We copy the dividend into an intermediate working number */
  for (i = nofdigits - 1; i >= 0 ; i--)
    {
    rr.digit[i] = x->digit[i];
    q->digit[i] = 0;  // we put to zero the quotient
    zero.digit[i] = 0;
    }
  rr.digit[nofdigits] = 0;
  if (isbigger(y,&zero) == 0)
    {
    printf("Division by zero doesn't work ! \n");
    return;
    }
  /* we determine the number of active digits in the divider */
  for (divlen = nofdigits - 1 ; divlen >= 0 ; divlen--)
    {
    if (y->digit[divlen] != 0)
      {
      break;
      }
    }
  if (printlevel > 0) 
    printf("Divlen is : %d \n",divlen);
  /* divlen will contain the index of the first 
     non-zero divider digit */
  /* we will now apply long division, digit by digit */
  for (i = nofdigits - 1 ; i >= divlen ; i--)
    {
    /* we estimate a lower boundary on the digit of the quotient */
    km = rr.digit[i+1]*basis + rr.digit[i];
    ll = km / (y->digit[divlen] + 1);
    if (printlevel > 0)
      {
      printf("Quotient estimate %d \n",ll);
      printf("Relevant part of dividend : \n");
      }
    /* we copy the relevant part of the dividend */
    /* we have to copy nofdigits + 1 digits      */
    for (k = i + 1 ; k >= i - divlen ; k--)
      {
      high.digit[k - i + divlen] = rr.digit[k];
      if (printlevel > 0) 
        printf("digit %d is %d \n",k,rr.digit[k]);
      }
    for (k = divlen + 2 ; k < nofdigits + 1 ; k++)
      {
      high.digit[k] = 0;
      }
    /* we have to find out how many times to subtract "modulo" 
       before this number is bigger than the relevant part of the 
       dividend */
    do
      {
      digittimesnumber(ll,y,&testhigh);
      ll++;
      }
    while (isbigger1(&high,&testhigh) >= 0);
    ll--;
    ll--;
    if (printlevel > 0)
      printf("Quotient digit number %d is %d \n",i,ll);
    q->digit[i - divlen] = ll;
    /* we will now do the subtraction */
    digittimesnumber(ll,y,&testhigh);
    int borrow;
    int dd;
    borrow = 0;
    for (k = 0 ; k <= divlen + 1 ; k++)
      {
      dd = high.digit[k] - testhigh.digit[k] - borrow;
      if (dd < 0)
        {
        dd = dd + basis;
        borrow = 1;
        }
      else
        {
        borrow = 0;
        }
      high.digit[k] = dd;
      }
    if (high.digit[divlen + 1] !=0 || borrow != 0)
      {
      printf("Internal inconsistency: Hiel vried ambras ! \n");
      }
    /* we update the intermediate result */
    for (k = i+1 ; k >= i - divlen ; k--)
      {
      rr.digit[k] = high.digit[k - i + divlen];
      }
    }
  /* we copy the remainder into the result */
  for (i = nofdigits - 1 ; i >= 0 ; i--)
    {
    r->digit[i] = rr.digit[i];
    }
  }  // end dividenum

/* ----------------------------------------------- */

The principles are the same as commented earlier. The principal difference is that we now keep the digits that were multiplied with the modulus to subtract it, as these digits are the digits of the quotient.

Extended Euclidean Algorithm

The the following function can be very strange within a modular ring at first sight, as normally it is only used in the ring of integers Z. But in fact, this algorithm works just as well in any Euclidean ring (and a modular ring is an Euclidean ring).

/* ----------------------------------------------- */

void exteuclid(number* a, number* b, number* gcd, number* s, number* t)
  {
  number rm1;  // (r-1)  ; gcd is (r-2)
  number s1;   // (s-1)  ; s is (s-2)
  number t1;   // (t-1)  ; t is (t-2)
  number q;    
  number r;
  number d;
  number dodo;
  number sn;
  number tn;
  number zero;
  int i;
  for (i = 0 ; i < nofdigits ; i++)
    {
    gcd->digit[i] = a->digit[i];
    rm1.digit[i] = b->digit[i];
    s1.digit[i] = 0;
    t1.digit[i] = 0;
    s->digit[i] = 0;
    t->digit[i] = 0;
    zero.digit[i] = 0;
    }
  if (isbigger(a,&zero) == 0 || isbigger(b,&zero) == 0)
    {
    printf("Algorithm doesn't work with 0 as input !\n");
    return;
    } 
  s->digit[0] = 1;
  t1.digit[0] = 1;
  do
    {
    /* Extended Euclidean algorithm iteration */
    dividenum(gcd,&rm1,&q,&r);         // r = (r-2) mod (r-1)
    difnum(gcd,&r,&d);                
    dividenum(&d,&rm1,&q,&dodo);       // q = { (r-2) - r } / (r-1)
    timesnum(&q,&s1,&dodo);
    difnum(s,&dodo,&sn);               // s = (s-2) - q * (s-1)
    timesnum(&q,&t1,&dodo);
    difnum(t,&dodo,&tn);               // t = (t-2) - q * (t-1)
    /* preparing next iteration i -> i+1 */
    for (i = 0 ; i < nofdigits ; i++)
      {
      gcd->digit[i] = rm1.digit[i];    // (r-1) -> (r-2)
      rm1.digit[i]  = r.digit[i];       // r     -> (r-1)
      s->digit[i]   = s1.digit[i];       // (s-1) -> (s-2)
      s1.digit[i]   = sn.digit[i];       // s     -> (s-1)
      t->digit[i]   = t1.digit[i];       // (t-1) -> (t-2)
      t1.digit[i]   = tn.digit[i];       // t     -> (t-1)
      }
    }
  while (isbigger(&r,&zero) != 0);
  }  // end exteuclid

/* ----------------------------------------------- */

The extended Euclidean algorithm calculates the GCD of two numbers, by writing it as a linear combination of said numbers. There's nothing explicitly modular in this algorithm, except that we use modular multiplication and modular subtraction. We use "normal" long division. The reason why it works will be explained in the "two's complement and one's complement" section.

Modular multiplicative inverse

Finally, we can compute the multiplicative inverse in the modular ring. The division we implemented was long division, and was not the modular division. The multiplicative inverse exists if the GCD with the modulus is 1, which is always the case if the modulus is a prime number. If the modulus is not a prime number, certain ring elements will not have a multiplicative inverse.

/* ----------------------------------------------- */

void inversenum(number* x,number* invx)
  {
  number oldmodulo;
  number maxmodulo;
  number gcd;
  number s;
  number een;
  number zero;
  number invx1;
  number q;
  int i;
  /* we put the modulo value in a safe place, and put the 
     current modulo parameter to its maximum */
  if (modulo.digit[nofdigits-1] >= basis/2)
    {
    printf("Error: number space not large enough to apply extended Euclid! \n");
    return;
    }
  for (i = 0 ; i < nofdigits ; i++)
    {
    maxmodulo.digit[i] = basis - 1;
    oldmodulo.digit[i] = modulo.digit[i];
    modulo.digit[i] = maxmodulo.digit[i];
    een.digit[i] = 0;
    zero.digit[i] = 0;
    invx->digit[i] = 0;
    }
  if (isbigger(x,&zero) == 0)
    {
    printf("Inverse of zero not possible !\n");
    for (i = 0 ; i < nofdigits ; i++)
      {
      modulo.digit[i] = oldmodulo.digit[i];
      }
    return;
    }
  /* we apply the extended Euclidean algorithm to find 1/x */
  een.digit[0] = 1;
  exteuclid(&oldmodulo,x,&gcd,&s,&invx1);
  if (isbigger(&gcd,&een) != 0)
    {
    printf("GCD is not one ! \n");
    }
  if (isbigger(&invx1,&oldmodulo) >= 0)
    {
    sumnum(&invx1,&oldmodulo,invx);
    }
  else
    {
    sumnum(&invx1,&zero,invx);
    }
  /* we put the original modulo parameter back */
  for (i = 0 ; i < nofdigits ; i++)
    {
    modulo.digit[i] = oldmodulo.digit[i];
    }
  }  // end inversenum

/* ----------------------------------------------- */

The trick is to use the extended Euclidean algorithm. However, we cannot use the extended Euclidean algorithm in the same modular ring as the one we're using to find the multiplicative inverse: the modulus has to be at least twice as large in order for this to work.

This is why we temporarily change the modulus to its maximum value given the number of digits when applying the Euclidean algorithm, to put it back afterwards.

The coefficient of the number to be inverted will be the multiplicative inverse, or the inverse minus the (original) modulus. We check for that, and add one modulus value if needed.

Two's complement and one's complement

We've implemented the arithmetic in a modular ring (that is, modular addition, subtraction and multiplication). We've also implemented modular multiplicative inverse, which only really makes sense in a modular field, that is, when the modulus is a prime number.

However, we've also implemented a few algorithms which normally have no place in a modular ring, but in the ring of integers: long division and the extended Euclidean algorithm. How come that they seem to work ?

In "normal" arithmetic on computers, we have to represent numbers in any case with a finite number of digits in an Hindu-Arabic system. Most computers use base 2 (that is, the binary system). Most implementations of integers use two's complement (or one's complement, although it is less often used) to implement negative numbers, instead of simply using a "sign" bit. Why is that ? The reason is that using these complement representations use the same algorithm for arithmetic operations, whether the operands are negative or positive, while a true sign bit would imply the use of modified algorithms for the 4 possibilities: ++, +-, -+ and -- for the operands.

But what is this "two's complement" and "one's complement" in fact ?

One's complement is: "a negative number has as a bit representation, the flipped bits of the positive number".

So if 0001011 is a positive number (here, representing 7), the one's complement representation of -7 is 1110100. This is in fact exactly the result of doing 0 - 7 in the modular ring with modulus 1111111 !

In other words, the one's complement representation is nothing else but working in the modular ring with modulus 1111111 !

The only thing that one needs, is to limit the "positive" and "negative" numbers so that their scopes don't overlap.

In other words, a modular ring can represent the normal ring of integers if one limits oneself to "positive" numbers less than half the modulus, and "negative" numbers which are in fact "larger" than half of the modulus, but are interpreted as 0 - X.

One's complement has a disadvantage: that is that "minus zero" is 111111 (which is normal: it is the modulus !).

This is why one prefers to use two's complement: for negative numbers, it is "one's complement plus one". As such, the negative of 0, is 0.

But what is that in fact ?

Two's complement is nothing else but working in the modular ring with modulus 10000000, where we have in fact an extra digit.

As such, we see that the usual representations of "integers" in computers is nothing else but using the Hindu-Arabic system in a modular ring, where we only use "part" of the elements as "positive" elements, and the other part as "negative" numbers.

This explains why the extended Euclidean algorithm works also in "large enough" modular rings, and why we can use it to calculate the multiplicative inverse of a "smaller" modular ring.