The Montgomery reduction algorithm finds the remainder of a division. Many cryptographic schemes work with numbers modulo a prime. When you have to multiply two numbers with e.g. 128 bits each, first you multiply them the usual way (there are many techniques for this) to obtain a 256-bit (“double precision”) number. Then you need to reduce this result modulo the prime you’re working with, that is, compute the remainder of the division of this number over the prime.
You can compute the remainder of a division with the “schoolbook” technique everyone learns in school, but that is expensive and requires divisions, with are costly in many platforms (some microcontrollers don’t even have a division instruction). Montgomery reduction only needs a division by a powers of the integer size, which are cheap for computers.
Here I’ll try to explain how it works, in an informal approach. For detailed proofs of its correctness, check e.g. the chapter 14 of the Handbook of Applied Cryptography or the original paper.
First approach
When working modulo a prime (or modulo anything), you can add or subtract the prime (call it p) from the number you’re working with (call it x) and you’ll always get the same number. For example, 3 = 3 + 7 = 10 = 3 (mod 7), or 3 = 3 – 7 = -4 = 3 (mod 7). This gives the basic approach for finding remainders: just subtract the p from x until you reach a number smaller than p.
For example, you want to compute 6 * 5 modulo 7. You have 6 * 5 = 30, now you just need the remainder of the division of 30 over 7, that is, 30 modulo 7. Now it’s just subtractions:
30 - 7 = 23 (mod 7) 21 - 7 = 16 (mod 7) 16 - 7 = 9 (mod 7) 9 - 7 = 2 (mod 7)
And you’re done: 6 * 5 = 2 (mod 7). But if the number you’re working with is big, the number of subtractions you’ll need will be too large. You can speed up this process by dividing the number by the modulus and computing r = x – (x/p * p). But as I said, division is expensive.
Let’s add, not subtract
The first insight of the Montgomery reduction is this: what if instead of subtracting, you add the prime modulus many times? And more, what if you add the modulus in a way that the number being reduced is filled with zeros to the right? For example, let’s work modulo 97. You want to compute 43 * 56 (mod 97). You multiply the operands obtaining 2408, and you’ll need to reduce it modulo 97. You can add the 97 to this number any times you want, that is, you can add any multiple of 97. Which multiple of 97 that, when added to 2408, leads to a number with last digit 0?
2408 + 6 * 97 = 2408 + 582 = 2990
It’s 6 times 97. It’s easy to find this “magic number”, but I won’t go into this right now. OK, you have a rightmost zero. Let’s try to add again in order to have two zeros:
2990 + 30 * 97 = 2408 + 2910 = 5900
Great! But… so what? Well, suppose these numbers live in a world where they must be divided by 100 after you multiply them and use the above procedure. Then you divide 5900 by 100, which is cheap, and you have 59.
Sadly, these numbers don’t live in such a world, and 59 is not the right answer. If the division is exact, you can divide by any multiple of 97, but you can’t simply divide by 100.
The Montgomery World
The second insight of the Montgomery reduction is: if you need such a world were you must divide by 100, then just create it!
The Montgomery reduction requires transforming the numbers into the “Montgomery domain”, which is exactly the world we need. First let’s look at our modulus: 97. It has two digits in base 10. Then let R be smaller power of the base that is greater than the modulus. For 97, we have R = 10^2 = 100.
To transform a number x into the Montgomery domain, compute x * R (mod p). In our example, suppose x = 35. Then 35 * 100 (mod 97) = 8 (mod 97).
(You may notice that this transformation simply begs the question – you need to reduce modulo p in order to convert the numbers to a form where it’s easier to reduce modulo p! I’ll talk about this later on.)
Now, the beauty of the Montgomery domain: suppose that you want to compute x * y (mod p). Converting the operands, you’ll actually compute (x * R) * (y * R) (mod p). But you want the result to still live in the Montgomery domain. That’s because you may need to do many other multiplications. Since converting the number back to the real world is somewhat expensive, it’s best to keep the computations in the Montgomery domain. Therefore, you want to compute x * y * R (mod p), i.e., you want x * y, but in the Montgomery domain.
Let’s see… if you just multiply the operands the usual way, you get (x * R) * (y * R) (mod p) = x * y * R * R (mod p). But you want x * y * R (mod p). Therefore, every time you multiply two numbers, you’ll need to reduce the result modulo p, but you’ll also need to divide by R. Recall our example, where R = 100, and we wanted to be required to divide by 100. That’s exactly what we got: the Montgomery domain requires the division by 100!
An example
Let x = 43, y = 56, p = 97, R = 100. You want to compute x * y (mod p). First you convert x and y to the Montgomery domain. For x, compute x’ = x * R (mod p) = 43 * 100 (mod 97) = 32, and for y, compute y’ = y * R (mod p) = 56 * 100 (mod 97) = 71.
Compute a := x’ * y’ = 32 * 71 = 2272.
In order to zero the first digit, compute a := a + (4p) = 2272 + 388 = 2660.
In order to zero the second digit, compute a := a + (20p) = 2660 + 194 = 4600.
Compute a := a / R = 4600 / 100 = 46.
We have that 46 is the Montgomery representation of x * y (mod p), that is, x * y * R (mod p). In order to convert it back, compute a * (1/R) (mod p) = 46 * 65 (mod 97) = 80. You can check that 43 * 56 (mod 97) is indeed 80.
Converting the numbers
Let Montgomery(x’,y’) = (x’y')/R (mod p) be the Montgomery reduction of x’ and y’.
Like I mentioned, in order to compute the Montgomery representation of a number or in order to convert it back, you need to multiply by R modulo p or divide by R modulo p. You can compute this remainders using a classic, expensive algorithm, since for practical applications you’ll rarely need to convert the numbers (for example, in a cryptographic application: since no one will need to actually see the real form of the numbers, there is no need for conversions and it’s possible to work entirely in the Montgomery domain). But there is also another approach. You can precompute R’ = R^2 (mod p) and then convert a number x to xR (mod p) by simple computing the Montgomery reduction of x and R’, since Montgomery(x, R’) = Montgomery(x, R^2) = (xRR)/R (mod p) = xR (mod p). To convert back a number x’ = xR (mod p), simply compute the Montgomery reduction of x’ and 1, since Montgomery(x’, 1) = Montgomery(xR, 1) = (xR)/R (mod p) = x (mod p).
The magic numbers
Let B be the base you’re working with. The main step of the Montgomery reduction is “add a multiple k of p that, when added to a, makes its i-th digit zero”. It’s not hard to find k. First, notice that the only relevant digits are the first digit of p (call it p[0]) and the i-th digit of a (call it a[i]). Then:
a[i] + B^i * k * p[0] = 0 (mod B) k = B^i * a[i] * -(1/p[0]) (mod B)
You can precompute -(1/p[0]) (mod B) and then multiply it by a[i] and B^i (which is just an offset) in order to find k. In our example, with p = 97 and base 10, this value is -(1/7) (mod 10) = -3 (mod 10) = 7.
In the real world
For example, in elliptic curve cryptography in the 128-bit level of security, the operands have 256 bits. Suppose the target platform is a desktop PC with 32 bits. Then the operands are represented by eight 32-bit digits (i.e. B=2^32 as opposed to B=10 in the examples); R is 2^256; and the Montgomery reduction requires eight steps (since the numbers have eight integers).
If you look carefully, you can notice that the Montgomery reduction is nothing more than a multiplication of p with an operand which is computed on the fly (the “magic numbers”). This allows the use of many optimizations from multiplication algorithms.

where p is a prime. You can represent this group with polynomials of degree 1 which can be added the usual way and multiplied taking the result modulo a irreducible polynomial of degree 2. To understand why this makes sense, I recommend
. Working modulo this polynomial is the same thing as working in a “world” where
. (You could of course work with a polynomial in the form
but that would complicate things.)
. What happens when you apply the Frobenius endomorphism? Let’s see:
has many terms, but only the first
and the last
survive because all others are multiples of p. Since we’re working with coefficients modulo p, they are all zero.
, then raising to the power of p won’t change them (yep, that’s Frobenius again). So we have:
left to bother us. If p is odd (not 2), then you can rearrange this as:

. But according to
is not a square (if it were,
. Then the formula gets much simpler:
.
, just to allow us to specify the modulus in the next step:
(so
). Let’s take an arbitrary element to the power of 7:
, you can represent it as a quadratic extension of
, which can be represented as a cubic extension of
.
built this way, with
, you just apply the same trick explained above. The only difference is that a and b to the power of p aren’t a and b themselves, but that’s not a problem, you just apply the same trick recursively.
with m > 1, you can represent the elements as polynomials with degree m-1 and do the field addition and multiplication modulo a irreducible polynomial with degree m.
, you have (2x + 5) * (3x + 4) = (4x + 3).
:
over GF(503):
elements) than with multiplicative groups of integers modulo n (no smaller than
elements). This is not set in stone though; maybe someday someone will come up with a better method to crack this seemingly random structure (for now the best method to solve the discrete log problem for elliptic groups is exponential, while the best method for integers modulo n is sub-exponential).
over GF(5) (left) which is isomorphic to the additive group modulo 5 (right):