Implementation of public key algorithms in CUDA

2010

In the field of cryptography, public key algorithms are widely known to be slower than

symmetric key alternatives for the reason of their basis in modular arithmetic. The modular

arithmetic in e.g. RSA and Diffie Hellman is computationally heavy when compared

to symmetric algorithms relying on simple operations like shifting of bits and XOR. Therefore,

how to make a more efficient and faster implementation of public key algorithms

is publicly concerned.

With the development of the GPGPU (General-purpose computing on graphics processing

units) field, more and more computing problems are solved by using the parallel

property of GPU (Graphics Processing Unit). CUDA (Compute Unified Device Architecture)

is a framework which makes the GPGPU more accessible and easier to learn for

the general population of programmers. This is because it builds on C and hides many of

the complicated details of how the GPU works from a CUDA developer. Using the unique

properties of the GPU through CUDA has greatly increased the efficiency of many computational

problems. Multiplication of big integers is one of the building blocks in doing

modular arithmetic. Running the public key algorithms by use of the parallel properties

of the GPU in modular multiplication and modular exponentiation may be a solution to

this problem.

The target in this research is to study and analyse the majority of algorithms related

to the modular multiplication and modular exponentiation, and then to design and make

an implementation of a public key algorithm in CUDA. Finally, this project will compare

the performance between the GPU implementation and the CPU implementation in order

to look into the possibility of improving the performance of public key algorithms. The

research questions are divided into four groups, the first one regarding modular multiplication

and modular exponentiation of big integers and their parallelism, the second

one about integrating parallel modular multiplication and modular exponentiation into

the public key algorithm, the third one concerning optimization of the algorithm, and

final one regarding performance comparison of public key algorithm between the GPU

implementation and the CPU implementation.