[Resource Topic] 2021/1389: DPCrypto: Acceleration of Post-quantum Cryptographic Algorithms using Dot-Product Instruction on GPUs

Welcome to the resource topic for 2021/1389

DPCrypto: Acceleration of Post-quantum Cryptographic Algorithms using Dot-Product Instruction on GPUs

Authors: Wai-Kong Lee, Hwajeong Seo, Seong Oun Hwang, Angshuman Karmakar, Jose Maria Bermudo Mera, and Ramachandra Achar


Dot-product is a widely used operation in many machine learning and scientific computing algorithms. Recently, NVIDIA has introduced dot-product instructions (DP2A and DP4A) in modern GPU architectures, with the aim of accelerating machine learning and scientific computing applications. These dot-product instructions allow the computation of multiply-and-add instructions in a clock cycle, effectively achieving higher throughput compared to conventional 32-bit integer units. In this paper, we show that the dot-product instruction can also be used to accelerate matrix-multiplication and polynomial convolution operations, which are commonly found in post-quantum lattice-based cryptographic schemes. In particular, we propose a highly optimized implementation of FrodoKEM, wherein the matrix-multiplication is accelerated by the dot-product instruction. We also present specially designed data structures that allow an efficient implementation of Saber key encapsulation mechanism, utilizing the dot-product instruction to speed-up the polynomial convolution. The proposed FrodoKEM implementation achieves 4.37x higher throughput in terms of key exchange operations per second than the state-of-the-art implementation on V100 GPU. This paper also presents the first implementation of Saber on GPU platforms, achieving 124,418, 120,463, and 31,658 key exchange operations per second on RTX3080, V100, and T4 GPUs, respectively. Since matrix-multiplication and polynomial convolution operations are the most time-consuming operations in lattice-based cryptographic schemes, our proposed techniques are likely to benefit other similar algorithms. The proposed high throughput implementation of KEMs on various GPU platforms allows the heavy computations (KEMs) to be offloaded from the server. This is very useful for many emerging applications like Internet of Things and cloud computing.

ePrint: https://eprint.iacr.org/2021/1389

See all topics related to this paper.

Feel free to post resources that are related to this paper below.

Example resources include: implementations, explanation materials, talks, slides, links to previous discussions on other websites.

For more information, see the rules for Resource Topics .