Welcome to the resource topic for 2022/881
Title:
A Novel High-performance Implementation of CRYSTALS-Kyber with AI Accelerator
Authors: Lipeng Wan, Fangyu Zheng, Guang Fan, Rong Wei, Lili Gao, Jiankuo Dong, Jingqiang Lin, and Yuewu Wang
Abstract:Public-key cryptography (PKC), including conventional cryptosystems (e.g., RSA, ECC) and post-quantum cryptography, involves computation-intensive workloads. With noticing the extraordinary computing power of AI accelerators, in this paper, we further explore the feasibility to introduce AI accelerators into high-performance cryptographic computing. Since AI accelerators are dedicated to machine learning or neural networks, the biggest challenge is how to transform cryptographic workloads into their operations, while ensuring the correctness of the results and bringing convincing performance gains. After investigating and analysing the workload of the commercial off-the-shelf AI accelerators, we utilize NVIDIA’s AI accelerator, Tensor Core, to accelerate the polynomial multiplication, usually the most time-consuming part in lattice-based cryptography. A series of measures are taken, such as accommodating the matrix-multiply-and-add mode of Tensor Core and making a trade-off between precision and performance, to leverage Tensor Core as a high-performance NTT box performing NTT/INTT through CUDA C++ WMMA API. Meanwhile, we take CRYSTALS-Kyber, one of the NIST PQC 3rd round candidates, as a case study on RTX 3080 with the Ampere Tensor Core. The empirical results show that the defined NTT of polynomial vector (n=256,k=4) with our NTT box obtains a speedup of around 6.47x that of the state-of-the-art implementation on the same platform.
ePrint: https://eprint.iacr.org/2022/881
See all topics related to this paper.
Feel free to post resources that are related to this paper below.
Example resources include: implementations, explanation materials, talks, slides, links to previous discussions on other websites.
For more information, see the rules for Resource Topics .