[Resource Topic] 2024/1729: cuTraNTT: A Novel Transposed Number Theoretic Transform Targeting Low Latency Homomorphic Encryption for IoT Applications

Welcome to the resource topic for 2024/1729

Title:
cuTraNTT: A Novel Transposed Number Theoretic Transform Targeting Low Latency Homomorphic Encryption for IoT Applications

Authors: Supriya Adhikary, Wai Kong Lee, Angshuman Karmakar, Yongwoo Lee, Seong Oun Hwang, Ramachandra Achar

Abstract:

Large polynomial multiplication is one of the computational bottlenecks in fully homomorphic encryption implementations. Usually, these multiplications are implemented using the number-theoretic transformation to speed up the computation. State-of-the-art GPU-based implementation of fully homomorphic encryption computes the number theoretic transformation in two different kernels, due to the necessary synchronization between GPU blocks to ensure correctness in computation. This can be a serious limitation in embedded systems that only have constrained computational resources to support the time-consuming homomorphic encryption. In this paper, we proposed a series of techniques to improve the performance of number theoretic transform targeting homomorphic encryption on a GPU device. Firstly, we proposed to arrange the polynomials in a transposed manner and skip the last two levels of radix-4 number theoretic transformation, allowing us to completely
avoid the block synchronization in NTT implementation. This technique improved the performance of homomorphic encryption by 1.37× and 1.34× on RTX 4060 and Jetson Orin Nano respectively, compared to the conventional approach that uses full NTT without skipping any levels. However, such an approach also introduces extra overhead in the subsequent point-wise multiplication, which slows down the homomorphic multiplication. To reduce this negative impact, a fast 16 × 16 point-wise
multiplication implementation was proposed, which relies on the heavily optimized Toom-Cook 4-way algorithm. Experimental results show that our proposed homomorphic multiplication can achieve similar latency compared to Jung et al. and Yang et al., which are the best results to date. This shows that the proposed cuTraNTT is able to reduce the latency of homomorphic encryption without sacrificing the performance in homomorphic multiplication.

ePrint: https://eprint.iacr.org/2024/1729

See all topics related to this paper.

Feel free to post resources that are related to this paper below.

Example resources include: implementations, explanation materials, talks, slides, links to previous discussions on other websites.

For more information, see the rules for Resource Topics .