[Resource Topic] 2023/1580: Algorithmic Views of Vectorized Polynomial Multipliers – NTRU Prime

Welcome to the resource topic for 2023/1580

Algorithmic Views of Vectorized Polynomial Multipliers – NTRU Prime

Authors: Vincent Hwang, Chi-Ting Liu, Bo-Yin Yang


In this paper, we explore the cost of vectorization for polynomial multiplication with coefficients in Zq for an odd prime q. If there is a large power of two dividing q−1, we can apply radix-2 Cooley–Tukey fast Fourier transforms to multiply polynomials in Zq[x]. The radix-2 nature admits efficient vectorization. Conversely, if 2 is the only power of two dividing q−1, we can apply Schönhage’s and Nussbaumer’s FFTs to craft radix-2 roots of unity, but these double the number of coefficients. We show how to avoid this doubling while maintaining vectorization friendliness with Good–Thomas, Rader’s, and Bruun’s FFTs. In particular, we exploit the existing Fermat-prime factor of q − 1 for Rader’s FFT and the power-of-two factor of q + 1 for Bruun’s FFT. We implement these ideas for the NTRU Prime instances ntrulpr761/sntrup761, operating over the coefficient ring Z4591 on a Cortex-A72. sntrup761 is currently used in OpenSSH 9.0 by default.
Our polynomial multiplication outperforms the state-of-the-art vector-optimized implementation by 6.1×. For ntrulpr761, our keygen, encap, and decap are 2.98×, 2.79×, and 3.07× faster than the state-of-the-art vector-optimized implementation. For sntrup761, we outperform the reference implementation significantly.

ePrint: https://eprint.iacr.org/2023/1580

See all topics related to this paper.

Feel free to post resources that are related to this paper below.

Example resources include: implementations, explanation materials, talks, slides, links to previous discussions on other websites.

For more information, see the rules for Resource Topics .