[Resource Topic] 2025/1867: Vectorized Falcon-Sign Implementations using SSE2, AVX2, AVX-512F, NEON, and RVV

Welcome to the resource topic for 2025/1867

Title:
Vectorized Falcon-Sign Implementations using SSE2, AVX2, AVX-512F, NEON, and RVV

Authors: Jipeng Zhang, Jiaheng Zhang

Abstract:

Falcon, a NTRU-based digital signature algorithm, has been selected by NIST as one of the post-quantum cryptography (PQC) standards. Compared to verification, the signature generation of Falcon is relatively slow. One of the core operations in signature generation is discrete Gaussian sampling, which involves a component known as the BaseSampler. The BaseSampler accounts for up to 30% of the time required for signature generation, making it a significant performance bottleneck. This work aims to address this bottleneck.

We design a vectorized version of the BaseSample and provide optimized implementations across six different instruction sets: SSE2, AVX2, AVX-512F, NEON, RISC-V Vector (RVV), and RV64IM. The AVX2 implementation, for instance, achieves an 8.4× speedup over prior work. Additionally, we optimize the FFT/iFFT operations using RVV and RV64D. For the RVV implementation, we introduce a new method using strided load/store instructions, with 4+4 and 4+5 layer merging strategies for Falcon-512 and Falcon-1024, respectively, resulting in a speedup of more than 4×.

Finally, we present the results of our optimized implementations across eight different instruction sets for signature generation of Falcon. For instance, our AVX2, AVX-512F, and RV64GCVB implementations achieve performance improvements of 23%, 36%, and 59%, respectively, for signature generation of Falcon-512.

ePrint: https://eprint.iacr.org/2025/1867

See all topics related to this paper.

Feel free to post resources that are related to this paper below.

Example resources include: implementations, explanation materials, talks, slides, links to previous discussions on other websites.

For more information, see the rules for Resource Topics .