[Resource Topic] 2022/1657: CycloneNTT: An NTT/FFT Architecture Using Quasi-Streaming of Large Datasets on DDR- and HBM-based FPGA Platforms

Welcome to the resource topic for 2022/1657

Title:
CycloneNTT: An NTT/FFT Architecture Using Quasi-Streaming of Large Datasets on DDR- and HBM-based FPGA Platforms

Authors: Kaveh Aasaraai, Emanuele Cesena, Rahul Maganti, Nicolas Stalder, Javier Varela, Kevin Bowers

Abstract:

Number-Theoretic-Transform (NTT) is a variation of Fast-Fourier-Transform (FFT) on finite fields. NTT is being increasingly used in blockchain and zero-knowledge proof applications. Although FFT and NTT are widely studied for FPGA implementation, we believe CycloneNTT is the first to solve this problem for large data sets (\ge2^{24}, 64-bit numbers) that would not fit in the on-chip RAM. CycloneNTT uses a state-of-the-art butterfly network and maps the dataflow to hybrid FIFOs composed of on-chip SRAM and external memory. This manifests into a quasi-streaming data access pattern minimizing external memory access latency and maximizing throughput. We implement two variants of CycloneNTT optimized for DDR and HBM external memories. Although historically this problem has been shown to be memory-bound, CycloneNTT’s quasi-streaming access pattern is optimized to the point that when using HBM (Xilinx C1100), the architecture becomes compute-bound. On the DDR-based platform (AWS F1), the latency of the application is equal to the streaming of the entire dataset \log N times to/from external memory. Moreover, exploiting HBM’s larger number of channels, and following a series of additional optimizations, CycloneNTT only requires \frac{1}{6}\log N passes.

ePrint: https://eprint.iacr.org/2022/1657

See all topics related to this paper.

Feel free to post resources that are related to this paper below.

Example resources include: implementations, explanation materials, talks, slides, links to previous discussions on other websites.

For more information, see the rules for Resource Topics .