Welcome to the resource topic for 2024/195
PQC-AMX: Accelerating Saber and FrodoKEM on the Apple M1 and M3 SoCs
Authors: Décio Luiz Gazzoni Filho, Guilherme Brandão, Gora Adj, Arwa Alblooshi, Isaac A. Canales-Martínez, Jorge Chávez-Saab, Julio López
Abstract:
As CPU performance is unable to keep up with the dramatic growth of the past few decades, CPU architects are looking into domain-specific architectures to accelerate certain tasks. A recent trend is the introduction of matrix-multiplication accelerators to CPUs by manufacturers such as IBM, Intel and ARM, some of which have not launched commercially yet. Apple’s systems-on-chip (SoCs) for its mobile phones, tablets and personal computers include a proprietary, undocumented CPU-coupled matrix multiplication coprocessor called AMX. In this paper, we leverage AMX to accelerate the post-quantum lattice-based cryptosystems Saber and FrodoKEM, and benchmark their performance on Apple M1 and M3 SoCs. We propose a variant of the Toeplitz Matrix-Vector Product algorithm for polynomial multiplication, which sets new speed records for Saber using AMX (up to 13% for the main KEM operations, and 151% for matrix-vector multiplication of polynomials). For FrodoKEM, we set new speed records with our AMX implementation (up to 21% for the main KEM operations, and 124% for matrix multiplication, with even greater improvements for 4×-batching). Such speedups are relative to our optimized NEON implementation, also presented here, which improves upon the state-of-the-art implementation for ARMv8 CPUs.
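For readers unfamiliar with the Toeplitz Matrix-Vector Product (TMVP) idea mentioned in the abstract, here is a minimal Python sketch of the basic formulation only: multiplication in Z_q[x]/(x^n + 1) written as a Toeplitz matrix times a vector, which is the shape of computation a matrix-multiplication unit can accelerate. It is not the variant proposed in the paper and does not use AMX. The function names and the toy size n = 4 are assumptions made for illustration; q = 2^13 matches Saber's modulus, while Saber's real ring dimension is 256.

```python
Q = 2**13  # Saber's modulus; the toy inputs below use n = 4 instead of Saber's 256


def negacyclic_toeplitz(a, q=Q):
    """Build the n x n Toeplitz matrix T with T[i][j] = a[i-j] for i >= j
    and -a[n+i-j] for i < j, so that T @ b equals a*b mod (x^n + 1, q)."""
    n = len(a)
    return [
        [a[i - j] % q if i >= j else (-a[n + i - j]) % q for j in range(n)]
        for i in range(n)
    ]


def tmvp_multiply(a, b, q=Q):
    """Multiply polynomials a and b in Z_q[x]/(x^n + 1) as a matrix-vector product."""
    n = len(a)
    t = negacyclic_toeplitz(a, q)
    return [sum(t[i][j] * b[j] for j in range(n)) % q for i in range(n)]


def schoolbook_negacyclic(a, b, q=Q):
    """Reference implementation: schoolbook product with x^n replaced by -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [5, 6, 7, 8]
    # Both methods should agree on every input.
    print(tmvp_multiply(a, b) == schoolbook_negacyclic(a, b))
```

The matrix depends only on i - j, so every diagonal is constant; that structure is what TMVP-style algorithms exploit to avoid materialising the full matrix, which this sketch does for clarity.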
Feel free to post resources that are related to this paper below.
Example resources include: implementations, explanation materials, talks, slides, links to previous discussions on other websites.
For more information, see the rules for Resource Topics.