[Resource Topic] 2024/195: PQC-AMX: Accelerating Saber and FrodoKEM on the Apple M1 and M3 SoCs

Welcome to the resource topic for 2024/195

PQC-AMX: Accelerating Saber and FrodoKEM on the Apple M1 and M3 SoCs

Authors: Décio Luiz Gazzoni Filho, Guilherme Brandão, Gora Adj, Arwa Alblooshi, Isaac A. Canales-Martínez, Jorge Chávez-Saab, Julio López


As CPU performance is unable to keep up with the dramatic growth of the past few decades, CPU architects are looking into domain-specific architectures to accelerate certain tasks. A recent trend is the introduction of matrix-multiplication accelerators to CPUs by manufacturers such as IBM, Intel and ARM, some of which have not launched commercially yet. Apple’s systems-on-chip (SoCs) for its mobile phones, tablets and personal computers include a proprietary, undocumented CPU-coupled matrix multiplication coprocessor called AMX. In this paper, we leverage AMX to accelerate the post-quantum lattice-based cryptosystems Saber and FrodoKEM, and benchmark their performance on Apple M1 and M3 SoCs. We propose a variant of the Toeplitz Matrix-Vector Product algorithm for polynomial multiplication, which sets new speed records for Saber using AMX (up to 13% for the main KEM operations, and 151% for matrix-vector multiplication of polynomials). For FrodoKEM, we set new speed records with our AMX implementation (up to 21% for the main KEM operations, and 124% for matrix multiplication, with even greater improvements for 4 \times-batching). Such speedups are relative to our optimized NEON implementation, also presented here, which improves upon the state-of-the-art implementation for ARMv8 CPUs.

ePrint: https://eprint.iacr.org/2024/195

See all topics related to this paper.

Feel free to post resources that are related to this paper below.

Example resources include: implementations, explanation materials, talks, slides, links to previous discussions on other websites.

For more information, see the rules for Resource Topics .