Welcome to the resource topic for 2025/1991
Title:
TWFalcon: Triple-Word Arithmetic for Falcon; Giving Falcon the Precision to Fly Securely
Authors: Stef Halmans, Christine van Vredendaal, Tobias Schneider, Frank Custers, Tim Güneysu
Abstract: The post-quantum signature scheme Falcon is an attractive scheme for constrained devices due to its compactness and verification performance. However, it relies on floating-point arithmetic for signature generation, which, alongside physical security concerns, introduces two additional drawbacks:
Firstly, if implemented using the standard double-precision format, Falcon does not satisfy the formally proven error bounds required for a secure Gaussian sampler implementation. Although no practical attacks exploiting this limitation are currently known, it raises concerns about future attacks.
Secondly, constrained 32-bit devices can lack hardware support for high-precision floating-point arithmetic, whose use then introduces significant performance overhead because it must be emulated using integers.
In this work we present a novel method to address these limitations: We show that Falcon can be implemented using single-precision floating-point numbers. Our proposed method uses Triple-Word Floating-Point (TW) arithmetic and achieves a precision of at least 72 bits, compared to the 53 bits of double-precision floating-point arithmetic.
We show our implementation achieves error bounds that meet the formal security requirements for a secure Gaussian sampler implementation, while maintaining other security guarantees. This way, Falcon can run on constrained devices equipped only with a single-precision Floating-Point Unit (FPU) without the need for integer emulation.
We demonstrate the feasibility of our approach on the Nucleo-L4R5ZI board, which features a Cortex-M4F processor equipped with a single-precision FPU. More precisely, we show that increasing the precision of Falcon in this way increases the computational effort by a factor of only approximately 1.84 compared to the CPU cycles required for an implementation using emulated double-precision arithmetic via integers.
ePrint: https://eprint.iacr.org/2025/1991
See all topics related to this paper.
Feel free to post resources that are related to this paper below.
Example resources include: implementations, explanation materials, talks, slides, links to previous discussions on other websites.
For more information, see the rules for Resource Topics.
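
As a rough illustration of the idea behind the abstract's triple-word claim, the sketch below represents a value as an unevaluated sum of three single-precision words and compares the resulting accuracy with one double-precision number. It is not the authors' implementation: the helper names (`f32`, `two_sum`, `to_triple_word`, `rel_err`) are hypothetical, binary32 is emulated in Python by rounding binary64 results through `struct`, and the greedy splitting is only a convenient way to build a test value. `two_sum` is Knuth's classical error-free addition, a standard building block of multi-word arithmetic, included here on the assumption that TW arithmetic relies on transformations of this kind.

```python
import struct
from fractions import Fraction

def f32(x):
    """Round a binary64 Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def two_sum(a, b):
    """Knuth's TwoSum on binary32 inputs: s = fl(a + b) and a + b == s + e exactly."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e

def to_triple_word(v):
    """Greedily split a rational v into three binary32 words (illustrative helper)."""
    words = []
    r = v
    for _ in range(3):
        w = f32(float(r))   # approximate the remaining residual with one binary32 word
        words.append(w)
        r -= Fraction(w)    # exact residual, tracked with rational arithmetic
    return tuple(words)

def rel_err(v, approx):
    """Exact relative error of approx with respect to v, as a Fraction."""
    return abs(v - approx) / abs(v)

third = Fraction(1, 3)
tw = to_triple_word(third)
tw_err = rel_err(third, sum(Fraction(w) for w in tw))   # three binary32 words
dbl_err = rel_err(third, Fraction(1 / 3))               # one binary64 word
```

For the value 1/3, `tw_err` comes out below 2^-72 while `dbl_err` is about 2^-54, which matches the 72-bit versus 53-bit comparison in the abstract. A call such as `two_sum(1.0, 2.0**-30)` returns the rounded sum together with the rounding error that a plain single-precision addition would discard.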