[Resource Topic] 2021/1015: Look-up the Rainbow: Efficient Table-based Parallel Implementation of Rainbow Signature on 64-bit ARMv8 Processors

Welcome to the resource topic for 2021/1015

Title:
Look-up the Rainbow: Efficient Table-based Parallel Implementation of Rainbow Signature on 64-bit ARMv8 Processors

Authors: Hyeokdong Kwon, Hyunjun Kim, Minjoo Sim, Wai-Kong Lee, Hwajeong Seo

Abstract:

Rainbow signature is one of the finalist in National Institute of Standards and Technology (NIST) standardization. It is also the only signature candidate that is designed based on multivariate quadratic hard problem. Rainbow signature is known to have very small signature size compared to other post-quantum candidates. In this paper, we propose an efficient implementation technique to improve performance of Rainbow signature schemes. A parallel polynomial-multiplication on a 64-bit ARMv8 processor was proposed, wherein a look-up table was created by pre-calculating the 4\times4 multiplication results. This technique was developed based on the observation that the existing implementation of Rainbow’s polynomial-multiplication relies on the Karatsuba algorithm. It is not optimal due to the divide and conquer steps involved, whereby operations on \mathbb{F}_{16} are divided into many small sub-fields of \mathbb{F}_{4} and \mathbb{F}_{2}. Further investigations reveal that when the polynomial-multiplication in Rainbow signature is operated on \mathbb{F}_{16}, its operand is in 4-bit. Since the maximum combinations of a 4 \times 4 multiplication is only 256, we constructed a 256-byte look-up table. According to the 4-bit constant, only 16-byte is loaded from the table at one time. The time-consuming multiplication is replaced by performing the table look-up. In addition, it calculates up-to 16 result values per register using characteristics of vector registers available on 64-bit ARMv8 processor. With the proposed fast polynomial-multiplication technique, we implemented the optimized Rainbow III and V. These two parameter sets are performed on \mathbb{F}_{256}, but they use sub-field \mathbb{F}_{16} in the multiplication process. Therefore, the sub-field multiplication can be replaced with the proposed table look-up technique, which in turn omitted a significant number of operations. We have carried out the experiments on the Apple M1 processor, which shows up to 167.2$\times$ and 51.6$\times$ better performance enhancement at multiplier, and Rainbow signatures, respectively, compared to the previous implementation.

ePrint: https://eprint.iacr.org/2021/1015

See all topics related to this paper.

Feel free to post resources that are related to this paper below.

Example resources include: implementations, explanation materials, talks, slides, links to previous discussions on other websites.

For more information, see the rules for Resource Topics .