CrISA-X: Unleashing Performance Excellence in Lightweight Symmetric Cryptography for Extendable and Deeply Embedded Processors

Authors: Oren Ganon, Itamar Levi


The selection of a Lightweight Cryptography (LWC) algorithm is crucial for resource-limited applications. The National Institute of Standards and Technology (NIST) leads this process, which involves a thorough evaluation of the algorithms' cryptanalytic strength. Careful consideration is also given to algorithm latency, code size, and hardware implementation area, all of which are critical in determining the overall performance of cryptographic solutions on edge devices. We introduce CrISA-X, a set of Cryptography Instruction Set Architecture extensions designed to improve cryptographic latency on extendable processors. The CrISA-X instructions fall into three categories, Generic-Atomic, Block-Specific, and Procedure-Specific, and build on RISC processor hardware and a base ISA to execute LWC algorithms efficiently. Our study evaluates the execution efficiency of new single-cycle instruction extensions and tightly coupled multicycle instructions on extendable modular RISC processors. CrISA-X accelerates several algorithms simultaneously while preserving ISA adaptability, a combination that has not been accomplished before. The extensions span several computation levels and are tailored to individual algorithms and to sets of LWC algorithms, depending on performance, frequency, and area trade-offs. Using the Min-Max optimization technique, we configure these extensions to balance performance, area, and code size. We present empirical evidence of the performance improvement on a real, synthesized modular RISC processor, and we offer a framework for creating optimized processor hardware and ISA extensions. CrISA-X generally outperforms other ISA extensions, delivering speedups of 3x to 17x at a relative area cost increase of between +12% and +47% in LUTs, depending on the instruction set category.
As one notable example, the ASCON algorithm achieves a 10x speedup over its implementation using base ISA instructions only.

