Welcome to the resource topic for 2025/1957
Title: Fast Batch Matrix Multiplication in Ciphertexts
Authors: Jung Hee Cheon, Minsik Kang, Junho Lee
Abstract: Encrypted matrix multiplication (MM) is a fundamental primitive in privacy-preserving machine learning and encrypted data search, but it remains a significant performance bottleneck. Recently, Bae et al. (Crypto’24) and Park (Eurocrypt’25) introduced novel algorithms for ciphertext–plaintext (CPMM) and ciphertext–ciphertext (CCMM) matrix multiplications. These algorithms reduce encrypted MM operations to plaintext matrix multiplications (PPMM), enabling implementation through highly optimized BLAS libraries.
While these reduction-based methods offer significant improvements, their applicability is limited to scenarios where the matrix dimension $d$ is comparable to the ring dimension $N$ in RLWE-based CKKS schemes. As a result, they fall short for matrix multiplications involving small or medium-sized matrices.
We extend the reduction-based CPMM/CCMM algorithms to small matrices by batching instances. We use the Slots-in-Coefficient (SinC) encoding, in which a ring element is a polynomial whose coefficients are the Discrete Fourier Transform (DFT) of the matrix entries at the same position across the batch. This encoding reduces encrypted batch MM to a small number of batch PPMMs, which can be efficiently accelerated by BLAS libraries.
Our batch encrypted MM flexibly accommodates diverse matrix dimensions and batch sizes independent of the ring dimension $N$, extending its applicability to practical real-world settings.
For two $d \times d$ matrices with $N/d$ batches, our batch CPMM and CCMM algorithms achieve complexity $O(d^2 N)$, improving upon Bae et al. at $O(d N^2)$ and Jiang et al. (CCS’18) at $O(d^2 N \log N)$.
We further extend our techniques to rectangular matrices, achieving $O(d N^2)$ for multiplying a $d \times N$ matrix by an $N \times N$ matrix, improving on previous $O(N^3)$ methods.
A proof-of-concept implementation validates these improvements: multiplying 128 batches of $64 \times 64$ matrices takes $0.20$ s (CPMM) and $0.91$ s (CCMM), yielding $205\times$ and $64\times$ speedups over previous methods. For a $64 \times 2048$ by $2048 \times 2048$ multiplication, our CCMM completes in $7.8$ s, achieving a $28\times$ speedup compared to Park’s algorithm.
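For intuition on how a SinC-style packing composes with ring multiplication, here is a minimal NumPy sketch. It is not the paper's algorithm: it uses a plain cyclic DFT rather than the negacyclic CKKS embedding, ignores encryption and scaling entirely, and the names and sizes are illustrative assumptions. It packs a batch by taking the DFT of each matrix entry across the batch axis, then checks that convolving packed coefficients, with matrix multiplication as the pointwise product, recovers (up to a factor $B$) the packing of the batchwise products, i.e. exactly a batch PPMM.

```python
import numpy as np

# Toy parameters (the paper works with d x d matrices in N/d batches inside
# a ring of dimension N; the small sizes here are purely illustrative).
B, d = 8, 4
rng = np.random.default_rng(0)
A = rng.standard_normal((B, d, d))   # batch of B left matrices
C = rng.standard_normal((B, d, d))   # batch of B right matrices

# SinC-style packing: at every matrix position (i, j), take the DFT of that
# entry across the batch axis.
A_hat = np.fft.fft(A, axis=0)
C_hat = np.fft.fft(C, axis=0)

# Multiplying ring elements convolves their coefficients; the analogue here
# is a cyclic convolution along the batch axis whose "pointwise" product is
# matrix multiplication.
conv = np.zeros((B, d, d), dtype=complex)
for k in range(B):
    for b in range(B):
        conv[k] += A_hat[b] @ C_hat[(k - b) % B]

# Convolution theorem: the result equals B times the packing of the B
# independent products A[b] @ C[b], computed below as a single batched
# (BLAS-backed) np.matmul call on the stacked arrays.
expected = B * np.fft.fft(np.matmul(A, C), axis=0)
assert np.allclose(conv, expected)
```

The `np.matmul` on the stacked arrays is the batch PPMM: one BLAS-backed call computing all $B$ products at once, which is the operation the reduction is designed to reach.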
ePrint: https://eprint.iacr.org/2025/1957
See all topics related to this paper.
Feel free to post resources that are related to this paper below.
Example resources include: implementations, explanation materials, talks, slides, links to previous discussions on other websites.
For more information, see the rules for Resource Topics.