fips203ipd
C11 implementation of FIPS 203 initial public draft (IPD).
|
Embeddable, dependency-free C11 implementation of ML-KEM from the FIPS 203 initial public draft (IPD) with scalar, AVX-512, and Neon backends.
FIPS 203 is (or will be) NIST's standardized version of Kyber, a post-quantum key encapsulation mechanism (KEM).
Notes on this implementation:
fips203ipd.c
(see bottom of file) and compiles with common sanitizers enabled.keygen()
and encaps()
is specified as a function parameter.Use make
to build a minimal self test application, make doc
to build the HTML-formatted API documentation, and make test
to run the test suite.
Minimal example of Alice and Bob exchanging a shared secret with KEM512:
See examples/0-hello-kem/
for the full buildable example, including a Makefile
and support files.
API documentation is available online here and also in fips203ipd.h
. If you have Doxygen installed, you can build HTML-formatted API documentation by typing make doc
.
Use make test
to build and run the test suite.
The test suite checks each component of this implementation for expected answers and is built with common sanitizers supported by both GCC and Clang. The source code for the test suite is embedded at the bottom of fips203ipd.c
behind a TEST_FIPS203IPD
define.
You can also build a quick test application by typing make
in the top-level directory. The test application does the following 1000 times for each parameter set:
There are safer and faster alternatives, but if you want to use this library anyway:
fips203ipd.h
and fips203ipd.c
into your source tree.fips203ipd.o
.fips203ipd.h
in your application.fips203ipd_*()
functions in your code.A minimal libcpucycles-based benchmarking tool is available in examples/3-bench/
. The bench
tool repeatedly measures the number of CPU cycles used for key generation, encapsulation, and decapsulation for each parameter set and then prints summary statistics to standard output in CSV format.
bench
results from a few of my systems are shown below.
set | function | median | mean | stddev | min | max |
---|---|---|---|---|---|---|
kem512 | keygen | 17633 | 17723 | 676 | 17302 | 54660 |
kem512 | encaps | 21602 | 21705 | 938 | 21304 | 75753 |
kem512 | decaps | 25733 | 25894 | 4934 | 25376 | 372564 |
kem768 | keygen | 29384 | 29528 | 1557 | 28885 | 133739 |
kem768 | encaps | 32511 | 32632 | 603 | 31974 | 59495 |
kem768 | decaps | 38176 | 38223 | 529 | 37368 | 46061 |
kem1024 | keygen | 39829 | 40111 | 4865 | 39166 | 365645 |
kem1024 | encaps | 45250 | 45437 | 781 | 44510 | 76805 |
kem1024 | decaps | 52425 | 52605 | 636 | 51282 | 63121 |
set | function | median | mean | stddev | min | max |
---|---|---|---|---|---|---|
kem512 | keygen | 53711 | 53824 | 945 | 53355 | 106488 |
kem512 | encaps | 61366 | 61487 | 600 | 61010 | 70577 |
kem512 | decaps | 73559 | 73666 | 586 | 73202 | 85529 |
kem768 | keygen | 92560 | 92754 | 896 | 92026 | 119126 |
kem768 | encaps | 104842 | 105065 | 802 | 104352 | 114855 |
kem768 | decaps | 121485 | 121718 | 826 | 120907 | 130430 |
kem1024 | keygen | 140219 | 143515 | 10211 | 139507 | 214890 |
kem1024 | encaps | 154949 | 158546 | 11176 | 154237 | 224903 |
kem1024 | decaps | 176131 | 180226 | 12692 | 175330 | 231178 |
set | function | median | mean | stddev | min | max |
---|---|---|---|---|---|---|
kem512 | keygen | 96450 | 96734 | 1455 | 94875 | 121425 |
kem512 | encaps | 107550 | 107831 | 1552 | 106050 | 132825 |
kem512 | decaps | 126375 | 126681 | 1704 | 125250 | 165450 |
kem768 | keygen | 168450 | 168962 | 2058 | 165900 | 209175 |
kem768 | encaps | 186975 | 187449 | 2169 | 184425 | 214350 |
kem768 | decaps | 212925 | 213580 | 2769 | 209550 | 259725 |
kem1024 | keygen | 260325 | 261156 | 2861 | 256275 | 304950 |
kem1024 | encaps | 281175 | 282064 | 2891 | 277575 | 326850 |
kem1024 | decaps | 314250 | 315165 | 3441 | 310875 | 410850 |
This library includes several accelerated backends which are selectable at compile time via the optional BACKEND
parameter. By default the fastest available backend is selected at compile-time.
The available backends are:
BACKEND=0
): Automatically detect the appropriate backend at compile-time (default).BACKEND=1
): Fallback if no faster backend is available.BACKEND=2
): AVX-512 acceleration. Selected by default if AVX-512 is supported.BACKEND=3
): ARM Neon acceleration. Selected by default if Neon is supported..When the a backend is enabled, the following functions are replaced with accelerated equivalents:
Functions | Description |
---|---|
permute() | block permutation for SHA3-256 and SHA3-512 |
poly_{ntt,inv_ntt}() | Number-Theoretic Transform (NTT) and inverse NTT |
poly_{add,add2,sub,mul}() | Polynomial arithmetic |
poly_encode() , poly_encode_{11,10,5,4,1}bit() | Polynomial encoding |
poly_decode() , poly_decode_{11,10,5,4,1}bit() | Polynomial decoding |
pke{512,768,1024}_keygen_sample() | Key generation (see below) |
pke{512,768,1024}_encrypt_sample() | Encryption (see below) |
You can use the fips203ipd_backend()
function to get the backend name at run-time.
The AVX-512-enabled key generation and encryption functions perform coefficient sampling in stages and use the 64-bit lanes of 25 512-bit registers to permute and sample from up to 8 SHAKE128 and SHAKE256 contexts at once.
Here are how the lanes of the AVX-512 registers are used for each parameter set, function, and stage:
Parameter Set | Function | Stage | Lane 0 | Lane 1 | Lane 2 | Lane 3 | Lane 4 | Lane 5 | Lane 6 | Lane 7 |
---|---|---|---|---|---|---|---|---|---|---|
KEM512 | keygen | 1 | A[0,0] | A[0,1] | A[1,0] | A[1,1] | s[0] | s[1] | e[0] | e[1] |
KEM512 | encrypt | 1 | A[0,0] | A[1,0] | A[0,1] | A[1,1] | r[0] | r[1] | n/a | n/a |
KEM512 | encrypt | 2 | e1[0] | e1[1] | e2 | n/a | n/a | n/a | n/a | n/a |
KEM768 | keygen | 1 | A[0,0] | A[0,1] | A[0,2] | A[1,0] | A[1,1] | A[1,2] | A[2,0] | A[2,1] |
KEM768 | keygen | 2 | A[2,2] | s[0] | s[1] | s[2] | e[0] | e[1] | e[2] | n/a |
KEM768 | encrypt | 1 | A[0,0] | A[1,0] | A[2,0] | A[0,1] | A[1,1] | A[2,1] | A[0,2] | A[1,2] |
KEM768 | encrypt | 2 | A[2,2] | r[0] | r[1] | r[2] | e1[0] | e1[1] | e1[2] | e2 |
KEM1024 | keygen | 1 | A[0,0] | A[0,1] | A[0,2] | A[0,3] | A[1,0] | A[1,1] | A[1,2] | A[1,3] |
KEM1024 | keygen | 2 | A[2,0] | A[2,1] | A[2,2] | A[2,3] | A[3,0] | A[3,1] | A[3,2] | A[3,3] |
KEM1024 | keygen | 3 | s[0] | s[1] | s[2] | s[3] | e[0] | e[1] | e[2] | e[3] |
KEM1024 | encrypt | 1 | A[0,0] | A[1,0] | A[2,0] | A[3,0] | A[0,1] | A[1,1] | A[2,1] | A[3,1] |
KEM1024 | encrypt | 2 | A[0,2] | A[1,2] | A[2,2] | A[3,2] | A[0,3] | A[1,3] | A[2,3] | A[3,3] |
KEM1024 | encrypt | 3 | r[0] | r[1] | r[2] | r[3] | e1[0] | e1[1] | e1[2] | e1[3] |
KEM1024 | encrypt | 4 | e2 | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
Notes:
The Neon key generation and encryption functions perform coefficient sampling in stages and use the 64-bit lanes of 25 128-bit registers to permute and sample from 2 SHAKE128 or SHAKE256 contexts at once.
Here are how the lanes of the 128-bit Neon registers are used for each parameter set, function, and stage:
Parameter Set | Function | Stage | Lane 0 | Lane 1 |
---|---|---|---|---|
KEM512 | keygen | 1 | A[0,0] | A[0,1] |
KEM512 | keygen | 2 | A[1,0] | A[1,1] |
KEM512 | keygen | 3 | s[0] | s[1] |
KEM512 | keygen | 4 | e[0] | e[1] |
KEM512 | encrypt | 1 | A[0,0] | A[1,0] |
KEM512 | encrypt | 2 | A[0,1] | A[1,1] |
KEM512 | encrypt | 3 | r[0] | r[1] |
KEM512 | encrypt | 4 | e1[0] | e1[1] |
KEM512 | encrypt | 5 | e2 | n/a |
KEM768 | keygen | 1 | A[0,0] | A[0,1] |
KEM768 | keygen | 2 | A[0,2] | A[1,0] |
KEM768 | keygen | 3 | A[1,1] | A[1,2] |
KEM768 | keygen | 4 | A[2,0] | A[2,1] |
KEM768 | keygen | 5 | A[2,2] | n/a |
KEM768 | keygen | 6 | s[0] | s[1] |
KEM768 | keygen | 7 | s[2] | e[0] |
KEM768 | keygen | 8 | e[1] | e[2] |
KEM768 | encrypt | 1 | A[0,0] | A[1,0] |
KEM768 | encrypt | 2 | A[2,0] | A[0,1] |
KEM768 | encrypt | 3 | A[1,1] | A[2,1] |
KEM768 | encrypt | 4 | A[0,2] | A[1,2] |
KEM768 | encrypt | 5 | A[2,2] | n/a |
KEM768 | encrypt | 6 | r[0] | r[1] |
KEM768 | encrypt | 7 | r[2] | e1[0] |
KEM768 | encrypt | 8 | e1[1] | e1[2] |
KEM768 | encrypt | 9 | e2 | n/a |
KEM1024 | keygen | 1 | A[0,0] | A[0,1] |
KEM1024 | keygen | 2 | A[0,2] | A[0,3] |
KEM1024 | keygen | 3 | A[1,0] | A[1,1] |
KEM1024 | keygen | 4 | A[1,2] | A[1,3] |
KEM1024 | keygen | 5 | A[2,0] | A[2,1] |
KEM1024 | keygen | 5 | A[2,2] | A[2,3] |
KEM1024 | keygen | 6 | A[3,0] | A[3,1] |
KEM1024 | keygen | 7 | A[3,2] | A[3,3] |
KEM1024 | keygen | 8 | s[0] | s[1] |
KEM1024 | keygen | 9 | s[2] | s[3] |
KEM1024 | keygen | 10 | e[0] | e[1] |
KEM1024 | keygen | 11 | e[2] | e[3] |
KEM1024 | encrypt | 1 | A[0,0] | A[1,0] |
KEM1024 | encrypt | 2 | A[2,0] | A[3,0] |
KEM1024 | encrypt | 3 | A[0,1] | A[1,1] |
KEM1024 | encrypt | 4 | A[2,1] | A[3,1] |
KEM1024 | encrypt | 5 | A[0,2] | A[1,2] |
KEM1024 | encrypt | 5 | A[2,2] | A[3,2] |
KEM1024 | encrypt | 6 | A[0,3] | A[1,3] |
KEM1024 | encrypt | 7 | A[2,3] | A[3,3] |
KEM1024 | encrypt | 8 | r[0] | r[1] |
KEM1024 | encrypt | 9 | r[2] | r[3] |
KEM1024 | encrypt | 10 | e1[0] | e1[1] |
KEM1024 | encrypt | 11 | e1[2] | e1[3] |
KEM1024 | encrypt | 12 | e2 | n/a |
Notes:
Notes: The scalar backend is the default if no accelerated backend is available. It is not particularly optimized.
As of this writing, Clang does a better job optimizing the scalar backend than GCC (see bench results in examples/3-bench/README.md
).
Copyright 2023-2024 Paul Duncan
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.