fips203ipd
C11 implementation of FIPS 203 initial public draft (IPD).
fips203ipd

Embeddable, dependency-free C11 implementation of ML-KEM from the FIPS 203 initial public draft (IPD) with scalar, AVX-512, and Neon backends.

FIPS 203 is (or will be) NIST's standardized version of Kyber, a post-quantum key encapsulation mechanism (KEM).

Notes on this implementation:

Use make to build a minimal self test application, make doc to build the HTML-formatted API documentation, and make test to run the test suite.

Example

Minimal example of Alice and Bob exchanging a shared secret with KEM512:

//
// hello.c: minimal example of a two parties "alice" and "bob"
// generating a shared secret with KEM512.
//
#include <stdio.h> // fputs()
#include <string.h> // memcmp()
#include "hex.h" // hex_write()
#include "rand-bytes.h" // rand_bytes()
#include "fips203ipd.h" // fips203ipd_*()
int main(void) {
//
// alice: generate keypair
//
uint8_t ek[FIPS203IPD_KEM512_EK_SIZE] = { 0 }; // encapsulation key
uint8_t dk[FIPS203IPD_KEM512_DK_SIZE] = { 0 }; // decapsulation key
{
// alice: get 64 random bytes for keygen()
uint8_t keygen_seed[64] = { 0 };
rand_bytes(keygen_seed, sizeof(keygen_seed));
fputs("alice: keygen random (64 bytes) = ", stdout);
hex_write(stdout, keygen_seed, sizeof(keygen_seed));
fputs("\n", stdout);
// alice: generate encapsulation/decapsulation key pair
fips203ipd_kem512_keygen(ek, dk, keygen_seed);
}
fputs("alice: generated encapsulation key `ek` and decapsulation key `dk`:\n", stdout);
printf("alice: ek (%d bytes) = ", FIPS203IPD_KEM512_EK_SIZE);
hex_write(stdout, ek, sizeof(ek));
printf("\nalice: dk (%d bytes) = ", FIPS203IPD_KEM512_DK_SIZE);
hex_write(stdout, dk, sizeof(dk));
fputs("\n", stdout);
// alice send `ek` to bob
fputs("alice: sending encapsulation key `ek` to bob\n\n", stdout);
//
// bob: generate shared secret and ciphertext
//
uint8_t b_key[32] = { 0 }; // shared secret
uint8_t ct[FIPS203IPD_KEM512_CT_SIZE] = { 0 }; // ciphertext
{
// bob: get 32 random bytes for encaps()
uint8_t encaps_seed[32] = { 0 };
rand_bytes(encaps_seed, sizeof(encaps_seed));
fputs("bob: encaps random (32 bytes) = ", stdout);
hex_write(stdout, encaps_seed, sizeof(encaps_seed));
fputs("\n", stdout);
// bob:
// 1. get encapsulation key `ek` from alice.
// 2. generate random shared secret.
// 3. use `ek` from step #1 to encapsulate the shared secret from step #2.
// 3. store the shared secret in `b_key`.
// 4. store the encapsulated shared secret (ciphertext) in `ct`.
fips203ipd_kem512_encaps(b_key, ct, ek, encaps_seed);
}
fputs("bob: generated secret `b_key` and ciphertext `ct`:\nbob: b_key (32 bytes) = ", stdout);
hex_write(stdout, b_key, sizeof(b_key));
printf("\nbob: ct (%d bytes) = ", FIPS203IPD_KEM512_CT_SIZE);
hex_write(stdout, ct, sizeof(ct));
fputs("\n", stdout);
// bob sends ciphertext `ct` to alice
fputs("bob: sending ciphertext `ct` to alice\n\n", stdout);
//
// alice: decapsulate shared secret
//
// alice:
// 1. get ciphertext `ct` from bob.
// 2. use decapsulation key `dk` to decapsulate shared secret from `ct`.
// 2. store shared secret in `a_key`.
uint8_t a_key[32] = { 0 };
fips203ipd_kem512_decaps(a_key, ct, dk);
fputs("alice: used `dk` to decapsulate secret from `ct` into `a_key`:\nalice: a_key (32 bytes) = ", stdout);
hex_write(stdout, a_key, sizeof(a_key));
fputs("\n\n", stdout);
// check result
// (note: don't use memcmp() in real applications; it's not constant-time)
if (!memcmp(a_key, b_key, sizeof(a_key))) {
// success: alice and bob have the same shared secret
fputs("SUCCESS! alice secret `a_key` and bob secret `b_key` match.\n", stdout);
return 0;
} else {
// failure: alice and bob do not have the same shared secret
fputs("FAILURE! alice secret `a_key` and bob secret `b_key` do not match.\n", stdout);
return -1;
}
}
C11 implementation of ML-KEM from the FIPS 203 initial public draft.
#define FIPS203IPD_KEM512_DK_SIZE
Size of KEM512 decapsulation key, in bytes (768 * K + 96).
Definition: fips203ipd.h:79
#define FIPS203IPD_KEM512_CT_SIZE
Size of KEM512 ciphertext, in bytes (32 * (DU * K + DV)).
Definition: fips203ipd.h:85
#define FIPS203IPD_KEM512_EK_SIZE
Size of KEM512 encapsulation key, in bytes (384 * K + 32).
Definition: fips203ipd.h:73
void fips203ipd_kem512_keygen(uint8_t ek[static FIPS203IPD_KEM512_EK_SIZE], uint8_t dk[static FIPS203IPD_KEM512_DK_SIZE], const uint8_t seed[static FIPS203IPD_KEYGEN_SEED_SIZE])
Generate KEM512 encapsulation key ek and decapsulation key dk from 64 byte random seed seed.
void fips203ipd_kem512_encaps(uint8_t key[static FIPS203IPD_KEY_SIZE], uint8_t ct[static FIPS203IPD_KEM512_CT_SIZE], const uint8_t ek[static FIPS203IPD_KEM512_EK_SIZE], const uint8_t seed[static FIPS203IPD_ENCAPS_SEED_SIZE])
Generate KEM512 shared key key and ciphertext ct from given encapsulation key ek and randomness seed.
void fips203ipd_kem512_decaps(uint8_t key[static FIPS203IPD_KEY_SIZE], const uint8_t ct[static FIPS203IPD_KEM512_CT_SIZE], const uint8_t dk[static FIPS203IPD_KEM512_DK_SIZE])
Decapsulate shared key key from ciphertext ct using KEM512 decapsulation key dk with implicit rejecti...

See examples/0-hello-kem/ for the full buildable example, including a Makefile and support files.

Documentation

API documentation is available online here and also in fips203ipd.h. If you have Doxygen installed, you can build HTML-formatted API documentation by typing make doc.

Tests

Use make test to build and run the test suite.

The test suite checks each component of this implementation for expected answers and is built with common sanitizers supported by both GCC and Clang. The source code for the test suite is embedded at the bottom of fips203ipd.c behind a TEST_FIPS203IPD define.

You can also build a quick test application by typing make in the top-level directory. The test application does the following 1000 times for each parameter set:

  1. Generate a random encapsulation/decapsulation key pair.
  2. Encapsulate a secret using the encapsulation key.
  3. Decapsulate the secret using the decapsulation key.
  4. Verify that the secrets generated in steps #2 and #3 match.

Usage

There are safer and faster alternatives, but if you want to use this library anyway:

  1. Copy fips203ipd.h and fips203ipd.c into your source tree.
  2. Update your build system to compile fips203ipd.o.
  3. Include fips203ipd.h in your application.
  4. Use fips203ipd_*() functions in your code.

Benchmarks

A minimal libcpucycles-based benchmarking tool is available in examples/3-bench/. The bench tool repeatedly measures the number of CPU cycles used for key generation, encapsulation, and decapsulation for each parameter set and then prints summary statistics to standard output in CSV format.

bench results from a few of my systems are shown below.

Lenovo ThinkPad X1 Carbon, 6th Gen (x86-64 i7-1185G7, AVX-512 backend)

set function median mean stddev min max
kem512 keygen 17633 17723 676 17302 54660
kem512 encaps 21602 21705 938 21304 75753
kem512 decaps 25733 25894 4934 25376 372564
kem768 keygen 29384 29528 1557 28885 133739
kem768 encaps 32511 32632 603 31974 59495
kem768 decaps 38176 38223 529 37368 46061
kem1024 keygen 39829 40111 4865 39166 365645
kem1024 encaps 45250 45437 781 44510 76805
kem1024 decaps 52425 52605 636 51282 63121

Raspberry Pi 5 (ARM Cortex-A76, Neon backend)

set function median mean stddev min max
kem512 keygen 53711 53824 945 53355 106488
kem512 encaps 61366 61487 600 61010 70577
kem512 decaps 73559 73666 586 73202 85529
kem768 keygen 92560 92754 896 92026 119126
kem768 encaps 104842 105065 802 104352 114855
kem768 decaps 121485 121718 826 120907 130430
kem1024 keygen 140219 143515 10211 139507 214890
kem1024 encaps 154949 158546 11176 154237 224903
kem1024 decaps 176131 180226 12692 175330 231178

Odroid N2L (ARM Cortex-A73, Neon backend)

set function median mean stddev min max
kem512 keygen 96450 96734 1455 94875 121425
kem512 encaps 107550 107831 1552 106050 132825
kem512 decaps 126375 126681 1704 125250 165450
kem768 keygen 168450 168962 2058 165900 209175
kem768 encaps 186975 187449 2169 184425 214350
kem768 decaps 212925 213580 2769 209550 259725
kem1024 keygen 260325 261156 2861 256275 304950
kem1024 encaps 281175 282064 2891 277575 326850
kem1024 decaps 314250 315165 3441 310875 410850

Backends

This library includes several accelerated backends which are selectable at compile time via the optional BACKEND parameter. By default the fastest available backend is selected at compile-time.

The available backends are:

  • Auto-detect (BACKEND=0): Automatically detect the appropriate backend at compile-time (default).
  • Scalar (BACKEND=1): Fallback if no faster backend is available.
  • AVX-512 (BACKEND=2): AVX-512 acceleration. Selected by default if AVX-512 is supported.
  • Neon (BACKEND=3): ARM Neon acceleration. Selected by default if Neon is supported..

When the a backend is enabled, the following functions are replaced with accelerated equivalents:

Functions Description
permute() block permutation for SHA3-256 and SHA3-512
poly_{ntt,inv_ntt}() Number-Theoretic Transform (NTT) and inverse NTT
poly_{add,add2,sub,mul}() Polynomial arithmetic
poly_encode(), poly_encode_{11,10,5,4,1}bit() Polynomial encoding
poly_decode(), poly_decode_{11,10,5,4,1}bit() Polynomial decoding
pke{512,768,1024}_keygen_sample() Key generation (see below)
pke{512,768,1024}_encrypt_sample() Encryption (see below)

You can use the fips203ipd_backend() function to get the backend name at run-time.

AVX-512 Backend

The AVX-512-enabled key generation and encryption functions perform coefficient sampling in stages and use the 64-bit lanes of 25 512-bit registers to permute and sample from up to 8 SHAKE128 and SHAKE256 contexts at once.

Here are how the lanes of the AVX-512 registers are used for each parameter set, function, and stage:

Parameter Set Function Stage Lane 0 Lane 1 Lane 2 Lane 3 Lane 4 Lane 5 Lane 6 Lane 7
KEM512 keygen 1 A[0,0] A[0,1] A[1,0] A[1,1] s[0] s[1] e[0] e[1]
KEM512 encrypt 1 A[0,0] A[1,0] A[0,1] A[1,1] r[0] r[1] n/a n/a
KEM512 encrypt 2 e1[0] e1[1] e2 n/a n/a n/a n/a n/a
KEM768 keygen 1 A[0,0] A[0,1] A[0,2] A[1,0] A[1,1] A[1,2] A[2,0] A[2,1]
KEM768 keygen 2 A[2,2] s[0] s[1] s[2] e[0] e[1] e[2] n/a
KEM768 encrypt 1 A[0,0] A[1,0] A[2,0] A[0,1] A[1,1] A[2,1] A[0,2] A[1,2]
KEM768 encrypt 2 A[2,2] r[0] r[1] r[2] e1[0] e1[1] e1[2] e2
KEM1024 keygen 1 A[0,0] A[0,1] A[0,2] A[0,3] A[1,0] A[1,1] A[1,2] A[1,3]
KEM1024 keygen 2 A[2,0] A[2,1] A[2,2] A[2,3] A[3,0] A[3,1] A[3,2] A[3,3]
KEM1024 keygen 3 s[0] s[1] s[2] s[3] e[0] e[1] e[2] e[3]
KEM1024 encrypt 1 A[0,0] A[1,0] A[2,0] A[3,0] A[0,1] A[1,1] A[2,1] A[3,1]
KEM1024 encrypt 2 A[0,2] A[1,2] A[2,2] A[3,2] A[0,3] A[1,3] A[2,3] A[3,3]
KEM1024 encrypt 3 r[0] r[1] r[2] r[3] e1[0] e1[1] e1[2] e1[3]
KEM1024 encrypt 4 e2 n/a n/a n/a n/a n/a n/a n/a

Notes:

  • Unlike the Neon backend and most other ML-KEM implementations, the AVX-512 backend does not alternate between squeezing to a buffer and then sampling polynomial coefficients from the buffer. Instead, the AVX-512 backend transposes the registers containing the SHAKE context and then samples from them directly. This aproach is novel, but it uses a lot of registers and probably slower in the common case. A future implementation will probably switch to the alternating method used by the Neon backend and other implementations.

Neon Backend

The Neon key generation and encryption functions perform coefficient sampling in stages and use the 64-bit lanes of 25 128-bit registers to permute and sample from 2 SHAKE128 or SHAKE256 contexts at once.

Here are how the lanes of the 128-bit Neon registers are used for each parameter set, function, and stage:

Parameter Set Function Stage Lane 0 Lane 1
KEM512 keygen 1 A[0,0] A[0,1]
KEM512 keygen 2 A[1,0] A[1,1]
KEM512 keygen 3 s[0] s[1]
KEM512 keygen 4 e[0] e[1]
KEM512 encrypt 1 A[0,0] A[1,0]
KEM512 encrypt 2 A[0,1] A[1,1]
KEM512 encrypt 3 r[0] r[1]
KEM512 encrypt 4 e1[0] e1[1]
KEM512 encrypt 5 e2 n/a
KEM768 keygen 1 A[0,0] A[0,1]
KEM768 keygen 2 A[0,2] A[1,0]
KEM768 keygen 3 A[1,1] A[1,2]
KEM768 keygen 4 A[2,0] A[2,1]
KEM768 keygen 5 A[2,2] n/a
KEM768 keygen 6 s[0] s[1]
KEM768 keygen 7 s[2] e[0]
KEM768 keygen 8 e[1] e[2]
KEM768 encrypt 1 A[0,0] A[1,0]
KEM768 encrypt 2 A[2,0] A[0,1]
KEM768 encrypt 3 A[1,1] A[2,1]
KEM768 encrypt 4 A[0,2] A[1,2]
KEM768 encrypt 5 A[2,2] n/a
KEM768 encrypt 6 r[0] r[1]
KEM768 encrypt 7 r[2] e1[0]
KEM768 encrypt 8 e1[1] e1[2]
KEM768 encrypt 9 e2 n/a
KEM1024 keygen 1 A[0,0] A[0,1]
KEM1024 keygen 2 A[0,2] A[0,3]
KEM1024 keygen 3 A[1,0] A[1,1]
KEM1024 keygen 4 A[1,2] A[1,3]
KEM1024 keygen 5 A[2,0] A[2,1]
KEM1024 keygen 5 A[2,2] A[2,3]
KEM1024 keygen 6 A[3,0] A[3,1]
KEM1024 keygen 7 A[3,2] A[3,3]
KEM1024 keygen 8 s[0] s[1]
KEM1024 keygen 9 s[2] s[3]
KEM1024 keygen 10 e[0] e[1]
KEM1024 keygen 11 e[2] e[3]
KEM1024 encrypt 1 A[0,0] A[1,0]
KEM1024 encrypt 2 A[2,0] A[3,0]
KEM1024 encrypt 3 A[0,1] A[1,1]
KEM1024 encrypt 4 A[2,1] A[3,1]
KEM1024 encrypt 5 A[0,2] A[1,2]
KEM1024 encrypt 5 A[2,2] A[3,2]
KEM1024 encrypt 6 A[0,3] A[1,3]
KEM1024 encrypt 7 A[2,3] A[3,3]
KEM1024 encrypt 8 r[0] r[1]
KEM1024 encrypt 9 r[2] r[3]
KEM1024 encrypt 10 e1[0] e1[1]
KEM1024 encrypt 11 e1[2] e1[3]
KEM1024 encrypt 12 e2 n/a

Notes:

  • Unlike the AVX-512 backend, the Neon backend alternates between squeezing bytes to a buffer and then sampling polynomial coefficients from the buffer.
  • The Neon backend does not use the EOR3 or RAX1 instructions, because I do not have a system which supports them.
  • Polynomial encoding and decoding are not currently Neon-accelerated. This will be fixed in a future release.

Scalar Backend

Notes: The scalar backend is the default if no accelerated backend is available. It is not particularly optimized.

As of this writing, Clang does a better job optimizing the scalar backend than GCC (see bench results in examples/3-bench/README.md).

References

License

MIT No Attribution (MIT-0)

Copyright 2023-2024 Paul Duncan

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.