
designing a suckless bignum library

Name: Anonymous 2015-11-16 22:11

Let's design a suckless bignum library. (I'm not part of suckless though, just curious about replacing GMP).

I researched a bit into algorithms and the rundown is this:
* long multiplication: O(n^2)
* Karatsuba: O(n^1.585), i.e. n^log2(3)
* Toom-Cook and Fourier-transform-based methods: asymptotically even faster, but much more complex and only worth it for very long numbers (roughly 10k+ digits).

So we should probably use Karatsuba for all multiplications. Squaring can sometimes be done a bit faster than multiplying two different numbers.
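To make the Karatsuba trick concrete, here is a minimal sketch on a single 32-bit word split into 16-bit halves. The name `karatsuba32` is made up for illustration. A real library would apply the same three-product step recursively to limb arrays, probably falling back to long multiplication below some cutoff size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: multiply two 32-bit values using three
 * half-size multiplications instead of four. */
static uint64_t karatsuba32(uint32_t a, uint32_t b)
{
    uint64_t a0 = a & 0xFFFFu, a1 = a >> 16;   /* a = a1*2^16 + a0 */
    uint64_t b0 = b & 0xFFFFu, b1 = b >> 16;   /* b = b1*2^16 + b0 */

    uint64_t z0 = a0 * b0;                     /* low product  */
    uint64_t z2 = a1 * b1;                     /* high product */
    /* (a0+a1)*(b0+b1) = z0 + z2 + (a0*b1 + a1*b0); each sum is at most 17 bits */
    uint64_t z1 = (a0 + a1) * (b0 + b1);

    assert(z1 >= z0 + z2);                     /* the cross term is never negative */
    z1 -= z0 + z2;                             /* z1 = a0*b1 + a1*b0 */

    return (z2 << 32) + (z1 << 16) + z0;
}
```

The saving is one multiplication per split, paid for with a few extra additions and subtractions, which is where the n^log2(3) exponent comes from.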

I suggest writing it in assembly, which gives you direct access to the carry flag (standard C doesn't expose it). Of course we would still use libc and the normal C calling conventions, so that it's a regular C library.
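For comparison, here is a sketch of how the carry can be recovered in portable C, relying on the fact that unsigned arithmetic wraps around. The names `adc32` and `add_limbs` are hypothetical. Whether a given compiler turns this into an actual add-with-carry instruction is not guaranteed, which is the usual argument for assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: *r = x + y + carry_in, returns the carry out (0 or 1). */
static uint32_t adc32(uint32_t *r, uint32_t x, uint32_t y, uint32_t carry_in)
{
    assert(carry_in <= 1);
    uint32_t s = x + carry_in;     /* wraps only if x is all ones and carry_in is 1 */
    uint32_t c = (s < carry_in);
    s += y;                        /* wrapped if and only if the result is below y */
    c |= (s < y);
    *r = s;
    return c;
}

/* n-limb addition, least significant limb first; returns the final carry. */
static uint32_t add_limbs(uint32_t *r, const uint32_t *x, const uint32_t *y, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++)
        carry = adc32(&r[i], x[i], y[i], carry);
    return carry;
}
```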

What to do about memory management? E.g. to add two numbers, do we allocate a new 'number' as long as the larger operand to hold the result, or do it destructively ("x <- x + y")? Maybe the library should support both; then a calculator program could pick the best primitives for a given computation.

It might be nice to also support things like (big modulus) modular arithmetic and polynomials. Operations like exponentiation and modular inverses have interesting algorithms.
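As one example of those algorithms, here is a minimal square-and-multiply sketch for modular exponentiation. It works on single 32-bit words so that every product fits in a `uint64_t`. The name `modpow32` is made up, and a bignum version would swap the `*` and `%` for multi-limb routines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: base^exp mod m by right-to-left square-and-multiply. */
static uint32_t modpow32(uint32_t base, uint32_t exp, uint32_t m)
{
    assert(m != 0);
    uint64_t result = 1 % m;       /* 1 % m so that m == 1 yields 0 */
    uint64_t b = base % m;
    while (exp) {
        if (exp & 1)
            result = result * b % m;   /* multiply in the current power */
        b = b * b % m;                 /* square for the next exponent bit */
        exp >>= 1;
    }
    return (uint32_t)result;
}
```

For a prime modulus p, Fermat's little theorem means `modpow32(a, p - 2, p)` also gives the modular inverse of a, so the same loop covers both operations in that special case.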

What other integer operations would we want? I don't really want to do anything with arbitrary-precision real numbers, though arithmetic with rationals could be done.

Name: Anonymous 2015-11-20 16:39

op, if you're serious about this, do:

* modular arith only (i.e. crypto only). For fast arbitrary precision you really need a hairy ball of code like GMP; no way to be suckless
* shift-and-add multiplier, then you get modulo for free
* Why not schoolbook & Barrett reduction? Sure, it works better on superscalar CPUs with ALUs, but it *will* suck horribly on a CPU without a hardware multiplier (which is everything really low power these days)
* basics, i.e. modular add, sub, mul, pow are ~500 LOC. Prime tester (Miller-Rabin): 100 LOC. Other bells and whistles (i.e. actual PKCS+RSA, DHE, maybe even ECDHE): 1 kLOC. 2 kLOC all told.
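To show what "shift-and-add, then you get modulo for free" means, here is a single-word sketch. It uses no multiply or divide instruction, only addition, comparison, and subtraction. The name `mulmod_shift_add` and the restriction to moduli of at most 2^31 are assumptions made to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: (x * y) mod m using only add, compare and subtract.
 * Requires x < m and m <= 2^31 so that no intermediate sum wraps. */
static uint32_t mulmod_shift_add(uint32_t x, uint32_t y, uint32_t m)
{
    assert(m <= 0x80000000u && x < m);
    uint32_t r = 0;
    while (y) {
        if (y & 1) {
            r += x;                /* r < m and x < m, so r + x < 2m */
            if (r >= m)
                r -= m;
        }
        x += x;                    /* x = 2x mod m */
        if (x >= m)
            x -= m;
        y >>= 1;
    }
    return r;
}
```

Because every intermediate value is reduced as soon as it is produced, there is never a double-width product to reduce afterwards.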

> What to do about memory management?

The golden rule of suckless C: just forget dynamic memory. Dynamically allocated memory was a mistake. K&R were right. You free memory through return or exit(). Just cap your bignum type at some hardcoded limb count. You sacrifice a bit of stack depth, and there's no need for a silly realloc() every time you add two numbers. Not to mention that you then don't even need libc and can easily run bare metal.
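A minimal sketch of that fixed-size style, assuming a made-up type `bn` capped at `BN_LIMBS` limbs (both names and the 256-bit cap are hypothetical). Values live on the stack and are passed and returned by value, so there is nothing to allocate or free.

```c
#include <assert.h>
#include <stdint.h>

#define BN_LIMBS 8                 /* hypothetical cap: 8 * 32 = 256 bits */

typedef struct { uint32_t limb[BN_LIMBS]; } bn;   /* least significant limb first */

/* Build a bignum from a small value. */
static bn bn_from_u32(uint32_t v)
{
    bn r = { {0} };
    r.limb[0] = v;
    return r;
}

/* Returns x + y; *overflow is set to 1 if the sum did not fit in BN_LIMBS limbs. */
static bn bn_add(bn x, bn y, int *overflow)
{
    bn r;
    uint64_t acc = 0;
    for (int i = 0; i < BN_LIMBS; i++) {
        acc += (uint64_t)x.limb[i] + y.limb[i];
        r.limb[i] = (uint32_t)acc; /* low 32 bits are this limb */
        acc >>= 32;                /* high bits carry into the next limb */
    }
    assert(acc <= 1);
    *overflow = (int)acc;
    return r;
}
```

With this shape, `x = bn_add(x, y, &ov)` covers the destructive "x <- x + y" case and `z = bn_add(x, y, &ov)` the non-destructive one, at the cost of copying a fixed-size struct.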

As an example of a bignum in this style, here is a 60 LOC RSA signature verifier for a (very) ROM-space-constrained embedded MCU bootloader:

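#include <stdint.h>
#include <string.h>

/* RSA_BITS (the key size in bits) is expected to be defined elsewhere. */
#define RSA_BYTES (RSA_BITS/8)
#define DIGIT uint32_t
#define DIGIT_BYTES (sizeof(DIGIT))
#define DIGIT_BITS (DIGIT_BYTES*8)
#define DIGIT_MAX ((DIGIT)(-1))
#define NDIGITS (RSA_BITS/DIGIT_BITS)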

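/* r = x + y over NDIGITS limbs; returns the carry out of the top limb. */
static int b_add(DIGIT * restrict r, const DIGIT *x, const DIGIT *y)
{
    DIGIT w, carry = 0;
    for (int i = 0; i < NDIGITS; i++) {
        if ((w = x[i] + carry) < carry)
            w = y[i];   /* x[i] + carry wrapped to 0; the carry stays set */
        else
            carry = ((w += y[i]) < y[i]);
        r[i] = w;
    }
    return carry;
}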

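/* res = (xsrc * y) % modulus by shift-and-add. mod holds the NEGATED
 * modulus, so b_add(.., mod) subtracts the modulus and its carry out
 * means the value was >= modulus.
 * Returns -1 if an intermediate sum overflows NDIGITS limbs. */
static int b_mulmod(DIGIT * restrict res, const DIGIT *xsrc, const DIGIT *y, const DIGIT *mod)
{
    DIGIT rbuf1[NDIGITS], rbuf2[NDIGITS], xbuf1[NDIGITS], xbuf2[NDIGITS];

    DIGIT *r1 = rbuf1;  /* running result */
    DIGIT *r2 = rbuf2;  /* scratch for the result */
    DIGIT *x1 = xbuf1;  /* xsrc doubled once per bit processed, reduced */
    DIGIT *x2 = xbuf2;  /* scratch for x */

    DIGIT *swp;

    memset(rbuf1, 0, sizeof rbuf1);
    memcpy(xbuf1, xsrc, sizeof xbuf1);

    for (int i = 0; i < NDIGITS; i++) {
        for (DIGIT bit = 1; bit; bit += bit) {
            if (y[i] & bit) {
                /* r2 = r1 + x1 */
                if (b_add(r2, r1, x1))
                    return -1;
                /* r1 = r2 - modulus; no carry means r2 was already reduced */
                if (!b_add(r1, r2, mod)) {
                    swp = r1;
                    r1 = r2;
                    r2 = swp;
                }
            }
            /* x2 = 2 * x1 */
            if (b_add(x2, x1, x1))
                return -1;

            /* x1 = x2 - modulus; no carry means x2 was already reduced */
            if (!b_add(x1, x2, mod)) {
                swp = x1;
                x1 = x2;
                x2 = swp;
            }
        }
    }
    memcpy(res, r1, sizeof rbuf1);
    return 0;
}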

/* output = input^3 % modulus (public exponent 3); returns 0 on success, -1 on overflow. */
static int rsa_public(DIGIT *output, const DIGIT *input, const DIGIT *modulus)
{
    DIGIT buf2[NDIGITS];
    /* buf2 = input^2 % modulus */
    if (b_mulmod(buf2, input, input, modulus))
        return -1;
    /* output = input^3 % modulus */
    return b_mulmod(output, buf2, input, modulus);
}


(The modulus is stored negated, i.e. as its two's complement over NDIGITS limbs, so adding it subtracts the modulus and there is no need for b_sub.)
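The negated-modulus trick is easier to see on a single word. This is a sketch with a made-up helper `reduce_once`: adding 2^32 - m is the same as subtracting m modulo 2^32, and the carry out of the addition tells you whether x was at least m.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-word demo: reduce x (with 0 <= x < 2*m) modulo m
 * using only an addition. neg_m is 2^32 - m, the two's complement of m. */
static uint32_t reduce_once(uint32_t x, uint32_t neg_m)
{
    assert(neg_m != 0);            /* m must be in 1 .. 2^32 - 1 */
    uint32_t t = x + neg_m;        /* x - m, modulo 2^32 */
    int carry = (t < x);           /* the add wrapped, so x >= m */
    return carry ? t : x;          /* keep the difference only if x >= m */
}
```

For example, with m = 1000 the caller would pass `neg_m = (uint32_t)0 - 1000`; `reduce_once(1500, neg_m)` then yields 500 while `reduce_once(999, neg_m)` leaves 999 unchanged.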
