Re: Updated scalable urandom patchkit

From: George Spelvin
Date: Tue Oct 13 2015 - 02:24:54 EST


> We're really getting into low-level implementations here, and I think
> it's best to worry about these sorts of things when we have a patch to
> review.....

> it's probably safer to simply not exit it.

> I'm personally more inclined to keep it with the task struct, so that
> different threads will use different crypto contexts, just from
> simplicity point of view since we won't need to worry about locking.

> Once we code it up we can see how many bytes this takes, we can have
> this discussion.

I'm fine with all of these; thank you.

> This is one of the reasons why I looked at, and then discarded, to use
> x86 accelerated AES as the basis for a CRNG. Setting up AES so that
> it can be used easily with or without hardware acceleration looks very
> complicated to do in a cross-architectural way, and

I haven't looked as deeply, but it didn't look too hard. Is it possible
to briefly explain the problem?

I assumed you'd have an arch-specific capabilities probe function that
would set up an operations structure. That would provide the various
buffer sizes required, plus setup (kernel_fpu_begin() and key scheduling),
CPRNG core, and teardown (kernel_fpu_end()) functions.
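
To make that concrete, here is a rough userspace sketch of the kind of
operations structure I have in mind. Every name in it (struct crng_ops,
crng_probe(), crng_fill(), the generic_* functions) is made up for
illustration and is not existing kernel API, and the "generic" core is a
non-cryptographic placeholder that only exercises the plumbing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical operations table; none of these names exist in the kernel. */
struct crng_ops {
	size_t state_size;	/* bytes of per-context state (key schedule etc.) */
	size_t block_size;	/* bytes produced by one call to ->core() */
	void (*setup)(void *state, const uint8_t *key, size_t keylen);
	void (*core)(void *state, uint8_t *out);
	void (*teardown)(void *state);
};

/* Placeholder state for the portable fallback. */
struct generic_state {
	uint8_t key[32];
	uint64_t counter;
};

static void generic_setup(void *state, const uint8_t *key, size_t keylen)
{
	struct generic_state *s = state;

	/* An accelerated version would call kernel_fpu_begin() and expand the key here. */
	memset(s, 0, sizeof(*s));
	memcpy(s->key, key, keylen < sizeof(s->key) ? keylen : sizeof(s->key));
}

static void generic_core(void *state, uint8_t *out)
{
	struct generic_state *s = state;
	size_t i;

	/* NOT a cipher: a stand-in so the sketch runs end to end. */
	for (i = 0; i < 16; i++)
		out[i] = s->key[i] ^ (uint8_t)(s->counter + i);
	s->counter++;
}

static void generic_teardown(void *state)
{
	/* An accelerated version would call kernel_fpu_end() here. */
	memset(state, 0, sizeof(struct generic_state));
}

static const struct crng_ops generic_ops = {
	.state_size = sizeof(struct generic_state),
	.block_size = 16,
	.setup = generic_setup,
	.core = generic_core,
	.teardown = generic_teardown,
};

/* Arch code would test CPU features here and return an accelerated table. */
static const struct crng_ops *crng_probe(void)
{
	return &generic_ops;
}

/* Fill out[0..len) using whichever implementation the probe selected. */
static void crng_fill(const struct crng_ops *ops, const uint8_t *key,
		      size_t keylen, uint8_t *out, size_t len)
{
	uint8_t block[64];
	void *state = malloc(ops->state_size);

	assert(state != NULL);
	assert(ops->block_size <= sizeof(block));

	ops->setup(state, key, keylen);
	while (len) {
		size_t n = len < ops->block_size ? len : ops->block_size;

		ops->core(state, block);
		memcpy(out, block, n);
		out += n;
		len -= n;
	}
	ops->teardown(state);
	free(state);
}
```

The caller only ever sees the table, so whether the FPU is touched is
entirely the business of the setup and teardown hooks.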

Is there some huge gotcha I'm overlooking?

> I don't want to drag in all of the crypto layer for /dev/random.

Oh, gods, no; the crypto layer drives me nuts. Truthfully, the main
hair of the crypto layer is all the modular cipher modes on top of block
ciphers, and the scatterlist stuff to handle arbitrarily fragmented
input and output buffers for the benefit of the network layer, but the
code is horrible reading.

Every time I look, I find something I want to fix (the CTS mode
implementation uses 6 blocks worth of stack buffer; I have a patch to
reduce that to 3) but then I get lost in the morass of structures and
wrappers trying to make the code fit in with the rest.

There's a struct crypto_tfm, crypto_alg, crypto_instance, cipher_desc,
crypto_type, crypto_template, crypto_spawn... I've been trying to read it
and I still have no idea what half of them are for.

And I *still* haven't figured out how to get the self-test code to tell
me that test X was performed and passed. I ended up writing my own test,
which seems wrong.

> ... and I've thought about this as being the first step towards
> potentially replacing SHA1 with something ChaCha20 based, in light of
> the SHAppening attack. Unfortunately, BLAKE2s is similar to ChaCha
> only from design perspective, not an implementation perspective.
> Still, I suspect the just looking at the crypto primitives, even if we
> need to include two independent copies of the ChaCha20 core crypto and
> the Blake2s core crypto, it still should be about half the size of the
> SHA-1 crypto primitive.

Well, the SHAppening doesn't really change much except a slight schedule
tweak, but yeah, it's one of those things that would be nice to get
around to.

I'm not sure what you expect to do with ChaCha, though; it's really
an expansion function, not compression, and not easily adapted to be one.
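
To illustrate the expansion point, here is a sketch of the ChaCha20 block
function as described in RFC 7539. It takes 384 bits of key, counter and
nonce and returns 512 bits of output; there is no message input for it to
absorb. The function name and the word-array interface are my own choices
for the example, not anything from the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* ChaCha quarter round (RFC 7539, section 2.1). */
#define QR(a, b, c, d) do {			\
	a += b; d ^= a; d = ROTL32(d, 16);	\
	c += d; b ^= c; b = ROTL32(b, 12);	\
	a += b; d ^= a; d = ROTL32(d, 8);	\
	c += d; b ^= c; b = ROTL32(b, 7);	\
} while (0)

/* One 64-byte output block from a 256-bit key, 32-bit counter, 96-bit nonce. */
static void chacha20_block(const uint32_t key[8], uint32_t counter,
			   const uint32_t nonce[3], uint32_t out[16])
{
	uint32_t in[16], x[16];
	int i;

	assert(key && nonce && out);

	/* "expand 32-byte k" */
	in[0] = 0x61707865;
	in[1] = 0x3320646e;
	in[2] = 0x79622d32;
	in[3] = 0x6b206574;
	memcpy(&in[4], key, 8 * sizeof(uint32_t));
	in[12] = counter;
	memcpy(&in[13], nonce, 3 * sizeof(uint32_t));
	memcpy(x, in, sizeof(x));

	/* 20 rounds = 10 iterations of (column round, diagonal round). */
	for (i = 0; i < 10; i++) {
		QR(x[0], x[4], x[8],  x[12]);
		QR(x[1], x[5], x[9],  x[13]);
		QR(x[2], x[6], x[10], x[14]);
		QR(x[3], x[7], x[11], x[15]);
		QR(x[0], x[5], x[10], x[15]);
		QR(x[1], x[6], x[11], x[12]);
		QR(x[2], x[7], x[8],  x[13]);
		QR(x[3], x[4], x[9],  x[14]);
	}

	/* Feed-forward: add the input back in so the block is not invertible. */
	for (i = 0; i < 16; i++)
		out[i] = x[i] + in[i];
}
```

Bumping the counter gives the next 512 bits from the same key, which is
exactly what a CRNG wants and the opposite of what a hash needs.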

BLAKE2 is a bit ugly. I'm generally not liking MD5/SHA-like designs
that dribble the message and some round constants in; I'm much preferring
the large-state "add a bunch of input all at once" designs like Keccak,
SipHash and DJB's SHA-3 entry, CubeHash.

Have you seen it? It's quite similar to Keccak, just using a 32*32 = 1024
bit state rather than Keccak's 25*64 = 1600.

The reason it got dumped is that, like Keccak, to get n bits of
preimage resistance, it requires 2n "capacity" bits left unused each
round. When you ask for 512 bits of preimage resistance, you can
only import a few bits of message each block.

Keccak has the same problem, but it has a big enough block that it can handle
it.
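
The arithmetic, as a tiny sketch. This only encodes the simplified
"reserve 2n bits" rule stated above, not the precise security claims of
either submission, and sponge_rate_bits() is a name I made up.

```c
#include <assert.h>

/* Message bits absorbed per block when 2n bits are held back as capacity. */
static unsigned sponge_rate_bits(unsigned state_bits, unsigned preimage_bits)
{
	unsigned capacity = 2 * preimage_bits;

	assert(state_bits > 0);
	return capacity >= state_bits ? 0 : state_bits - capacity;
}
```

With n = 512, a 1600-bit state still leaves 576 bits of message per block,
while under this rule a 1024-bit state leaves nothing to speak of.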

In Dan's submission, you'll see his usual "this is a stupid request
which I'm only paying lip service to" attitude, and his "SHA-3-512-formal"
proposal was dog-slow. To quote:

    The "SHA-3-512-formal" proposal is aimed at users who are
    (1) concerned with attacks using 2^384 operations,
    (2) unconcerned with quantum attacks that cost far less, and
    (3) unaware that attackers able to carry out 2^256 operations
    would wreak havoc on the entire SHA-3 landscape, forcing SHA-3
    to be replaced no matter which function is selected as SHA-3.
    The "SHA-3-512-normal" proposal is aimed at sensible users.
    For all real-world cryptographic applications, the "formal"
    versions here can be ignored, and the tweak amounts to a proposal
    of CubeHash16/32 as SHA-3.

If NIST had proposed changing the preimage resistance rules *before*
the final decision, things would have gone a lot differently.

> And from the non-plumbing side of things, Andi's patchset increases
> the size of /dev/random by a bit over 6%, or 974 bytes from a starting
> base of 15719 bytes. It ought to be possible to implement a ChaCha20
> based CRNG (ignoring the crypto primitives) in less than 974 bytes of
> x86_64 assembly. :-)

Yes, not hard. Are you inspired, or would you like me to put together
a patch?

And should I do moderately evil space-saving things like store the
pointer to the crypto state and the leaky bucket in the same task slot,
distinguished by the pointer lsbit?
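
Something like the following sketch, assuming the leaky bucket is just a
small integer count and the crypto state is at least 2-byte aligned. The
type and helper names (task_slot_t, slot_*, struct crng_state) are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-task crypto context. */
struct crng_state {
	uint32_t key[8];
	uint32_t counter;
};

/* One word in the task: lsbit set = bucket count, lsbit clear = pointer. */
typedef uintptr_t task_slot_t;

static inline task_slot_t slot_from_bucket(unsigned long count)
{
	/* The count loses its top bit to make room for the tag. */
	return ((uintptr_t)count << 1) | 1;
}

static inline task_slot_t slot_from_state(struct crng_state *s)
{
	/* Relies on the allocation being aligned so the lsbit is free. */
	assert(((uintptr_t)s & 1) == 0);
	return (uintptr_t)s;
}

static inline int slot_is_bucket(task_slot_t v)
{
	return (int)(v & 1);
}

static inline unsigned long slot_bucket(task_slot_t v)
{
	return (unsigned long)(v >> 1);
}

static inline struct crng_state *slot_state(task_slot_t v)
{
	return (struct crng_state *)v;
}
```

A zero slot then reads as a NULL state pointer, so a freshly zeroed task
would need its slot initialized to slot_from_bucket(0) or treated
specially.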