Re: [PATCH v4 1/4] crypto: Introduce crypto_pool

From: Dmitry Safonov
Date: Thu Jan 19 2023 - 13:03:59 EST


Hi Herbert,

On 1/19/23 09:51, Herbert Xu wrote:
> On Wed, Jan 18, 2023 at 09:41:08PM +0000, Dmitry Safonov wrote:
>> Introduce a per-CPU pool of async crypto requests that can be used
>> in bh-disabled contexts (designed with net RX/TX softirqs as users in
>> mind). Allocation can sleep and is a slow-path.
>> Initial implementation has only ahash as a backend and a fix-sized array
>> of possible algorithms used in parallel.
>>
>> Signed-off-by: Dmitry Safonov <dima@xxxxxxxxxx>
>> ---
>> crypto/Kconfig | 3 +
>> crypto/Makefile | 1 +
>> crypto/crypto_pool.c | 333 ++++++++++++++++++++++++++++++++++++++++++
>> include/crypto/pool.h | 46 ++++++
>> 4 files changed, 383 insertions(+)
>> create mode 100644 crypto/crypto_pool.c
>> create mode 100644 include/crypto/pool.h
>
> I'm still nacking this.
>
> I'm currently working on per-request keys which should render
> this unnecessary. With per-request keys you can simply do an
> atomic kmalloc when you compute the hash.

Adding per-request keys sounds like a real improvement to me.
But that is not the same issue I'm addressing here. I'm maybe bad at
describing or maybe I just don't see how per-request keys would help.
Let me describe the problem I'm solving again and please feel free to
correct inline or suggest alternatives.

The initial need for crypto_pool comes from TCP-AO implementation that
I'm pusing upstream, see RFC5925 that describes the option and the
latest version of patch set is in [1]. In that patch set hashing is used
in a similar way to TCP-MD5: crypto_alloc_ahash() is a slow-path in
setsockopt() and the use of pre-allocated requests in fast path, TX/RX
softirqs.

For TCP-AO 2 algorithms are "must have" in any compliant implementation,
according to RFC5926: HMAC-SHA-1-96 and AES-128-CMAC-96, other
algorithms are optional. But having in mind that sha1, as you know, is
not secure to collision attacks, some customers prefer to have/use
stronger hashes. In other words, TCP-AO implementation needs 2+ hashing
algorithms to be used in a similar manner as TCP-MD5 uses MD5 hashing.

And than, I look around and I see that the same pattern (slow allocation
of crypto request and usage on a fast-path with bh disabled) is used in
other places over kernel:
- here I convert to crypto_pool seg6_hmac & tcp-md5
- net/ipv4/ah4.c could benefit from it: currently it allocates
crypto_alloc_ahash() per every connection, allocating user-specified
hash algorithm with ahash = crypto_alloc_ahash(x->aalg->alg_name, 0, 0),
which are not shared between each other and it doesn't provide
pre-allocated temporary/scratch buffer to calculate hash, so it uses
GFP_ATOMIC in ah_alloc_tmp()
- net/ipv6/ah6.c is copy'n'paste of the above
- net/ipv4/esp4.c and net/ipv6/esp6.c are more-or-less also copy'n'paste
with crypto_alloc_aead() instead of crypto_alloc_ahash()
- net/mac80211/ - another example of the same pattern, see even the
comment in ieee80211_key_alloc() where the keys are allocated and the
usage in net/mac80211/{rx,tx}.c with bh-disabled
- net/xfrm/xfrm_ipcomp.c has its own manager for different compression
algorithms that are used in quite the same fashion. The significant
exception is scratch area: it's IPCOMP_SCRATCH_SIZE=65400. So, if it
could be shared with other crypto users that do the same pattern
(bh-disabled usage), it would save some memory.

And those are just fast-grep examples from net/, looking closer it may
be possible to find more potential users.
So, in crypto_pool.c it's 333 lines where is a manager that let a user
share pre-allocated ahash requests [comp, aead, may be added on top]
inside bh-disabled section as well as share a temporary/scratch buffer.
It will make it possible to remove some if not all custom managers of
the very same code pattern, some of which don't even try to share
pre-allocated tfms.

That's why I see some value in this crypto-pool thing.
If you NACK it, the alternative for TCP-AO patches would be to add just
another pool into net/ipv4/tcp.c that either copies TCP-MD5 code or
re-uses it.

I fail to see how your per-request keys patches would provide an API
alternative to this patch set. Still, users will have to manage
pre-allocated tfms and buffers.
I can actually see how your per-request keys would benefit *from* this
patch set: it will be much easier to wire per-req keys up to crypto_pool
to avoid per-CPU tfm allocation for algorithms you'll add support for.
In that case you won't have to patch crypto-pool users.

[1]:
https://lore.kernel.org/all/20221027204347.529913-1-dima@xxxxxxxxxx/T/#u

Thanks, waiting for your input,
Dmitry