Re: Problem with io_uring splice and KTLS

From: Jens Axboe
Date: Mon Oct 16 2023 - 09:17:34 EST


On 10/16/23 1:26 AM, Sascha Hauer wrote:
> On Fri, Oct 13, 2023 at 07:45:55AM -0600, Jens Axboe wrote:
>> On 10/12/23 11:47 PM, Sascha Hauer wrote:
>>> On Thu, Oct 12, 2023 at 07:45:07PM -0600, Jens Axboe wrote:
>>>> On 10/12/23 7:34 AM, Sascha Hauer wrote:
>>>>> In case you don't have encryption hardware you can create an
>>>>> asynchronous encryption module using cryptd. Compile a kernel with
>>>>> CONFIG_CRYPTO_USER_API_AEAD and CONFIG_CRYPTO_CRYPTD and start the
>>>>> webserver with the '-c' option. /proc/crypto should then contain an
>>>>> entry with:
>>>>>
>>>>> name : gcm(aes)
>>>>> driver : cryptd(gcm_base(ctr(aes-generic),ghash-generic))
>>>>> module : kernel
>>>>> priority : 150
>>>>
>>>> I did a bit of prep work to ensure I had everything working for when
>>>> there's time to dive into it, but starting it with -c doesn't register
>>>> this entry. Turns out the bind() in there returns -1/ENOENT.
>>>
>>> Yes, that happens here as well, that's why I don't check for the error
>>> in the bind call. Nevertheless it has the desired effect that the new
>>> algorithm is registered and used from there on. BTW you only need to
>>> start the webserver once with -c. If you start it repeatedly with -c a
>>> new gcm(aes) instance is registered each time.
>>
>> Gotcha - I wasn't able to trigger the condition, which is why I thought
>> perhaps I was missing something.
>>
>> Can you try the below patch and see if that makes a difference? I'm not
>> quite sure why it would since you said it triggers with DEFER_TASKRUN as
>> well, and for that kind of notification, you should never hit the paths
>> you have detailed in the debug patch.
>
> I can confirm that this patch makes it work for me. I tested with both
> software cryptd and also with my original CAAM encryption workload.
> IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN is not needed.
> Both my simple webserver and the original C++ Webserver from our
> customer are now working without problems.

OK, good to hear. I'm assuming you only need the change for
sk_stream_wait_memory()? If you can reproduce, it would be good to test.
But in general none of them should hurt.

FWIW, the reason why DEFER_TASKRUN wasn't fully solving it is because
we'd also use TIF_NOTIFY_SIGNAL for creating new io-wq workers. So while
task_work would not be the trigger for setting that condition, we'd
still end up doing it via io-wq worker creation.

> Do you think there is a chance getting this change upstream? I'm a bit
> afraid the code originally uses signal_pending() instead of
> task_sigpending() for a good reason.

The distinction between signal_pending() and task_sigpending() was
introduced with TIF_NOTIFY_SIGNAL. This isn't a case of networking
needing to use signal_pending(), just that this was originally the
only aborting condition and now it's a bit too broad for some cases
(like this one).
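
To illustrate the difference, here is a rough userspace model of the two
checks. It is only a sketch, not the kernel source: the model_ names, the
struct and the flag bit values are made up for the example. The one thing
it is meant to show is that the broad check also fires when only the
notification flag is set, while the narrow one needs a real signal.

```c
#include <stdbool.h>

/* Userspace model only: these bit values are arbitrary, not the kernel's. */
enum {
	MODEL_TIF_SIGPENDING    = 1u << 0, /* an actual signal is pending */
	MODEL_TIF_NOTIFY_SIGNAL = 1u << 1, /* notification only, no signal */
};

/* Hypothetical stand-in for the per-task thread flags. */
struct model_task {
	unsigned int flags;
};

/* Narrow check: true only when an actual signal is pending. */
static inline bool model_task_sigpending(const struct model_task *t)
{
	return (t->flags & MODEL_TIF_SIGPENDING) != 0;
}

/* Broad check: also true when only the notification flag is set. */
static inline bool model_signal_pending(const struct model_task *t)
{
	if (t->flags & MODEL_TIF_NOTIFY_SIGNAL)
		return true;
	return model_task_sigpending(t);
}
```

With only MODEL_TIF_NOTIFY_SIGNAL set, model_signal_pending() returns
true and model_task_sigpending() returns false, so a wait loop that
aborts on the broad check bails out even though no signal was delivered.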

--
Jens Axboe