Re: [GIT PULL] pipe: nonblocking rw for io_uring

From: Jens Axboe
Date: Mon Apr 24 2023 - 17:22:12 EST


On 4/24/23 3:05?PM, Linus Torvalds wrote:
> On Fri, Apr 21, 2023 at 7:02?AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>>
>> This contains Jens' work to support FMODE_NOWAIT and thus IOCB_NOWAIT
>> for pipes ensuring that all places can deal with non-blocking requests.
>>
>> To this end, pass down the information that this is a nonblocking
>> request so that pipe locking, allocation, and buffer checking correctly
>> deal with those.
>
> Ok, I pulled this, but then I unpulled it again.
>
> Doing conditional locking for O_NONBLOCK and friends is not ok. Yes,
> it's been done, and I may even have let some slip through, but it's
> just WRONG.
>
> There is absolutely no situation where a "ok, so the lock on this data
> structure was taken, we'll go to some async model" is worth it.
>
> Every single time I've seen this, it's been some developer who thinks
> that O_NONBLOCk is somehow some absolute "this cannot schedule" thing.
> And every single time it's been broken and horrid crap that just made
> everything more complicated and slowed things down.
>
> If some lock wait is a real problem, then the lock needs to be just
> fixed. Not "ok, let's make a non-blocking version and fall back if
> it's held".
>
> Note that FMODE_NOWAIT does not mean (and *CANNOT* mean) that you'd
> somehow be able to do the IO in some atomic context anyway. Many of
> our kernel locks don't even support that (eg mutexes).
>
> So thinking that FMODE_NOWAIT is that kind of absolute is the wrong
> kind of thinking entirely.
>
> FMODE_NOWAIT should mean that no *IO* gets done. And yes, that might
> mean that allocations fail too. But not this kind of "let's turn
> locking into 'trylock' stuff".
>
> The whoe flag is misnamed. It should have been FMODE_NOIO, the same
> way we have IOCB_NOIO.
>
> If you want FMODE_ATOMIC, then that is something entirely and
> completely different, and is probably crazy.
>
> We have done it in one area (RCU pathname lookup), and it was worth it
> there, and it was a *huge* undertaking. It was worth it, but it was
> worth it because it was a serious thing with some serious design and a
> critical area.
>
> Not this kind of "conditional trylock" garbage which just means that
> people will require 'poll()' to now add the lock to the waitqueue, or
> make all callers go into some "let's use a different thread instead"
> logic.

I'm fully agreeing with you on the semantics for
FMODE_NOWAIT/IOCB_NOWAIT, and also agree that they are misnamed and
really should be _NOIO. There is no conceptual misunderstanding there,
nor do I care about atomic semantics. Obviously the above is for
io_uring to be able to improve the performance on pipes, because right
now everything is punted to io-wq and that severly hampers performance
with how pipes are generally used. io_uring doesn't care about atomic
context, but it very much cares about NOT sleeping for IO during issue
as that has adverse performance implications.

If we don't ever wait for IO with the pipe lock held, then we can skip
the conditional locking. But with splice, that's not at all the case! We
most certainly wait for IO there with the pipe lock held.

And yes I totally agree that conditional locking is not pretty, but for
some cases it really is a necessary evil. People have complained about
io_uring pipe performance, and I ran some testing myself. Being able to
sanely read/write from/to pipes without punting to io-wq is an easy 10x
improvement for that use case. How can I enable FMODE_NOWAIT on pipes if
we have splice holding the pipe lock over IO? It's not feasible.

So please reconsider, doing IO to/from pipes efficiently is actually
kind of useful for io_uring...

--
Jens Axboe