Re: [RFC 4/4] io_uring: implement futex wait
From: Thomas Gleixner
Date: Tue Jun 01 2021 - 17:53:06 EST
On Tue, Jun 01 2021 at 17:29, Pavel Begunkov wrote:
> On 6/1/21 5:01 PM, Jens Axboe wrote:
>>> Yes, that would be preferable, but it looks like futexes don't use
>>> waitqueues but some manual enqueuing into a plist_node, see
>>> futex_wait_queue_me() or mark_wake_futex().
>>> Did I miss it somewhere?
>> Yes, we'd need to augment that with a callback. I do think that's going
> Yeah, that was the first idea, but it's also more intrusive for the
> futex codebase. Can be piled on top for next revision of patches.
> A question for the futex maintainers: how much resistance to merging
> something like that should I expect?
Adding a waitqueue-like callback or the proposed thing?
TBH, neither one has any charm.
1) The proposed solution: I can't figure out from the changelogs or the
cover letter what kind of problems it solves and what the exact
semantics are. If you ever consider submitting futex patches, may I
recommend studying Documentation/process and getting some inspiration
What are the lifetime rules, what's the interaction with regular
futexes, what's the interaction with robust list ....? Without
interaction with regular futexes such a functionality does not make
any sense at all.
Also, once we open that can of worms, where is this going to end and
where can we draw the line? This is going to be a bottomless pit
because I don't believe for a split second that this simple interface
is going to be sufficient.
Aside from that, we are definitely _not_ going to expose any of the
internal functions, simply because they evade every sanity check which
happens in the syscall wrappers, and I have zero interest in dealing
with the fallout of unfiltered input which comes via io-uring interfaces
or in trying to keep those up to date when the core filtering changes.
2) Adding a waitqueue-like callback is daft.
a) Lifetime rules
The wakeup mechanism is designed to avoid hb->lock contention as much
as possible. The dequeue/mark for wakeup happens under hb->lock
and the actual wakeup happens after dropping hb->lock.
This is not going to change. It's not even debatable.
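
A minimal userspace sketch of the pattern described above: waiters are
dequeued and marked under the bucket lock, and the wakeups themselves are
issued only after the lock has been dropped. All names here (struct
waiter, struct bucket, bucket_wake() and so on) are hypothetical stand-ins
built on a pthread mutex; this is an analogy, not the kernel's futex code,
and the lock_held flag exists only so the single-threaded demo can check
where the wakeup runs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued waiter (not the kernel's futex_q). */
struct waiter {
	struct waiter *next;	/* link on the bucket chain or the wake list */
	bool queued;		/* true while on the bucket chain */
	bool woken;		/* set by the waker, outside the lock */
};

/* Hypothetical stand-in for a hash bucket; lock plays the role of hb->lock. */
struct bucket {
	pthread_mutex_t lock;
	struct waiter *head;
	bool lock_held;		/* demo-only bookkeeping */
};

static void bucket_init(struct bucket *b)
{
	pthread_mutex_init(&b->lock, NULL);
	b->head = NULL;
	b->lock_held = false;
}

static void bucket_enqueue(struct bucket *b, struct waiter *w)
{
	pthread_mutex_lock(&b->lock);
	w->queued = true;
	w->woken = false;
	w->next = b->head;
	b->head = w;
	pthread_mutex_unlock(&b->lock);
}

/* Wake up to nr_wake waiters; returns how many were woken. */
static int bucket_wake(struct bucket *b, int nr_wake)
{
	struct waiter *wake_list = NULL;
	int marked = 0;

	/* Phase 1: under the lock, only dequeue and mark. Keep it short. */
	pthread_mutex_lock(&b->lock);
	b->lock_held = true;
	while (b->head && marked < nr_wake) {
		struct waiter *w = b->head;

		b->head = w->next;
		w->queued = false;
		w->next = wake_list;	/* move to the private wake list */
		wake_list = w;
		marked++;
	}
	b->lock_held = false;
	pthread_mutex_unlock(&b->lock);

	/* Phase 2: the actual wakeups, after the lock has been dropped. */
	while (wake_list) {
		struct waiter *w = wake_list;

		wake_list = w->next;
		w->next = NULL;
		assert(!b->lock_held);
		w->woken = true;	/* stands in for waking the task */
	}
	return marked;
}
```

The point of the split is that the time spent under the lock depends only
on the cheap list manipulation, never on whatever the wakeup itself costs.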
Aside of that this is optimized for minimal hb->lock hold time in
So the only way to do that would be to invoke the callback from
mark_wake_futex() _before_ invalidating futex_q and the callback plus
the argument(s) would have to be stored in futex_q.
Where does this information come from? Which context would invoke the
wait with callback and callback arguments? User space, io-uring state
machine or what?
Aside from the extra storage (on stack), yet more undefined lifetime
rules and no semantics for error cases etc., that also would
enforce that the callback is invoked with hb->lock held. IOW, it's
making the hb->lock held time larger, which is exactly what the
existing code tries to avoid by all means.
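
To make the objection concrete, here is a hedged userspace sketch of what
a callback stored in the waiter would imply: because the callback has to
run before the waiter is invalidated, it ends up running with the bucket
lock held. The types and functions (struct cb_waiter, struct cb_bucket,
cb_bucket_wake(), record_cb()) are invented for illustration and do not
correspond to anything in the patches or in the kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter which carries a wake callback and its argument. */
struct cb_waiter {
	struct cb_waiter *next;
	void (*wake_cb)(struct cb_waiter *w, void *arg);
	void *cb_arg;
	bool valid;		/* false once dequeued and invalidated */
};

/* Hypothetical bucket; lock plays the role of hb->lock. */
struct cb_bucket {
	pthread_mutex_t lock;
	struct cb_waiter *head;
	bool lock_held;		/* demo-only bookkeeping */
};

/* Demo-only record of where the callbacks ran. */
struct cb_record {
	struct cb_bucket *b;
	int calls;
	int calls_with_lock_held;
};

static void record_cb(struct cb_waiter *w, void *arg)
{
	struct cb_record *r = arg;

	(void)w;
	r->calls++;
	if (r->b->lock_held)
		r->calls_with_lock_held++;
}

static void cb_bucket_init(struct cb_bucket *b)
{
	pthread_mutex_init(&b->lock, NULL);
	b->head = NULL;
	b->lock_held = false;
}

static void cb_bucket_enqueue(struct cb_bucket *b, struct cb_waiter *w)
{
	pthread_mutex_lock(&b->lock);
	w->valid = true;
	w->next = b->head;
	b->head = w;
	pthread_mutex_unlock(&b->lock);
}

static int cb_bucket_wake(struct cb_bucket *b, int nr_wake)
{
	int n = 0;

	pthread_mutex_lock(&b->lock);
	b->lock_held = true;
	while (b->head && n < nr_wake) {
		struct cb_waiter *w = b->head;

		b->head = w->next;
		/*
		 * The callback must run before the waiter is invalidated,
		 * so it necessarily runs inside the locked region.
		 */
		w->wake_cb(w, w->cb_arg);
		w->valid = false;
		n++;
	}
	b->lock_held = false;
	pthread_mutex_unlock(&b->lock);
	return n;
}
```

In this sketch every callback invocation is counted as having run with
the lock held, so the lock hold time now scales with whatever the
callback does.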
But what would that solve?
I can't tell because the provided information is absolutely useless
for anyone not familiar with your great idea:
"Add futex wait requests, those always go through io-wq for
What am I supposed to read out of this? Doing it elsewhere would be
more complex? Really useful information.
And I can't tell either what Jens means here:
"Not a huge fan of that, I think this should tap into the waitqueue
instead and just rely on the wakeup callback to trigger the
event. That would be a lot more efficient than punting to io-wq, both
in terms of latency on trigger, but also for efficiency if the app is
waiting on a lot of futexes."
"Yes, we'd need to augment that with a callback. I do think that's
going to be necessary, I don't see the io-wq solution working well
outside of the most basic of use cases. And even for that, it won't
be particularly efficient for single waits."
All of these quotes are useless word salad without context and, worse,
without even a minimal understanding of how futexes work.
So can you folks please sit down and write up a coherent description of:
1) The problem you are trying to solve
2) How this futex functionality should be integrated into io-uring
including the contexts which invoke it.
3) Interaction with regular sys_futex() operations.
4) Lifetime and locking rules.
Unless that materializes any futex related changes are not even going to
I did not even try to review this stuff, I just tried to make sense out
of it, but while skimming it, it was inevitable to spot this gem:
+int futex_wake_op_single(u32 __user *uaddr, int nr_wake, unsigned int op,
+ bool shared, bool try);
+ ret = futex_wake_op_single(f->uaddr, f->nr_wake, f->wake_op_arg,
+ !(f->flags & IORING_FUTEX_SHARED),
You surely made your point that this is well thought out.