On 2/3/22 12:05 PM, Usama Arif wrote:
On 03/02/2022 18:49, Jens Axboe wrote:
On 2/3/22 11:24 AM, Usama Arif wrote:
-static inline bool io_should_trigger_evfd(struct io_ring_ctx *ctx)
+static void io_eventfd_signal(struct io_ring_ctx *ctx)
{
- if (likely(!ctx->cq_ev_fd))
- return false;
+ struct io_ev_fd *ev_fd;
+
+ rcu_read_lock();
+ /* rcu_dereference ctx->io_ev_fd once and use it for both for checking and eventfd_signal */
+ ev_fd = rcu_dereference(ctx->io_ev_fd);
+
+ if (likely(!ev_fd))
+ goto out;
if (READ_ONCE(ctx->rings->cq_flags) & IORING_CQ_EVENTFD_DISABLED)
- return false;
- return !ctx->eventfd_async || io_wq_current_is_worker();
+ goto out;
+
+ if (!ctx->eventfd_async || io_wq_current_is_worker())
+ eventfd_signal(ev_fd->cq_ev_fd, 1);
+
+out:
+ rcu_read_unlock();
}
This still needs what we discussed in v3, something ala:
/*
* This will potential race with eventfd registration, but that's
* always going to be the case if there is IO inflight while an eventfd
* descriptor is being registered.
*/
if (!rcu_dereference_raw(ctx->io_ev_fd))
return;
rcu_read_lock();
Hmm, so i am not so worried about the registeration, but actually
worried about unregisteration.
If after the check and before the rcu_read_lock, the eventfd is
unregistered won't we get a NULL pointer exception at
eventfd_signal(ev_fd->cq_ev_fd, 1)?
You need to check it twice, that's a hard requirement. The first racy
check is safe because we don't care if we miss a notification, once
inside rcu_read_lock() it needs to be done properly of course. Like you
do below, that's how it should be done.
I wonder if we can get away with assigning ctx->io_ev_fd to NULL when we
do the call_rcu(). The struct itself will remain valid as long as we're
under rcu_read_lock() protection, so I think we'd be fine? If we do
that, then we don't need any rcu_barrier() or synchronize_rcu() calls,
as we can register a new one while the previous one is still being
killed.
Hmm?
We would have to remove the check that ctx->io_ev_fd != NULL. That we
would also result in 2 successive calls to io_eventfd_register without
any unregister in between being successful? Which i dont think is the
right behaviour?
I think the likelihood of hitting the rcu_barrier itself is quite low,
so probably the cost is low as well.
Yeah it might very well be. To make what I suggested work, we'd need a
way to mark the io_ev_fd as going away. Which would be feasible, as we
know the memory will remain valid for us to check. So it could
definitely work, you'd just need a check for that.
Thanks, will do that this in the next patchset with the above
io_eventfd_signal changes if those look ok as well?
The code you pasted looked good. Consider the "is unregistration in
progress" suggestion as well, as it would be nice to avoid any kind of
rcu synchronization if at all possible.