Re: [PATCH] task_work: Consume only one item at a time while invoking the callbacks.

From: Oleg Nesterov
Date: Tue Feb 25 2025 - 11:43:07 EST


On 02/25, Frederic Weisbecker wrote:
>
> On Sun, Feb 23, 2025 at 11:40:15PM +0100, Oleg Nesterov wrote:
> >
> > I'll try to find and read the previous discussions tomorrow, but iirc Frederic
> > had another solution?
>
> This:
>
> https://lore.kernel.org/all/Zx-B0wK3xqRQsCOS@localhost.localdomain/
>
> Though I'm not entirely happy with it either.

Yes, thanks...

Can we do something else and avoid this rcuwait_wait_event() altogether?

To simplify the discussion, suppose we add a global XXX_LOCK. Just in
case, of course we shouldn't do this ;) But let's suppose we do.

Now, can _something_ like the (incomplete, ugly as hell) patch below work?

Oleg.
---

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5304,12 +5304,12 @@ static void perf_pending_task_sync(struct perf_event *event)
return;
}

- /*
- * All accesses related to the event are within the same RCU section in
- * perf_pending_task(). The RCU grace period before the event is freed
- * will make sure all those accesses are complete by then.
- */
- rcuwait_wait_event(&event->pending_work_wait, !event->pending_work, TASK_UNINTERRUPTIBLE);
+ spin_lock(XXX_LOCK);
+ if (event->pending_work) {
+ local_dec(&event->ctx->nr_no_switch_fast);
+ event->pending_work = -1;
+ }
+ spin_unlock(XXX_LOCK);
}

static void _free_event(struct perf_event *event)
@@ -5369,7 +5369,15 @@ static void _free_event(struct perf_event *event)
exclusive_event_destroy(event);
module_put(event->pmu->module);

- call_rcu(&event->rcu_head, free_event_rcu);
+ bool free = true;
+ spin_lock(XXX_LOCK);
+ if (event->pending_work == -1) {
+ event->pending_work = -2;
+ free = false;
+ }
+ spin_unlock(XXX_LOCK);
+ if (free)
+ call_rcu(&event->rcu_head, free_event_rcu);
}

/*
@@ -6981,7 +6989,14 @@ static void perf_pending_task(struct callback_head *head)
{
struct perf_event *event = container_of(head, struct perf_event, pending_task);
int rctx;
+ bool free = false;

+ spin_lock(XXX_LOCK);
+ if ((int)event->pending_work < 0) {
+ free = event->pending_work == -2u;
+ event->pending_work = 0;
+ goto unlock;
+ }
/*
* All accesses to the event must belong to the same implicit RCU read-side
* critical section as the ->pending_work reset. See comment in
@@ -7004,6 +7019,12 @@ static void perf_pending_task(struct callback_head *head)

if (rctx >= 0)
perf_swevent_put_recursion_context(rctx);
+
+unlock:
+ spin_unlock(XXX_LOCK);
+
+ if (free)
+ call_rcu(&event->rcu_head, free_event_rcu);
}

#ifdef CONFIG_GUEST_PERF_EVENTS