Re: [PATCH 3/4] perf: Fix event leak upon exit

From: Frederic Weisbecker
Date: Thu May 16 2024 - 07:17:27 EST


On Thu, May 16, 2024 at 11:05:29AM +0200, Peter Zijlstra wrote:
> On Wed, May 15, 2024 at 04:43:10PM +0200, Frederic Weisbecker wrote:
> > When a task is scheduled out, pending sigtrap deliveries are deferred
> > to the target task upon resume to userspace via task_work.
> >
> > However failures while adding en event's callback to the task_work
> > engine are ignored. And since the last call for events exit happen
> > after task work is eventually closed, there is a small window during
> > which pending sigtrap can be queued though ignored, leaking the event
> > refcount addition such as in the following scenario:
> >
> > TASK A
> > -----
> >
> > do_exit()
> > exit_task_work(tsk);
> >
> > <IRQ>
> > perf_event_overflow()
> > event->pending_sigtrap = pending_id;
> > irq_work_queue(&event->pending_irq);
> > </IRQ>
> > =========> PREEMPTION: TASK A -> TASK B
> > event_sched_out()
> > event->pending_sigtrap = 0;
> > atomic_long_inc_not_zero(&event->refcount)
> > // FAILS: task work has exited
> > task_work_add(&event->pending_task)
> > [...]
> > <IRQ WORK>
> > perf_pending_irq()
> > // early return: event->oncpu = -1
> > </IRQ WORK>
> > [...]
> > =========> TASK B -> TASK A
> > perf_event_exit_task(tsk)
> > perf_event_exit_event()
> > free_event()
> > WARN(atomic_long_cmpxchg(&event->refcount, 1, 0) != 1)
> > // leak event due to unexpected refcount == 2
> >
> > As a result the event is never released while the task exits.
>
> Urgh...
>
> >
> > Fix this with appropriate task_work_add()'s error handling.
> >
> > Fixes: 517e6a301f34 ("perf: Fix perf_pending_task() UaF")
> > Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > ---
> > kernel/events/core.c | 9 +++++----
> > 1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 724e6d7e128f..c1632e69c69d 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -2289,10 +2289,11 @@ event_sched_out(struct perf_event *event, struct perf_event_context *ctx)
> > event->pending_sigtrap = 0;
> > if (state != PERF_EVENT_STATE_OFF &&
> > !event->pending_work) {
> > - event->pending_work = 1;
> > - dec = false;
> > - WARN_ON_ONCE(!atomic_long_inc_not_zero(&event->refcount));
> > - task_work_add(current, &event->pending_task, TWA_RESUME);
> > + if (task_work_add(current, &event->pending_task, TWA_RESUME) >= 0) {
>
> AFAICT the thing is a return 0 on success -Efoo on fail, no? That is,
> should this not simply be '== 0' ?

Right.

>
> > + WARN_ON_ONCE(!atomic_long_inc_not_zero(&event->refcount));
> > + dec = false;
> > + event->pending_work = 1;
> > + }
>
> Also, do we want to write it like so and save an indent?
>
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2288,11 +2288,11 @@ event_sched_out(struct perf_event *event
>
> event->pending_sigtrap = 0;
> if (state != PERF_EVENT_STATE_OFF &&
> - !event->pending_work) {
> + !event->pending_work &&
> + !task_work_add(current, &event->pending_task, TWA_RESUME)) {
> event->pending_work = 1;
> dec = false;
> WARN_ON_ONCE(!atomic_long_inc_not_zero(&event->refcount));
> - task_work_add(current, &event->pending_task, TWA_RESUME);
> }
> if (dec)
> local_dec(&event->ctx->nr_pending);

Looks good, I'm resending this one patch.

Thanks.