Re: [RFC PATCH v2 0/7] Defer throttle when task exits to user

From: K Prateek Nayak
Date: Tue Apr 15 2025 - 07:15:36 EST


Hello Jan,

On 4/15/2025 3:51 PM, Jan Kiszka wrote:
>> Is this in line with what you are seeing?
>>
>
> Yes, and if you wait a bit longer for the second reporting round, you
> should get more task backtraces as well.

So looking at the backtrace [1], Aaron's patch should help with the
stalls you are seeing.

timerfd, which queues an hrtimer, also uses ep_poll_callback() to wake
up the epoll waiter. That callback queues ahead of the bandwidth timer
and requires the read lock, but since the writer has already tried to
grab the lock, readers are pushed onto the slowpath. If
epoll-stall-writer is now throttled, it needs ktimers to replenish its
bandwidth, which cannot happen without ktimers grabbing the read lock
first.

# epoll-stall-writer

ep_poll()
{
	...
	/*
	 * Does not disable IRQ / preemption on PREEMPT_RT; sends future readers on
	 * rwlock slowpath and they have to wait until epoll-stall-writer acquires
	 * and drops the write lock.
	 */
	write_lock_irq(&ep->lock);

	__set_current_state(TASK_INTERRUPTIBLE);

	/************** Preempted due to lack of bandwidth **************/

	...
	eavail = ep_events_available(ep);
	if (!eavail)
		__add_wait_queue_exclusive(&ep->wq, &wait);

	/* Never reaches here waiting for bandwidth */
	write_unlock_irq(&ep->lock);
}


# ktimers

ep_poll_callback(...)
{
	...

	/*
	 * Does not disable interrupts on PREEMPT_RT; ktimers needs the
	 * epoll-stall-writer to take the write lock and drop it to
	 * proceed but epoll-stall-writer requires ktimers to run the
	 * bandwidth timer to be runnable again. Deadlock!
	 */
	read_lock_irqsave(&ep->lock, flags);

	...

	/* wakeup within read side critical section */
	if (sync)
		wake_up_sync(&ep->wq);
	else
		wake_up(&ep->wq);

	...

	read_unlock_irqrestore(&ep->lock, flags);
}

[1] https://lore.kernel.org/all/62304351-7fc0-48b6-883b-d346886dac8e@xxxxxxx/


> Jan


--
Thanks and Regards,
Prateek