Re: [PATCH v2] workqueue: Fix false positive stall reports

From: Petr Mladek

Date: Tue Mar 24 2026 - 06:05:05 EST


On Mon 2026-03-23 11:31:14, Song Liu wrote:
> On Mon, Mar 23, 2026 at 7:09 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
> >
> > On Sat 2026-03-21 20:30:45, Song Liu wrote:
> > > On weakly ordered architectures (e.g., arm64), the lockless check in
> > > wq_watchdog_timer_fn() can observe a reordering between the worklist
> > > insertion and the last_progress_ts update. Specifically, the watchdog
> > > can see a non-empty worklist (from a list_add) while reading a stale
> > > last_progress_ts value, causing a false positive stall report.
> > >
> > > This was confirmed by reading pool->last_progress_ts again after holding
> > > pool->lock in wq_watchdog_timer_fn():
> > >
> > > workqueue watchdog: pool 7 false positive detected!
> > > lockless_ts=4784580465 locked_ts=4785033728
> > > diff=453263ms worklist_empty=0
> > >
> > > To avoid slowing down the hot path (queue_work, etc.), recheck
> > > last_progress_ts with pool->lock held. This will eliminate the false
> > > positive with minimal overhead.
> > >
> > > Remove two extra empty lines in wq_watchdog_timer_fn() while we are at it.
> > >
> > > --- a/kernel/workqueue.c
> > > +++ b/kernel/workqueue.c
> > > @@ -7699,8 +7699,28 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
> > > else
> > > ts = touched;
> > >
> > > - /* did we stall? */
> > > + /*
> > > + * Did we stall?
> > > + *
> > > + * Do a lockless check first. On weakly ordered
> > > + * architectures, the lockless check can observe a
> > > + * reordering between worklist insert_work() and
> > > + * last_progress_ts update from __queue_work().

This explains why taking the lock is needed.

> > > + Since
> > > + * __queue_work() is a much hotter path than the timer
> > > + * function, we handle false positive here by reading
> > > + * last_progress_ts again with pool->lock held.

But this is confusing. It says that __queue_work() is a much hotter path,
yet __queue_work() already takes pool->lock. The sentence gives the
impression that the watchdog path is the less hot one, which makes it
odd that the watchdog path skips the lock by default.

Instead, it would make sense to explain why taking the lock only
around the "pool->last_progress_ts" read is safe.

You know, the code makes my head spin because it violates the usual
locking rules. A lock should serialize two or more writers/readers,
and that works only when all the reads and writes are done under
the lock. But that is not the case here.

IMHO, the new code works because taking spin_lock() implies a memory
barrier. It makes sure that

pool->worklist

is read before

worker->pool->last_progress_ts

Plus, the worker->pool->last_progress_ts read has to wait for
any parallel modifications of both variables.

As a result, the pool->worklist value might be stale (non-empty or
empty), but the re-read timestamp is always up to date.

IMHO, it prevents false positives (reporting a stall because of an
outdated timestamp). It does not prevent false negatives (missing
a stall), but that is not a problem because the stall will get
caught in the next watchdog round.

> > > + */
> > > if (time_after(now, ts + thresh)) {
> > > + scoped_guard(raw_spinlock_irqsave, &pool->lock) {
> > > + pool_ts = pool->last_progress_ts;
> > > + if (time_after(pool_ts, touched))
> > > + ts = pool_ts;
> > > + else
> > > + ts = touched;
> > > + }
> > > + if (!time_after(now, ts + thresh))
> > > + continue;
> >
> > The new code is pretty hairy. It might make sense to take the lock
> > around the original check and keep it as is.
> >
> > IMHO, if contention on pool->lock has ever been a problem, then maybe
> > the non-trivial workqueue API was not a good choice for the affected
> > use-case. Or am I wrong?
>
> Given that the false positives are really rare (a handful of them per
> day among a fleet of thousands of hosts), I think taking hundreds of
> pool->locks every 30 seconds seems unnecessary.
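
For reference, the simpler shape I had in mind was to do the whole
timestamp computation under the lock on every round, roughly like
this (untested sketch, not even compiled):

```c
	/* did we stall? */
	scoped_guard(raw_spinlock_irqsave, &pool->lock) {
		/* The lock orders this read after any concurrent
		 * __queue_work() update of last_progress_ts. */
		pool_ts = pool->last_progress_ts;
		if (time_after(pool_ts, touched))
			ts = pool_ts;
		else
			ts = touched;
	}

	if (time_after(now, ts + thresh)) {
		/* report the stall as before */
	}
```

It takes the lock on every round, but with the default 30 second
period the contention should be in the noise.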

This optimization might make sense. And it most likely works. My
primary problem is that the code violates standard locking rules.
And the comment does not explain why it works.

I am sorry if I am too meticulous. I just know that it is not trivial
to make lockless algorithms right. And the new code is still kind
of lockless.

Best Regards,
Petr