Re: [PATCH misc 1/2] workqueue: Add check for clocks going backwards to wq_worker_tick()

From: Paul E. McKenney
Date: Fri Aug 02 2024 - 12:39:43 EST


On Fri, Aug 02, 2024 at 12:06:06PM -0400, Rik van Riel wrote:
> On Thu, 2024-08-01 at 17:30 -0700, Paul E. McKenney wrote:
> >
> > +++ b/kernel/workqueue.c
> > @@ -1482,6 +1482,7 @@ void wq_worker_tick(struct task_struct *task)
> >   * If the current worker is concurrency managed and hogged the CPU for
> >   * longer than wq_cpu_intensive_thresh_us, it's automatically marked
> >   * CPU_INTENSIVE to avoid stalling other concurrency-managed work items.
> > + * If the time is negative, ignore, assuming a backwards clock.
> >   *
> >   * Set @worker->sleeping means that @worker is in the process of
> >   * switching out voluntarily and won't be contributing to
> > @@ -1491,6 +1492,7 @@ void wq_worker_tick(struct task_struct *task)
> >   * We probably want to make this prettier in the future.
> >   */
> >   if ((worker->flags & WORKER_NOT_RUNNING) || READ_ONCE(worker->sleeping) ||
> > +     WARN_ON_ONCE((s64)(worker->task->se.sum_exec_runtime - worker->current_at) < 0) ||
> >       worker->task->se.sum_exec_runtime - worker->current_at <
> >       wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
> >   return;
>
> What is the code path by which sum_exec_runtime could go backward
> in time, if the TSC and sched_clock() jump backward?
>
> Might it make sense to check in the place where sum_exec_runtime is
> updated, instead, and cast a wider net?
>
> On the flip side, the runtime increments are "fairly large" in terms
> of TSC cycles, while most of the negative TSC jumps we have seen are
> quite small, so even that wider net might not catch much, given how
> coarse these updates typically are...

Good points! Even more telling, this patch didn't catch anything during
Breno's tests. I will drop it, but keep it on a workqueue.2024.08.01
branch in -rcu in case someone needs it later.

Thanx, Paul

> All Rights Reversed.