Re: [RFC PATCH] xprtrdma: Move long delayed work on system_dfl_long_wq

From: Frederic Weisbecker

Date: Thu Apr 30 2026 - 11:04:26 EST


Le Thu, Apr 30, 2026 at 10:05:52AM -0400, Chuck Lever a écrit :
>
>
> On Thu, Apr 30, 2026, at 10:01 AM, Frederic Weisbecker wrote:
> > Le Thu, Apr 30, 2026 at 09:35:20AM -0400, Chuck Lever a écrit :
> >>
> >> On Thu, Apr 30, 2026, at 4:54 AM, Marco Crivellari wrote:
> >> > Currently the code enqueue work items using {queue|mod}_delayed_work(),
> >> > using system_long_wq. This workqueue should be used when long works are
> >> > expected, but it is a per-cpu workqueue.
> >> >
> >> > This is important because queue_delayed_work() queue the work using:
> >> >
> >> > queue_delayed_work_on(WORK_CPU_UNBOUND, ...);
> >> >
> >> > Note that WORK_CPU_UNBOUND = NR_CPUS.
> >> >
> >> > This would end up calling __queue_delayed_work() that does:
> >> >
> >> > if (housekeeping_enabled(HK_TYPE_TIMER)) {
> >> > // [....]
> >> > } else {
> >> > if (likely(cpu == WORK_CPU_UNBOUND))
> >> > add_timer_global(timer);
> >> > else
> >> > add_timer_on(timer, cpu);
> >> > }
> >> >
> >> > So when cpu == WORK_CPU_UNBOUND the timer is global and is
> >> > not using a specific CPU. Later, when __queue_work() is called:
> >> >
> >> > if (req_cpu == WORK_CPU_UNBOUND) {
> >> > if (wq->flags & WQ_UNBOUND)
> >> > cpu = wq_select_unbound_cpu(raw_smp_processor_id());
> >> > else
> >> > cpu = raw_smp_processor_id();
> >> > }
> >> >
> >> > Because the wq is not unbound, it takes the CPU where the timer
> >> > fired and enqueue the work on that CPU.
> >> > The consequence of all of this is that the work can run anywhere,
> >> > depending on where the timer fired.
> >> >
> >> > Recently, a new unbound workqueue specific for long running work has
> >> > been added:
> >> >
> >> > c116737e972e ("workqueue: Add system_dfl_long_wq for long unbound works")
> >> >
> >> > So change system_long_wq with system_dfl_long_wq so that the work may
> >> > benefit from scheduler task placement.
> >>
> >> The patch description confuses me.
> >>
> >> The message ends with "the work can run anywhere, depending on where
> >> the timer fired." Read literally, "can run anywhere" sounds like a
> >> feature, not a bug
> >
> > A feature, but incomplete :)
> >
> >> — and the proposed fix (WQ_UNBOUND) also lets it
> >> run anywhere, just via a different selection path. Without a sentence
> >> saying "and that anywhere includes isolated CPUs, which we don't want,"
> >> the reader is left to fill in the gap.
> >
> > Not quite, global timers don't fire on isolated CPUs. And since it gets enqueued
> > on the CPU where it fired, it won't be enqueued on an isolated CPU.
> >
> >>
> >> So, could the commit message lead with the motivation? My guess is that
> >> this is about respecting HK_TYPE_TIMER housekeeping on isolated systems,
> >> which system_long_wq cannot do because its per-CPU pool ignores the
> >> housekeeping mask once the global timer fires. If that is the case,
> >> please say so directly and the mechanism trace becomes a supporting
> >> argument rather than the whole argument.
> >
> > The purpose is explained on the last line:
> >
> > """
> > So change system_long_wq with system_dfl_long_wq so that the work may
> > benefit from scheduler task placement.
> > """
> >
> > Arguably this could be elaborated. For example we can change that:
> >
> > """
> > The consequence of all of this is that the work can run anywhere,
> > depending on where the timer fired.
> > """
> >
> > into that:
> >
> > """
> > The consequence of all of this is that the work can run on any
> > housekeeping CPU, irrespective of the scheduler that knows better
> > about the best task placement, which would apply if the work were
> > to be queued on an unbound workqueue.
> > """
> >
> > Would that help?
>
> It's still not clearing it up for me.
>
> Does the patch address a bug (work isn't getting rescheduled at
> all) or is it merely a minor optimization for certain platforms?
>
> What's the user-visible issue that will be improved with this
> change?

It's not a bug, it's an optimization power-wise and performance-wise
and also part of a bigger sanity change:

- Long works have no reason to stick to a single CPU. If they are converted to
be unbound, the scheduler can move them to relevant targets to optimize
performances and power consumption. Hence the new system_unbound_long_wq.
The goal is to remove system_long_wq if none of its users rely on locality.

- Using queue_delayed_work() with a bound workqueue doesn't make any sense
since the target is completely random.

Thanks.

--
Frederic Weisbecker
SUSE Labs