Re: [PATCH 6/6 v2] mm: Drain LRUs upon resume to userspace on nohz_full CPUs

From: Frederic Weisbecker
Date: Fri Apr 04 2025 - 09:15:10 EST


On Tue, Feb 11, 2025 at 12:42:51PM +0100, Michal Hocko wrote:
> On Sun 09-02-25 23:30:04, Frederic Weisbecker wrote:
> > LRUs can be drained in several ways. One of them may disturb isolated
> > workloads by queuing a work item at any time to any target CPU, whether
> > it is running in nohz_full mode or not.
> >
> > Prevent that on isolated tasks by deferring LRU drains to the
> > resume-to-userspace path, using the isolated task work framework.
>
> I have to say this is a rather cryptic description of the underlying
> problem. What do you think about the following:
>
> LRU batching can be a source of disturbances for isolated workloads
> running in userspace, because it requires a kernel worker to handle the
> pending batches and that worker would preempt the said task. The primary
> source of such disruption is __lru_add_drain_all(), which can be
> triggered from non-isolated CPUs.
>
> Why would an isolated CPU have anything on the pcp cache? Many syscalls
> allocate pages that might end up there. A typical and unavoidable one is
> fork/exec, which leaves pages behind in the cache, just waiting for
> somebody to drain them.
>
> This patch addresses the problem by noting when a page has been added to
> the cache and scheduling the draining on the return path to userspace, so
> the work is done while the syscall is still executing and there are no
> surprises while the task runs in userspace, where it doesn't want to be
> preempted.

Much better indeed :-)
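
For anyone jumping into the middle of the series, the deferral composes
roughly as below. This is only a sketch: isolated_task_work_queue() and the
return-to-userspace hook come from the earlier patches, and the callback
names here are illustrative rather than the actual code.

	/* On a nohz_full CPU, wherever a folio lands in a per-CPU batch: */
	static void note_lru_batch(void)
	{
		/* Ask for a local drain before going back to userspace. */
		isolated_task_work_queue();
	}

	/* Sketch of the mm-side callback run on the task's return to
	 * userspace (the actual hook is provided by the earlier patches): */
	static void mm_isolated_task_work(void)
	{
		/* Flush this CPU's pending LRU and bh batches locally. */
		lru_add_and_bh_lrus_drain();
	}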

> > @@ -376,6 +377,8 @@ static void __lru_cache_activate_folio(struct folio *folio)
> > }
> >
> > local_unlock(&cpu_fbatches.lock);
> > +
> > + isolated_task_work_queue();
> > }
>
> This placement doesn't make much sense to me. I would put
> isolated_task_work_queue() where we actually queue something up. That
> would be folio_batch_add(), if folio_batch_space(fbatch) > 0.

Ok, I moved it there. But why doesn't it also make sense when
folio_batch_space(fbatch) == 0? Doesn't that just mean folio_batch_add()
still added the entry and there is no further room left?
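
To make sure I read you right, here are the two options as I see them
(a rough sketch, not the actual diff):

	/* Option A: what I understood from your suggestion, i.e. queue the
	 * deferred drain only while the batch still has room left. */
	if (folio_batch_add(fbatch, folio) > 0)
		isolated_task_work_queue();

	/* Option B: what I would have expected, since the folio gets
	 * queued in both cases anyway. */
	folio_batch_add(fbatch, folio);
	isolated_task_work_queue();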

>
> >
> > #ifdef CONFIG_LRU_GEN
> > @@ -738,7 +741,7 @@ void lru_add_drain(void)
> > * the same cpu. It shouldn't be a problem in !SMP case since
> > * the core is only one and the locks will disable preemption.
> > */
> > -static void lru_add_and_bh_lrus_drain(void)
> > +void lru_add_and_bh_lrus_drain(void)
> > {
> > local_lock(&cpu_fbatches.lock);
> > lru_add_drain_cpu(smp_processor_id());
> > @@ -769,6 +772,9 @@ static bool cpu_needs_drain(unsigned int cpu)
> > {
> > struct cpu_fbatches *fbatches = &per_cpu(cpu_fbatches, cpu);
> >
> > + if (!housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE))
> > + return false;
> > +
>
> Would it make more sense to use cpu_is_isolated() and use it explicitly
> in __lru_add_drain_all() so that it is clearly visible - with a comment
> that isolated workloads deal with the cache on their return to
> userspace?

No, because cpu_is_isolated() is also true when the CPU is on a NULL
domain (isolated cpuset partition, or isolcpus=domain). And not all such
workloads want the possible overhead of scattered batch draining after
every syscall.
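
To spell out the intent in code (a sketch restating the check from the
hunk above, with the reasoning as comments; the rest of cpu_needs_drain()
stays as is):

	if (!housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE)) {
		/*
		 * nohz_full CPUs flush their own LRU batches on return to
		 * userspace, so never queue the remote drain work to them.
		 * cpu_is_isolated() would be too broad here: it also matches
		 * CPUs on a NULL scheduler domain (isolated cpuset partition,
		 * isolcpus=domain) that don't run nohz_full and haven't opted
		 * in to per-syscall deferred draining.
		 */
		return false;
	}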

Thanks.

>
> > /* Check these in order of likelihood that they're not zero */
> > return folio_batch_count(&fbatches->lru_add) ||
> > folio_batch_count(&fbatches->lru_move_tail) ||
> > --
> > 2.46.0
>
> --
> Michal Hocko
> SUSE Labs