Re: [RFC PATCH 0/6] Drain remote per-cpu directly v2

From: Minchan Kim
Date: Mon May 09 2022 - 11:58:59 EST


On Mon, May 09, 2022 at 02:07:59PM +0100, Mel Gorman wrote:
> Changelog since v1
> o Fix unsafe RT locking scheme
> o Use spin_trylock on UP PREEMPT_RT

Mel,


Is this the only change from the last version, which already had some
deltas you fixed based on feedback from Vlastimil and me?

And is it still an RFC?

Do you have some benchmark data?

I'd like to give my Acked-by/Tested-by (even though there are a few
more places in 1/6 to align with the new field names) since I have
tested this patchset with my workload and didn't spot any other
problems.

I really support merging this patchset, since pcp draining via a
workqueue is really harmful in the reclaim/cma path of Android
workloads, which have a variety of process priority controls and can
get stuck on the pcp draining.

>
> This series has the same intent as Nicolas' series "mm/page_alloc: Remote
> per-cpu lists drain support" -- avoid interference of a high priority
> task due to a workqueue item draining per-cpu page lists. While many
> workloads can tolerate a brief interruption, it may cause a real-time
> task running on a NOHZ_FULL CPU to miss a deadline and, at minimum,
> the draining is non-deterministic.
>
> Currently an IRQ-safe local_lock protects the page allocator per-cpu lists.
> The local_lock on its own prevents migration and the IRQ disabling protects
> from corruption due to an interrupt arriving while a page allocation is
> in progress. The locking is inherently unsafe for remote access unless
> the CPU is hot-removed.
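
For anyone following along, here is a rough sketch of how I read the
current scheme; the struct and lock names below are my own shorthand
for illustration, not the exact identifiers in mm/page_alloc.c:

	#include <linux/list.h>
	#include <linux/local_lock.h>
	#include <linux/mm_types.h>
	#include <linux/percpu.h>

	/* Simplified stand-in for struct per_cpu_pages */
	struct pcp_sketch {
		int count;
		struct list_head list;	/* INIT_LIST_HEAD() at boot omitted */
	};

	struct pcp_locks {
		local_lock_t lock;
	};

	static DEFINE_PER_CPU(struct pcp_locks, pcp_locks) = {
		.lock = INIT_LOCAL_LOCK(lock),
	};
	static DEFINE_PER_CPU(struct pcp_sketch, pcp_sketch);

	/*
	 * Only the owning CPU can touch its lists: the IRQ-safe local_lock
	 * pins the task to this CPU and keeps interrupts out while the
	 * list is modified, but a remote CPU has nothing it can acquire,
	 * hence today's workqueue-based drain.
	 */
	static void pcp_free_sketch(struct page *page)
	{
		unsigned long flags;
		struct pcp_sketch *pcp;

		local_lock_irqsave(&pcp_locks.lock, flags);
		pcp = this_cpu_ptr(&pcp_sketch);
		list_add(&page->lru, &pcp->list);
		pcp->count++;
		local_unlock_irqrestore(&pcp_locks.lock, flags);
	}
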
>
> This series adjusts the locking. A spinlock is added to struct
> per_cpu_pages to protect the list contents while local_lock_irq continues
> to prevent migration and IRQ reentry. This allows a remote CPU to safely
> drain a remote per-cpu list.
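
For my own understanding, a minimal sketch of the new arrangement
(again, the names are invented and only the locking pattern is meant
to match the description above):

	#include <linux/list.h>
	#include <linux/mm_types.h>
	#include <linux/percpu.h>
	#include <linux/spinlock.h>

	/* Simplified stand-in for struct per_cpu_pages with the new lock */
	struct pcp_sketch {
		spinlock_t lock;	/* protects count and list */
		int count;
		struct list_head list;	/* INIT_LIST_HEAD() at boot omitted */
	};

	static DEFINE_PER_CPU(struct pcp_sketch, pcp_sketch) = {
		.lock = __SPIN_LOCK_UNLOCKED(pcp_sketch.lock),
	};

	/*
	 * The local fast path nests the new spinlock inside the existing
	 * local_lock_irqsave() (which still prevents migration, stabilises
	 * the per-cpu lookup and blocks IRQ re-entry). The win is below: a
	 * remote CPU can take just the spinlock and splice the list away
	 * without scheduling any work on the owning CPU.
	 */
	static void pcp_drain_remote_sketch(int cpu, struct list_head *out)
	{
		unsigned long flags;
		struct pcp_sketch *pcp = per_cpu_ptr(&pcp_sketch, cpu);

		spin_lock_irqsave(&pcp->lock, flags);
		list_splice_init(&pcp->list, out);
		pcp->count = 0;
		spin_unlock_irqrestore(&pcp->lock, flags);
	}
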
>
> This series is a partial series. Follow-on work should allow the
> local_irq_save to be converted to a local_irq to avoid IRQs being
> disabled/enabled in most cases. Consequently, there are some TODO comments
> highlighting the places that would change if local_irq was used. However,
> there are enough corner cases that it deserves a series on its own
> separated by one kernel release and the priority right now is to avoid
> interference of high priority tasks.
>
> Patch 1 is a cosmetic patch to clarify when page->lru is storing buddy pages
> and when it is storing per-cpu pages.
>
> Patch 2 shrinks per_cpu_pages to make room for a spin lock. Strictly speaking
> this is not necessary but it avoids per_cpu_pages consuming another
> cache line.
>
> Patch 3 is a preparation patch to avoid code duplication.
>
> Patch 4 is a simple micro-optimisation that improves code flow necessary for
> a later patch to avoid code duplication.
>
> Patch 5 uses a spin_lock to protect the per_cpu_pages contents while still
> relying on local_lock to prevent migration, stabilise the pcp
> lookup and prevent IRQ reentrancy.
>
> Patch 6 remote drains per-cpu pages directly instead of using a workqueue.
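
To make sure I read patch 6 right: the change is roughly from the first
helper below to the second (function and variable names invented for
illustration; pcp_drain_remote_sketch() is the remote-drain helper from
my sketch further up):

	#include <linux/cpu.h>
	#include <linux/list.h>
	#include <linux/percpu.h>
	#include <linux/workqueue.h>

	static DEFINE_PER_CPU(struct work_struct, pcp_drain_work);
	static struct workqueue_struct *pcp_drain_wq;	/* setup omitted */

	/*
	 * Old model (simplified): every CPU is asked to drain itself via a
	 * work item and the caller waits for all of them. The drain only
	 * completes once each remote CPU actually runs its work item, which
	 * a higher-priority task on a NOHZ_FULL CPU can delay for a long
	 * time.
	 */
	static void drain_all_via_workqueue(void)
	{
		int cpu;

		for_each_online_cpu(cpu)
			queue_work_on(cpu, pcp_drain_wq,
				      per_cpu_ptr(&pcp_drain_work, cpu));
		for_each_online_cpu(cpu)
			flush_work(per_cpu_ptr(&pcp_drain_work, cpu));
	}

	/* The remote-drain helper from the earlier sketch */
	void pcp_drain_remote_sketch(int cpu, struct list_head *out);

	/*
	 * New model (simplified): the caller drains every CPU directly
	 * under the per-CPU spinlock; nothing has to be scheduled on the
	 * remote CPUs, so the latency no longer depends on what they are
	 * running.
	 */
	static void drain_all_directly(struct list_head *out)
	{
		int cpu;

		for_each_online_cpu(cpu)
			pcp_drain_remote_sketch(cpu, out);
	}
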
>
> include/linux/mm_types.h |   5 +
> include/linux/mmzone.h   |  12 +-
> mm/page_alloc.c          | 342 +++++++++++++++++++++++++--------------
> 3 files changed, 230 insertions(+), 129 deletions(-)
>
> --
> 2.34.1
>
>