Re: [PATCH 2/6] mm/page_alloc: Convert per-cpu list protection to local_lock
From: Mel Gorman
Date: Wed Mar 31 2021 - 07:02:47 EST
On Wed, Mar 31, 2021 at 11:55:56AM +0200, Thomas Gleixner wrote:
> On Mon, Mar 29 2021 at 13:06, Mel Gorman wrote:
> > There is a lack of clarity of what exactly local_irq_save/local_irq_restore
> > protects in page_alloc.c . It conflates the protection of per-cpu page
> > allocation structures with per-cpu vmstat deltas.
> >
> > This patch protects the PCP structure using local_lock which
> > for most configurations is identical to IRQ enabling/disabling.
> > The scope of the lock is still wider than it should be but this is
> > decreased in later patches. The per-cpu vmstat deltas are protected by
> > preempt_disable/preempt_enable where necessary instead of relying on
> > IRQ disable/enable.
>
> Yes, this goes into the right direction and I really appreciate the
> scoped protection for clarity sake.
>
Thanks.
> > #ifdef CONFIG_MEMORY_HOTREMOVE
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 8a8f1a26b231..01b74ff73549 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -887,6 +887,7 @@ void cpu_vm_stats_fold(int cpu)
> >
> > pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
> >
> > + preempt_disable();
>
> What's the reason for the preempt_disable() here? A comment would be
> appreciated.
>
Very good question because it's protecting vm_stat_diff and
vm_numa_stat_diff in different contexts and not quite correctly at this
point of the series. By the end of the series vm_numa_stat_diff is a
simple counter and does not need special protection.
Right now, it's protecting against a read and clear of vm_stat_diff
in two contexts -- cpu_vm_stats_fold and drain_zonestats but it's only
defensive. cpu_vm_stats_fold is only called when a CPU is going dead and
drain_zonestats is called from memory hotplug context. The protection is
necessary only if a new drain_zonestats caller was added without taking
the RMW of vm_stat_diff into account which may never happen.
This whole problem with preemption could be avoided altogether if
this_cpu_xchg was used similar to what is done elsewhere in vmstat
so.... this?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 64429ca4957f..9528304ce24d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8969,8 +8969,9 @@ void zone_pcp_reset(struct zone *zone)
struct per_cpu_zonestat *pzstats;
/*
- * No race with drain_pages. drain_zonestat disables preemption
- * and drain_pages relies on the pcp local_lock.
+ * No race with drain_pages. drain_zonestat is only concerned with
+ * vm_*_stat_diff which is updated with this_cpu_xchg and drain_pages
+ * only cares about the PCP lists protected by local_lock.
*/
if (zone->per_cpu_pageset != &boot_pageset) {
for_each_online_cpu(cpu) {
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 01b74ff73549..34ff61a145d2 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -887,13 +887,11 @@ void cpu_vm_stats_fold(int cpu)
pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
- preempt_disable();
for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
if (pzstats->vm_stat_diff[i]) {
int v;
- v = pzstats->vm_stat_diff[i];
- pzstats->vm_stat_diff[i] = 0;
+ v = this_cpu_xchg(pzstats->vm_stat_diff[i], 0);
atomic_long_add(v, &zone->vm_stat[i]);
global_zone_diff[i] += v;
}
@@ -903,13 +901,11 @@ void cpu_vm_stats_fold(int cpu)
if (pzstats->vm_numa_stat_diff[i]) {
int v;
- v = pzstats->vm_numa_stat_diff[i];
- pzstats->vm_numa_stat_diff[i] = 0;
+ v = this_cpu_xchg(pzstats->vm_numa_stat_diff[i], 0);
atomic_long_add(v, &zone->vm_numa_stat[i]);
global_numa_diff[i] += v;
}
#endif
- preempt_enable();
}
for_each_online_pgdat(pgdat) {
@@ -943,10 +939,9 @@ void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *pzstats)
{
int i;
- preempt_disable();
for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
if (pzstats->vm_stat_diff[i]) {
- int v = pzstats->vm_stat_diff[i];
+ int v = this_cpu_xchg(pzstats->vm_stat_diff[i], 0);
pzstats->vm_stat_diff[i] = 0;
atomic_long_add(v, &zone->vm_stat[i]);
atomic_long_add(v, &vm_zone_stat[i]);
@@ -955,14 +950,12 @@ void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *pzstats)
#ifdef CONFIG_NUMA
for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++)
if (pzstats->vm_numa_stat_diff[i]) {
- int v = pzstats->vm_numa_stat_diff[i];
+ int v = this_cpu_xchg(pzstats->vm_numa_stat_diff[i], 0);
- pzstats->vm_numa_stat_diff[i] = 0;
atomic_long_add(v, &zone->vm_numa_stat[i]);
atomic_long_add(v, &vm_numa_stat[i]);
}
#endif
- preempt_enable();
}
#endif
--
Mel Gorman
SUSE Labs