Re: [PATCH 1/2] mm,memory_hotplug: Remove un-taken lock

From: David Hildenbrand
Date: Mon May 27 2024 - 03:54:12 EST


Am 24.05.24 um 14:02 schrieb Brendan Jackman:
On Wed, May 22, 2024 at 05:24:17PM +0200, David Hildenbrand wrote:
On 22.05.24 16:27, Brendan Jackman wrote:
On Wed, May 22, 2024 at 04:09:41PM +0200, David Hildenbrand wrote:

By the way, some noob questions: am I OK with my assumption that it's
fine for reader code to operate on zone spans that are both stale and
"from the future"? thinking abstractly I guess that seeing a stale
value when racing with offline_pages is roughly the same as seeing a
value "from the future" when racing with online_pages?

Right. PFN walkers should be using pfn_to_online_page(), where races are
possible but barely seen in practice.

zone handlers like mm/compaction.c can likely deal with races, although it
might all be cleaner (and safer?) when using start+end. I recall it also
recalls on pfn_to_online_page().

Regarding page_outside_zone_boundaries(), it should be fine if we can read
start+end atomically, that way we would not accidentally report "page
outside ..." when changing the start address. I think with your current
patch that might happen (although likely extremely hard to trigger) when
growing the zone at the start, reducing zone_start_pfn.

Thanks a lot, this is very helpful

Also, is it ever possible for pages to get removed and then added back
and end up in a different zone than before?

Yes. Changing between MOVABLE and NORMAL is possible and can easily be
triggered by offlining+re-onlining memory blocks.

So, even if we make it impossible to see a totally bogus zone span,
you can observe a stale/futuristic span which currently contains pages
from a different zone?

Yes. Note that zones/nodes can easily overlap, so a page being spanned by another zones is common and supported already.


That seems to imply you could look up a page page from a PFN within
zone A's apparent span, lock zone A and assume you can safely modify
the freelist the page is on, but actually that page is now in zone B.

That's why we obtain the zone/node always from the page itself (stored in page flags). This data can only change when offlining+reonlining memory (and pfn_to_online_page() would refuse to hand out the page while temporarily online).

There were discussions around using RCU to improve pfn_to_online_page() racing with memory offlining, but the motivation to do that has been rather small: we barely see such races in practice. Memory offlining+re-onlining simply takes too long.


So for example:

1. compact_zone() sets cc->free_pfn based on zone_end_pfn
2. isolate_freepages() sets isolate_start_pfn = cc->free_pfn
3. isolate_freepages_block() looks up a page based on that PFN
3. ... then takes the cc->zone lock
4. ... then calls __isolate_free_page which removes the page from
whatever freelist it's on.

Is anything stopping part 4 from modifying a zone that wasn't locked
in part 3?

Likely that overlapping zones already exist and are handled accordingly.

--
Thanks,

David / dhildenb