Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values

From: David Hildenbrand
Date: Tue Jun 08 2021 - 11:00:23 EST


On 07.06.21 12:23, Oscar Salvador wrote:
On Mon, Jun 07, 2021 at 10:49:01AM +0200, David Hildenbrand wrote:
I'd like to point out that I think the seqlock is not in place to
synchronize with actual growing/shrinking but to get consistent zone ranges
-- like using atomics, but we have two inter-dependent values here.

I guess so, at least that's what it should do.
But the way it is placed right now is misleading.

If we really want to get consistent zone ranges, we should start using
zone's seqlock where it matters and that is pretty much all those
places that use zone_spans_pfn().

Right, or even only zone_end_pfn() to get a consistent value.

Otherwise there is no way you can be sure the pfn you're checking is
within the limits. Moreover, as Michal pointed out earlier, if we really
want to go down that road, the locking should be done in the caller
invoking the operation; otherwise things might change once the lock
is dropped and you're working with a wrong assumption.

I can see arguments for both ripping it out and doing it right (but none for
the way it is right now).
For ripping it out, one could say that those races might not be fatal,
as usually the pfn you're working with (the one you want to check falls
within a certain range) is known to be valid, so the worst that can happen is
that you get false positives/negatives, which might or might not be detected
further down. How bad false positives/negatives are depends on the
situation, I guess, but we already live with that right now.
The zone_spans_pfn() from page_outside_zone_boundaries() is the only one using
locking right now, so if we survived this long without locks in the other places
using zone_spans_pfn(), one wonders whether it is really that bad.

On the other hand, one could argue that, for correctness' sake, we should be holding
the zone's seqlock whenever calling zone_spans_pfn() to avoid any inconsistency.



IMHO, as we know the race exists and we have a tool to handle it in place, we should maybe fix the obvious cases if possible.

Code that uses zone->zone_start_pfn directly is unlikely to be broken on most architectures. We will usually read/write it via a single instruction and won't get inconsistencies, for example, when shrinking or growing the zone. We most probably don't want to use an atomic for that right now.

Code that uses zone->spanned_pages to detect the zone end, however, is more likely to be broken. I don't think we have any relevant users around anymore; everything was converted to zone_end_pfn().

I feel like we should just make zone_end_pfn() take the seqlock as a reader. Then, we at least get a consistent value, for example, while growing a zone.
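
To make the idea concrete, here is a rough userspace model of a seqcount-protected end-of-zone read, written with C11 atomics so it can be compiled outside the kernel. It is only a sketch: struct zone_model, span_seqbegin(), span_seqretry(), zone_end_pfn_stable() and zone_grow_front() are made-up names for illustration, a single writer is assumed, and an in-kernel version would presumably build on the existing zone_span_seqbegin()/zone_span_seqretry() helpers instead.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for struct zone; only the span is modelled. */
struct zone_model {
	atomic_uint span_seq;		/* even: stable, odd: writer active */
	atomic_ulong zone_start_pfn;
	atomic_ulong spanned_pages;
};

static void zone_model_init(struct zone_model *z, unsigned long start,
			    unsigned long pages)
{
	atomic_init(&z->span_seq, 0);
	atomic_init(&z->zone_start_pfn, start);
	atomic_init(&z->spanned_pages, pages);
}

/* Reader entry: spin until no writer is active, return the sequence seen. */
static unsigned int span_seqbegin(struct zone_model *z)
{
	unsigned int seq;

	while ((seq = atomic_load_explicit(&z->span_seq,
					   memory_order_acquire)) & 1)
		;
	return seq;
}

/* Reader exit: nonzero if a writer ran since span_seqbegin(). */
static int span_seqretry(struct zone_model *z, unsigned int seq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&z->span_seq, memory_order_relaxed) != seq;
}

/* Read both inter-dependent values from the same stable snapshot. */
static unsigned long zone_end_pfn_stable(struct zone_model *z)
{
	unsigned long start, pages;
	unsigned int seq;

	do {
		seq = span_seqbegin(z);
		start = atomic_load_explicit(&z->zone_start_pfn,
					     memory_order_relaxed);
		pages = atomic_load_explicit(&z->spanned_pages,
					     memory_order_relaxed);
	} while (span_seqretry(z, seq));

	return start + pages;
}

/* Writer side (single writer assumed): grow the zone to the front. */
static void zone_grow_front(struct zone_model *z, unsigned long new_pages)
{
	unsigned long start = atomic_load_explicit(&z->zone_start_pfn,
						   memory_order_relaxed);
	unsigned long pages = atomic_load_explicit(&z->spanned_pages,
						   memory_order_relaxed);
	unsigned int seq = atomic_load_explicit(&z->span_seq,
						memory_order_relaxed);

	assert(new_pages <= start);	/* the start must not wrap below 0 */

	/* Mark the update as in progress (sequence becomes odd). */
	atomic_store_explicit(&z->span_seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&z->zone_start_pfn, start - new_pages,
			      memory_order_relaxed);
	atomic_store_explicit(&z->spanned_pages, pages + new_pages,
			      memory_order_relaxed);
	/* Publish the new span (sequence becomes even again). */
	atomic_store_explicit(&z->span_seq, seq + 2, memory_order_release);
}
```

A reader that overlaps with zone_grow_front() simply retries, so it computes the end either from the old pair or from the new pair, never from a mix. Note that this only gives a consistent value; it does not stop the zone from changing right after the read returns.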

Just imagine the following case when we grow a zone to the front when onlining memory:

zone->zone_start_pfn -= new_pages;
zone->spanned_pages += new_pages;

Note that compilers/CPUs might reshuffle as they like. If someone (e.g., zone_spans_pfn()) races with that code, it might get new zone->zone_start_pfn but old zone->spanned_pages. zone_end_pfn() will report a "too small zone" and trigger false negatives in zone_spans_pfn().
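
To put numbers on that, here is a small standalone sketch of the mixed snapshot such a racing reader could observe. The values and the names struct span, span_end_pfn(), span_contains() and torn_snapshot() are made up for illustration; span_contains() just mirrors the start <= pfn < end check that zone_spans_pfn() performs.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the two inter-dependent zone values. */
struct span {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

static unsigned long span_end_pfn(struct span s)
{
	return s.start_pfn + s.spanned_pages;
}

/* Same check as zone_spans_pfn(): start <= pfn < end. */
static bool span_contains(struct span s, unsigned long pfn)
{
	return s.start_pfn <= pfn && pfn < span_end_pfn(s);
}

/*
 * What a lockless reader might see in the middle of growing the zone to
 * the front: the new start_pfn combined with the old spanned_pages.
 */
static struct span torn_snapshot(struct span before, unsigned long new_pages)
{
	struct span torn = {
		.start_pfn = before.start_pfn - new_pages,
		.spanned_pages = before.spanned_pages,
	};

	return torn;
}
```

With a zone spanning pfns [1000, 1500) that grows by 200 pages to the front, the torn snapshot is start 800 with 500 spanned pages, so the computed end is 1300 instead of 1500. A pfn such as 1400, which was inside the zone before the update and is inside it afterwards, is reported as outside.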

--
Thanks,

David / dhildenb