Re: [PATCH v2 1/3] mm,page_alloc: Use {get,put}_online_mems() to get stable zone's values
From: Oscar Salvador
Date: Thu Jun 03 2021 - 04:38:30 EST
On Wed, Jun 02, 2021 at 09:45:58PM +0200, Oscar Salvador wrote:
> It was too nice and easy to be true I guess.
> Yeah, I missed the point that blocking might be an issue here, and hotplug
> operations can take really long, so not an option.
> I must have switched my brain off back there, because now it is just too
> obvious.
>
> Then I guwmess that seqlock must stay and the only thing than can go is the
> pgdat resize lock from the hotplug code.
So, I have been looking into this again.
Of course, the approach taken here is outrageously wrong, but there are
some other things that are a bit confusing.
As pointed out, bad_range() (the function that ends up calling
page_outside_zone_boundaries) is called from different functions via VM_BUG_ON_*.
page_outside_zone_boundaries() takes care of taking the seqlock to avoid
reading stale values that might happen if we race with memhotplug
operations.
page_outside_zone_boundaries() calls zone_spans_pfn() to check that.
Now on the funny thing.
We do have several places happily calling zone_spans_pfn() without
holding zone's seqlock, e.g:
set_pageblock_migratetype
set_pfnblock_flags_mask
zone_spans_pfn
move_freepages_block
zone_spans_pfn
alloc_contig_pages
zone_spans_last_pfn
zone_spans_pfn
Those places hold zone->lock, while move_pfn_range_to_zone() and
remove_pfn_range_from_zone() hold zone->seqlock, so AFAICS, those places
could read a stale value and proceed thinking the range is within the
zone while it is not.
So I guess my question is, should we force those places to take the
seqlock reader as we do in page_outside_zone_boundaries(), (or maybe
just move the seqlock handling to zone_spans_pfn())?
Because I does not make much sense to take it in a VM_DEBUG context and
not in "real life".
Thoughts?
--
Oscar Salvador
SUSE L3