Re: [PATCH] mm/sparse: Fix flags overlap in section_mem_map

From: HORIGUCHI NAOYA(堀口 直也)
Date: Wed Jun 23 2021 - 19:09:49 EST


On Tue, Apr 27, 2021 at 11:05:17AM +0200, David Hildenbrand wrote:
> On 27.04.21 10:30, Wang Wensheng wrote:
> > The section_mem_map member of struct mem_section stores some flags and
> > the address of struct page array of the mem_section.
> >
> > Additionally the node id of the mem_section is stored during early boot,
> > where the struct page array has not been allocated. In other words, the
> > higher bits of section_mem_map are used for two purpose, and the node id
> > should be clear properly after the early boot.
> >
> > Currently the node id field is overlapped with the flag field and cannot
> > be clear properly. That overlapped bits would then be treated as
> > mem_section flags and may lead to unexpected side effects.
> >
> > Define SECTION_NID_SHIFT using order_base_2 to ensure that the node id
> > field always locates after flags field. That's why the overlap occurs -
> > forgetting to increase SECTION_NID_SHIFT when adding new mem_section
> > flag.
> >
> > Fixes: 326e1b8f83a4 ("mm/sparsemem: introduce a SECTION_IS_EARLY flag")
> > Signed-off-by: Wang Wensheng <wangwensheng4@xxxxxxxxxx>
> > ---
> > include/linux/mmzone.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 47946ce..b01694d 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -1325,7 +1325,7 @@ extern size_t mem_section_usage_size(void);
> > #define SECTION_TAINT_ZONE_DEVICE (1UL<<4)
> > #define SECTION_MAP_LAST_BIT (1UL<<5)
> > #define SECTION_MAP_MASK (~(SECTION_MAP_LAST_BIT-1))
> > -#define SECTION_NID_SHIFT 3
> > +#define SECTION_NID_SHIFT order_base_2(SECTION_MAP_LAST_BIT)
> > static inline struct page *__section_mem_map_addr(struct mem_section *section)
> > {
> >
>
> Well, all sections around during boot that have an early NID are early ...
> so it's not an issue with SECTION_IS_EARLY, no? I mean, it's ugly, but not
> broken.
>
> But it's an issue with SECTION_TAINT_ZONE_DEVICE, AFAIKT.
> sparse_init_one_section() would leave the bit set if the nid happens to have
> that bit set (e.g., node 2,3). It's semi-broken then, because we force all
> pfn_to_online_page() through the slow path.
>
>
> That whole section flag setting code is fragile.

Hi Wensheng, David,

This patch does not seem to have been accepted or updated yet. What is
its current status?

We recently hit this exact issue while testing crash utilities with the
latest kernels on a 4-NUMA-node system. SECTION_TAINT_ZONE_DEVICE seems
to be set wrongly, and makedumpfile can fail because of it. So we need
a fix.
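
To illustrate the overlap, here is a small userspace sketch (not kernel
code). The flag values are copied from mmzone.h as quoted in this
thread; encode_nid(), nid_flag_overlap() and demo_overlap() are
hypothetical helpers written only for this example, and the node range
0..3 is an assumption matching a 4-node system.

```c
#include <assert.h>

/* Flag values copied from the diff in this thread. */
#define SECTION_IS_EARLY          (1UL << 3)
#define SECTION_TAINT_ZONE_DEVICE (1UL << 4)
#define SECTION_MAP_LAST_BIT      (1UL << 5)

/* Hypothetical helper (not a kernel function): place a node id at 'shift'. */
static unsigned long encode_nid(unsigned long nid, unsigned int shift)
{
	return nid << shift;
}

/* Hypothetical helper: flag bits that the encoded node id collides with. */
static unsigned long nid_flag_overlap(unsigned long nid, unsigned int shift)
{
	return encode_nid(nid, shift) & (SECTION_MAP_LAST_BIT - 1);
}

static void demo_overlap(void)
{
	/* With shift 3, node 2 lands exactly on SECTION_TAINT_ZONE_DEVICE. */
	assert(nid_flag_overlap(2, 3) == SECTION_TAINT_ZONE_DEVICE);

	/* Node 3 hits both SECTION_IS_EARLY and SECTION_TAINT_ZONE_DEVICE. */
	assert(nid_flag_overlap(3, 3) ==
	       (SECTION_IS_EARLY | SECTION_TAINT_ZONE_DEVICE));

	/* With shift 5, no node id in 0..3 touches the flag bits. */
	for (unsigned long nid = 0; nid < 4; nid++)
		assert(nid_flag_overlap(nid, 5) == 0);
}
```

This matches David's observation above that nodes 2 and 3 are the ones
affected when SECTION_NID_SHIFT is 3.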

I came up with another approach, shown below, before finding this
thread. If you are fine with it, I'd like to submit it as a patch. If
you prefer the order_base_2() approach, I'm glad to ack this patch.

--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1358,14 +1358,15 @@ extern size_t mem_section_usage_size(void);
  * which results in PFN_SECTION_SHIFT equal 6.
  * To sum it up, at least 6 bits are available.
  */
+#define SECTION_MAP_LAST_SHIFT 5
 #define SECTION_MARKED_PRESENT (1UL<<0)
 #define SECTION_HAS_MEM_MAP (1UL<<1)
 #define SECTION_IS_ONLINE (1UL<<2)
 #define SECTION_IS_EARLY (1UL<<3)
 #define SECTION_TAINT_ZONE_DEVICE (1UL<<4)
-#define SECTION_MAP_LAST_BIT (1UL<<5)
+#define SECTION_MAP_LAST_BIT (1UL<<SECTION_MAP_LAST_SHIFT)
 #define SECTION_MAP_MASK (~(SECTION_MAP_LAST_BIT-1))
-#define SECTION_NID_SHIFT 3
+#define SECTION_NID_SHIFT SECTION_MAP_LAST_SHIFT

 static inline struct page *__section_mem_map_addr(struct mem_section *section)
 {
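
Since the root cause was a flag being added without bumping the shift,
a compile-time check could catch a repeat of this. The following is
only a userspace sketch under that assumption, not part of the proposed
patch: section_nid_field() is a hypothetical helper, and the checks use
C11 static_assert from <assert.h>.

```c
#include <assert.h>

/* Definitions as they would look with the change proposed in this mail. */
#define SECTION_MAP_LAST_SHIFT    5
#define SECTION_TAINT_ZONE_DEVICE (1UL << 4)	/* highest flag in the diff */
#define SECTION_MAP_LAST_BIT      (1UL << SECTION_MAP_LAST_SHIFT)
#define SECTION_MAP_MASK          (~(SECTION_MAP_LAST_BIT - 1))
#define SECTION_NID_SHIFT         SECTION_MAP_LAST_SHIFT

/* Every flag must sit below SECTION_MAP_LAST_BIT. */
static_assert(SECTION_TAINT_ZONE_DEVICE < SECTION_MAP_LAST_BIT,
	      "section flag does not fit below SECTION_MAP_LAST_BIT");

/* The node id field must start at or above the end of the flag bits. */
static_assert((1UL << SECTION_NID_SHIFT) >= SECTION_MAP_LAST_BIT,
	      "SECTION_NID_SHIFT overlaps the section flags");

/* Hypothetical helper: the early-boot encoding of a node id. */
static unsigned long section_nid_field(unsigned long nid)
{
	return nid << SECTION_NID_SHIFT;
}
```

With SECTION_NID_SHIFT derived from SECTION_MAP_LAST_SHIFT the second
check holds by construction. Adding a flag at bit 5 without raising
SECTION_MAP_LAST_SHIFT would make the first check fail at build time.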

Thanks,
Naoya Horiguchi