Re: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone

From: Michal Hocko
Date: Thu Jun 22 2017 - 14:17:05 EST


[Again, please try to trim your quoted response to the minimum]

On Thu 22-06-17 10:32:43, Wei Yang wrote:
> On Thu, Jun 01, 2017 at 10:37:46AM +0200, Michal Hocko wrote:
[...]
> >@@ -938,6 +938,27 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
> > }
> >
> > /*
> >+ * Returns a default kernel memory zone for the given pfn range.
> >+ * If no kernel zone covers this pfn range it will automatically go
> >+ * to the ZONE_NORMAL.
> >+ */
> >+struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
> >+		unsigned long nr_pages)
> >+{
> >+	struct pglist_data *pgdat = NODE_DATA(nid);
> >+	int zid;
> >+
> >+	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
> >+		struct zone *zone = &pgdat->node_zones[zid];
> >+
> >+		if (zone_intersects(zone, start_pfn, nr_pages))
> >+			return zone;
> >+	}
> >+
> >+	return &pgdat->node_zones[ZONE_NORMAL];
> >+}
>
> Hmm... a corner case jumped into my mind which may invalidate this
> calculation.
>
> The case is:
>
>
> Zone:      | DMA   | DMA32           | NORMAL       |
>            v       v                 v              v
>
> Phy mem:   [           ]       [                    ]
>
>            ^           ^       ^                    ^
> Node:      |   Node0   |       |       Node1        |
>            A           B       C                    D
>
>
> The key point is
> 1. There is a hole between Node0 and Node1
> 2. The hole sits in a non-normal zone
>
> Let's mark the boundary as A, B, C, D. Then we would have
> node0->zone[dma32] = [A, B]
> node1->zone[dma32] = [C, D]
>
> If we hotplug a range in [B, C] on node0, it does not look that bad. But
> if we hotplug a range in [B, C] on node1, it will introduce an
> overlapping zone, because the range [B, C] intersects none of the existing
> zones on node1.
>
> Do you think this is possible?

Yes, it is possible. I would be much more surprised if it was real,
though. Fixing that would require using arch_zone_{lowest,highest}_possible_pfn,
which are not available after the init section disappears, and I am not
even sure we should care. I would rather wait for a real-life example of
such a configuration before fixing it.
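
For illustration only, the corner case can be modelled in userspace roughly
as below. This is a sketch, not kernel code: struct zone and struct
pglist_data are cut down to the two fields the loop needs, the pfn values
for A, B, C and D are made up, and grow_zone_span() and
hole_hotplug_overlaps() are hypothetical helpers standing in for what
happens once the range is moved into the returned zone.

```c
#include <stdbool.h>

/* Userspace model only: these mimic, but are not, the kernel types. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, MAX_NR_ZONES };

struct zone {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;	/* 0 means the zone is empty */
};

struct pglist_data {
	struct zone node_zones[MAX_NR_ZONES];
};

static unsigned long zone_end_pfn(const struct zone *zone)
{
	return zone->zone_start_pfn + zone->spanned_pages;
}

static bool zone_intersects(const struct zone *zone,
			    unsigned long start_pfn, unsigned long nr_pages)
{
	if (zone->spanned_pages == 0)
		return false;
	if (start_pfn >= zone_end_pfn(zone) ||
	    start_pfn + nr_pages <= zone->zone_start_pfn)
		return false;
	return true;
}

/* Same loop as in the patch, but taking a pgdat instead of a nid. */
static struct zone *default_zone_for_pfn(struct pglist_data *pgdat,
					 unsigned long start_pfn,
					 unsigned long nr_pages)
{
	int zid;

	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
		struct zone *zone = &pgdat->node_zones[zid];

		if (zone_intersects(zone, start_pfn, nr_pages))
			return zone;
	}

	return &pgdat->node_zones[ZONE_NORMAL];
}

/* Hypothetical helper: grow the zone span so it covers the new range. */
static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
			   unsigned long nr_pages)
{
	unsigned long end_pfn = start_pfn + nr_pages;

	if (zone->spanned_pages != 0) {
		if (zone->zone_start_pfn < start_pfn)
			start_pfn = zone->zone_start_pfn;
		if (zone_end_pfn(zone) > end_pfn)
			end_pfn = zone_end_pfn(zone);
	}
	zone->zone_start_pfn = start_pfn;
	zone->spanned_pages = end_pfn - start_pfn;
}

/*
 * Hotplug a range from the hole [B, C) on node 0 or node 1 (nid must be
 * 0 or 1) and report whether the chosen zone now overlaps another zone
 * on that node. All pfn values are assumptions for this example:
 * A = 0, B = 0x40000, C = 0x80000, D = 0x200000.
 */
static bool hole_hotplug_overlaps(int nid)
{
	struct pglist_data nodes[2] = {
		{ .node_zones = {
			[ZONE_DMA]    = { 0x0, 0x1000 },
			[ZONE_DMA32]  = { 0x1000, 0x3f000 },
		} },
		{ .node_zones = {
			[ZONE_DMA32]  = { 0x80000, 0x80000 },
			[ZONE_NORMAL] = { 0x100000, 0x100000 },
		} },
	};
	struct pglist_data *pgdat = &nodes[nid];
	unsigned long start_pfn = 0x50000, nr_pages = 0x8000;
	struct zone *zone = default_zone_for_pfn(pgdat, start_pfn, nr_pages);
	int zid;

	grow_zone_span(zone, start_pfn, nr_pages);

	for (zid = 0; zid < MAX_NR_ZONES; zid++) {
		struct zone *other = &pgdat->node_zones[zid];

		if (other != zone &&
		    zone_intersects(other, zone->zone_start_pfn,
				    zone->spanned_pages))
			return true;
	}
	return false;
}
```

With these made-up numbers the range falls through to ZONE_NORMAL on both
nodes. On node0 that zone was empty, so nothing overlaps. On node1 the
grown ZONE_NORMAL starts below C and therefore covers the existing DMA32
zone, which is the overlap described above.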
--
Michal Hocko
SUSE Labs