Re: [PATCH v1] drivers/base/memory.c: Don't access uninitialized memmaps in soft_offline_page_store()

From: David Hildenbrand
Date: Fri Oct 11 2019 - 05:51:51 EST


On 11.10.19 08:13, Naoya Horiguchi wrote:
On Thu, Oct 10, 2019 at 04:12:00PM +0200, David Hildenbrand wrote:
Uninitialized memmaps contain garbage and in the worst case trigger kernel
BUGs, especially with CONFIG_PAGE_POISONING. They should not get
touched.

Right now, when trying to soft-offline a PFN that resides on a memory
block that was never onlined, one gets a misleading error with
CONFIG_PAGE_POISONING:
:/# echo 5637144576 > /sys/devices/system/memory/soft_offline_page
[ 23.097167] soft offline: 0x150000 page already poisoned

But the actual result depends on the garbage in the memmap.
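
For reference, the PFN in the log line follows directly from the address written to sysfs. The sketch below is a userspace illustration, not kernel code. It assumes 4 KiB base pages (PAGE_SHIFT of 12, as on x86-64), and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, i.e. PAGE_SHIFT == 12 (as on x86-64). */
#define SKETCH_PAGE_SHIFT 12

/* Convert a physical byte address to its page frame number (PFN). */
uint64_t sketch_phys_to_pfn(uint64_t phys_addr)
{
	return phys_addr >> SKETCH_PAGE_SHIFT;
}

/* Self-check against the values quoted in the report above. */
void sketch_check_report_values(void)
{
	/* 5637144576 bytes / 4096 bytes per page == 0x150000 */
	assert(sketch_phys_to_pfn(5637144576ULL) == 0x150000ULL);
}
```

Under that assumption, the value 5637144576 echoed into soft_offline_page corresponds to PFN 0x150000, which is the PFN the kernel message reports.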

soft_offline_page() can only work with online pages; it returns -EIO in
case of ZONE_DEVICE. Make sure to only forward pages that are online
(iow, managed by the buddy) and, therefore, have an initialized memmap.

Add a check against pfn_to_online_page() and similarly return -EIO.

Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
drivers/base/memory.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 6bea4f3f8040..55907c27075b 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -540,6 +540,9 @@ static ssize_t soft_offline_page_store(struct device *dev,
 	pfn >>= PAGE_SHIFT;
 	if (!pfn_valid(pfn))
 		return -ENXIO;
+	/* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */
+	if (!pfn_to_online_page(pfn))
+		return -EIO;
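
The order of the two checks in the hunk above can be modelled in userspace as follows. This is only a sketch under stated assumptions: the three-state enum, the lookup callback, and the example memory layout are hypothetical stand-ins for the kernel's pfn_valid() and pfn_to_online_page(), and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical states a PFN can be in for this userspace model. */
enum sketch_pfn_state {
	SKETCH_PFN_INVALID,	/* no memmap at all */
	SKETCH_PFN_OFFLINE,	/* memmap exists but may be uninitialized */
	SKETCH_PFN_ONLINE,	/* onlined, memmap initialized */
};

/* Hypothetical lookup that stands in for the kernel's section state. */
typedef enum sketch_pfn_state (*sketch_lookup_fn)(uint64_t pfn);

/* Mirrors the check order of the patched store function. */
int sketch_soft_offline_store(uint64_t phys_addr, sketch_lookup_fn lookup)
{
	uint64_t pfn = phys_addr >> SKETCH_PAGE_SHIFT;
	enum sketch_pfn_state state;

	assert(lookup != NULL);
	state = lookup(pfn);

	if (state == SKETCH_PFN_INVALID)	/* models !pfn_valid(pfn) */
		return -ENXIO;
	if (state != SKETCH_PFN_ONLINE)		/* models !pfn_to_online_page(pfn) */
		return -EIO;
	return 0;	/* the real code would go on to soft_offline_page() */
}

/* Example layout, made up for illustration only. */
enum sketch_pfn_state sketch_example_lookup(uint64_t pfn)
{
	if (pfn < 0x100000)
		return SKETCH_PFN_ONLINE;
	if (pfn < 0x160000)
		return SKETCH_PFN_OFFLINE;
	return SKETCH_PFN_INVALID;
}
```

With this made-up layout, the address from the report (PFN 0x150000) falls into the never-onlined range, so the model returns -EIO before anything would look at the memmap contents.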

Acked-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>

I think this check could be placed in soft_offline_page(), but that requires
a few more unrelated lines of changes due to the mismatch in parameter types
between memory_failure() and soft_offline_page(). This is not your problem,
and I plan to do some cleanup on related interfaces, so this patch is fine.


Thanks,
- Naoya Horiguchi

Well, I think when you come via madvise(), you are always guaranteed to hold a reasonable page in your hands. Only when converting from arbitrary PFNs do we have to watch out. But yeah, feel free to cc me on cleanups :)



--

Thanks,

David / dhildenb