Re: [PATCH v2] kernel/resource: Fix locking in request_free_mem_region

From: David Hildenbrand
Date: Wed Mar 31 2021 - 02:41:53 EST


On 31.03.21 08:19, Alistair Popple wrote:
On Tuesday, 30 March 2021 8:13:32 PM AEDT David Hildenbrand wrote:

On 29.03.21 03:37, Alistair Popple wrote:
On Friday, 26 March 2021 7:57:51 PM AEDT David Hildenbrand wrote:
On 26.03.21 02:20, Alistair Popple wrote:
request_free_mem_region() is used to find an empty range of physical
addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
over the range of possible addresses using region_intersects() to see if
the range is free.

Just a high-level question: how does this interact with memory
hot(un)plug? IOW, who defines and manages the "range of possible
addresses"?

Both the driver and the maximum physical address bits available define the
range of possible addresses for device private memory. From
__request_free_mem_region():

end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
addr = end - size + 1UL;

There is no lower address bound here, so it is effectively zero. The code
will try to allocate the highest possible physical address first and continue
searching down for a free block. Does that answer your question?
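The top-down search described above can be modeled in user space roughly as follows. This is a minimal sketch, not the kernel implementation. `struct busy_range`, `range_is_free()` and `find_free_top_down()` are hypothetical names. A flat array stands in for the iomem resource tree, and `range_is_free()` plays the role of a `region_intersects(...) == REGION_DISJOINT` check. The exact loop bounds are an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an iomem resource: inclusive [start, end]. */
struct busy_range {
	uint64_t start;
	uint64_t end;
};

/* True if [addr, addr + size - 1] overlaps none of the busy ranges, i.e. the
 * analogue of region_intersects() returning REGION_DISJOINT. */
static bool range_is_free(const struct busy_range *busy, size_t n,
			  uint64_t addr, uint64_t size)
{
	uint64_t last = addr + size - 1;

	for (size_t i = 0; i < n; i++)
		if (addr <= busy[i].end && busy[i].start <= last)
			return false;
	return true;
}

/*
 * Scan downwards in size-sized steps, starting with the block that ends at
 * min(base_end, 2^max_physmem_bits - 1), and return the first free block.
 * Returns UINT64_MAX if none is found (no valid block can start there).
 */
static uint64_t find_free_top_down(const struct busy_range *busy, size_t n,
				   uint64_t base_start, uint64_t base_end,
				   unsigned int max_physmem_bits, uint64_t size)
{
	uint64_t limit, end, addr;

	assert(max_physmem_bits < 64);
	limit = (UINT64_C(1) << max_physmem_bits) - 1;
	end = base_end < limit ? base_end : limit;
	assert(size != 0 && size - 1 <= end);

	addr = end - size + 1;		/* like "addr = end - size + 1UL" */
	for (;;) {
		if (addr < base_start)
			return UINT64_MAX;	/* fell below the caller's base */
		if (range_is_free(busy, n, addr, size))
			return addr;
		if (addr < size)
			return UINT64_MAX;	/* next step would wrap below 0 */
		addr -= size;
	}
}
```

With a toy 8-bit address space and 16-byte blocks, the search tries 240 first, then 224, and so on. Since nothing but the caller's base bounds it from below, it can walk all the way down to 0.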

Oh, sorry, the first time I looked I got it wrong: I thought (1UL <<
MAX_PHYSMEM_BITS) would be the lower address limit. That does indeed look
problematic to me.

You might end up reserving an iomem region that memory hotplug code could
need later. If someone plugs a DIMM or adds memory via other mechanisms
(e.g., virtio-mem), memory hotplug (via add_memory()) would fail.
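The failure mode can be shown with a toy model. All names here are hypothetical, and a flat array stands in for the iomem resource tree. A driver first claims a block at the top of a 46-bit physical address space. A later hotplug request that firmware placed in the same spot is then refused. The `-EBUSY` return is this model's choice. It is not necessarily the error add_memory() reports.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat model of the iomem tree: inclusive [start, end] claims. */
struct claim {
	uint64_t start;
	uint64_t end;
};

/*
 * Claim [start, end] exclusively. Returns 0 on success, -EBUSY if the range
 * overlaps an existing claim, -ENOMEM if the table is full.
 */
static int claim_range(struct claim *tbl, size_t *n, size_t cap,
		       uint64_t start, uint64_t end)
{
	assert(start <= end);

	for (size_t i = 0; i < *n; i++)
		if (start <= tbl[i].end && tbl[i].start <= end)
			return -EBUSY;
	if (*n == cap)
		return -ENOMEM;
	tbl[*n].start = start;
	tbl[*n].end = end;
	(*n)++;
	return 0;
}
```

Suppose a driver has claimed the top 256 MiB below 2^46. A later hotplug request covering the top 512 MiB then fails, while a request lower in the address space still succeeds.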

You should never touch physical memory areas reserved for memory
hotplug, e.g., via SRAT.

What is the expectation here?

Most drivers call request_free_mem_region() with iomem_resource as the base.
So zone device private pages currently tend to get allocated from the top of
that.

Okay, but you could still "steal" iomem space that does not belong to you, and the firmware will be unaware of that (e.g., it might hotplug a DIMM in these spots). This is really nasty (although, since you allocate top-down, I guess it will happen rarely).


By definition ZONE_DEVICE private pages are unaddressable from the CPU. So in
terms of expectation I think all that is really required for ZONE_DEVICE
private pages (at least for Nouveau) is a valid range of physical addresses
that allow page_to_pfn() and pfn_to_page() to work correctly. To make this
work drivers add the pages via memremap_pages() -> pagemap_range() ->
add_pages().
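The "valid range of physical addresses" requirement can be made concrete with the SPARSEMEM section arithmetic. This is a user-space sketch. The macro names mirror kernel ones but are re-created locally. The numeric values are assumptions chosen for illustration (they match x86-64 with 4-level paging). `phys_to_pfn()` and `pfn_has_section_slot()` are hypothetical helpers. In this model, a pfn is only usable if its section number fits in the section table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values, assumed here (x86-64 with 4-level paging). */
#define PAGE_SHIFT        12
#define SECTION_SIZE_BITS 27
#define MAX_PHYSMEM_BITS  46

/* Local re-creations of the SPARSEMEM macros of the same names. */
#define SECTIONS_SHIFT    (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
#define NR_MEM_SECTIONS   (UINT64_C(1) << SECTIONS_SHIFT)

static_assert(SECTION_SIZE_BITS > PAGE_SHIFT, "a section spans many pages");

static uint64_t phys_to_pfn(uint64_t phys)
{
	return phys >> PAGE_SHIFT;
}

static uint64_t pfn_to_section_nr(uint64_t pfn)
{
	return pfn >> PFN_SECTION_SHIFT;
}

/* Model check: does this pfn's section number fit in the section table? */
static bool pfn_has_section_slot(uint64_t pfn)
{
	return pfn_to_section_nr(pfn) < NR_MEM_SECTIONS;
}
```

With these values there are 2^19 sections of 128 MiB each. The last byte below 2^46 maps to section 2^19 - 1, while 2^46 itself has no slot.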

So you'd actually want some region above the hotpluggable/addressable range -- e.g., above MAX_PHYSMEM_BITS.

The maximum number of sections we can have is defined by:

#define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)

You'd want, e.g., some extra space like this (to be improved):

#define DEVMEM_BITS 1
#define SECTIONS_SHIFT (MAX_PHYSMEM_BITS + DEVMEM_BITS - SECTION_SIZE_BITS)

And do the search only within that range.
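The DEVMEM_BITS idea can be sketched under the same illustrative values. The section table grows by one bit, so its size doubles. The free-range search is confined to the new window [2^MAX_PHYSMEM_BITS, 2^(MAX_PHYSMEM_BITS + DEVMEM_BITS) - 1], which means it cannot land in space firmware may use for hotplug. `DEVMEM_START`, `DEVMEM_END`, `struct busy_range` and `devmem_find_free()` are hypothetical names. The proposal is explicitly "to be improved", and this sketch does not check whether other parts of the kernel can handle pfns in that window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values (assumed), plus DEVMEM_BITS as proposed above. */
#define SECTION_SIZE_BITS 27
#define MAX_PHYSMEM_BITS  46
#define DEVMEM_BITS       1

#define SECTIONS_SHIFT (MAX_PHYSMEM_BITS + DEVMEM_BITS - SECTION_SIZE_BITS)

/* Hypothetical window for device-private memory: above the hotpluggable
 * range, but still covered by the enlarged section table. */
#define DEVMEM_START (UINT64_C(1) << MAX_PHYSMEM_BITS)
#define DEVMEM_END   ((UINT64_C(1) << (MAX_PHYSMEM_BITS + DEVMEM_BITS)) - 1)

/* Hypothetical record of an already-allocated device range (inclusive). */
struct busy_range {
	uint64_t start;
	uint64_t end;
};

static int overlaps_any(const struct busy_range *busy, size_t n,
			uint64_t addr, uint64_t size)
{
	uint64_t last = addr + size - 1;

	for (size_t i = 0; i < n; i++)
		if (addr <= busy[i].end && busy[i].start <= last)
			return 1;
	return 0;
}

/*
 * Top-down search confined to [DEVMEM_START, DEVMEM_END]. size is rounded up
 * to whole sections. Returns UINT64_MAX if nothing fits.
 */
static uint64_t devmem_find_free(const struct busy_range *busy, size_t n,
				 uint64_t size)
{
	const uint64_t section = UINT64_C(1) << SECTION_SIZE_BITS;
	uint64_t addr;

	assert(size != 0);
	/* The window is section-aligned, so if size fits, the rounded size does too. */
	if (size > DEVMEM_END - DEVMEM_START + 1)
		return UINT64_MAX;
	size = (size + section - 1) & ~(section - 1);

	for (addr = DEVMEM_END - size + 1; ; addr -= size) {
		if (!overlaps_any(busy, n, addr, size))
			return addr;
		if (addr - DEVMEM_START < size)
			return UINT64_MAX;	/* next block would leave the window */
	}
}
```

Any address this returns is at or above 2^MAX_PHYSMEM_BITS. The search therefore never competes with DIMM or virtio-mem hotplug below that boundary.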

--
Thanks,

David / dhildenb