Re: [PATCH v2] kernel/resource: Fix locking in request_free_mem_region

From: David Hildenbrand
Date: Mon Mar 29 2021 - 05:28:36 EST


On 29.03.21 03:37, Alistair Popple wrote:
On Friday, 26 March 2021 7:57:51 PM AEDT David Hildenbrand wrote:
On 26.03.21 02:20, Alistair Popple wrote:
request_free_mem_region() is used to find an empty range of physical
addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
over the range of possible addresses using region_intersects() to see if
the range is free.

Just a high-level question: how does this interact with memory
hot(un)plug? IOW, who defines and manages the "range of possible
addresses"?

Both the driver and the maximum physical address bits available define the
range of possible addresses for device private memory. From
__request_free_mem_region():

end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
addr = end - size + 1UL;

There is no lower address range bound here so it is effectively zero. The code
will try to allocate the highest possible physical address first and continue
searching down for a free block. Does that answer your question?

Ah, yes, thanks - that makes sense.
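
For readers following along, the top-down search described above can be sketched in plain user-space C. This is only an illustrative model, not the kernel code: struct busy_range, range_is_free() and find_free_top_down() are hypothetical names, the busy list is a flat array rather than the resource tree, and stepping down by exactly one region size per attempt is an assumption of the sketch. limit_end stands in for the already-computed minimum of the parent's end and the highest addressable physical address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one occupied range in the resource tree. */
struct busy_range {
    uint64_t start;
    uint64_t end; /* inclusive */
};

/* True if [start, end] overlaps none of the n busy ranges. */
static bool range_is_free(const struct busy_range *busy, size_t n,
                          uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < n; i++)
        if (busy[i].start <= end && start <= busy[i].end)
            return false;
    return true;
}

/*
 * Try the highest candidate first, then walk down one region size at a
 * time. There is no explicit lower bound, so the walk stops only when
 * another step would go below address zero.
 */
static bool find_free_top_down(const struct busy_range *busy, size_t n,
                               uint64_t limit_end, uint64_t size,
                               uint64_t *out)
{
    if (size == 0 || size - 1 > limit_end)
        return false;

    uint64_t addr = limit_end - size + 1;
    for (;;) {
        if (range_is_free(busy, n, addr, addr + size - 1)) {
            *out = addr;
            return true;
        }
        if (addr < size)
            return false; /* next step would underflow */
        addr -= size;
    }
}
```

Note that this model only locates a range; it does not reserve it, which is exactly the gap the patch is about.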



region_intersects() obtains a read lock before walking the resource tree
to protect against concurrent changes. However it drops the lock prior
to returning. This means by the time request_mem_region() is called in
request_free_mem_region() another thread may have already reserved the
requested region resulting in unexpected failures and a message in the
kernel log from hitting this condition:

I am confused. Why can't we return an error to the caller and let the
caller continue searching? This feels much simpler than what you propose
here. What am I missing?

The search occurs as part of the allocation. To allocate memory, free space
needs to be located and allocated as a single operation. However, in this case
the lock is dropped between locating a free region and allocating it, resulting
in an extra debug check firing and a subsequent failure.
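
As a rough illustration of "locate and allocate as a single operation", the sketch below holds one lock across both the intersection check and the reservation, so no other thread can claim the range in between. It is a user-space model under stated assumptions, not the patch itself: struct region_table, intersects_locked(), reserve_locked() and alloc_free_region() are hypothetical names, a pthread mutex stands in for the kernel's resource lock, and a fixed-size array stands in for the resource tree.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 16

struct region {
    uint64_t start;
    uint64_t end; /* inclusive */
};

/* Hypothetical table of reserved regions, protected by one lock. */
struct region_table {
    pthread_mutex_t lock;
    struct region busy[MAX_REGIONS];
    size_t count;
};

/* Caller must hold t->lock. */
static bool intersects_locked(const struct region_table *t,
                              uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->busy[i].start <= end && start <= t->busy[i].end)
            return true;
    return false;
}

/* Caller must hold t->lock. Records [start, end] if it is still free. */
static bool reserve_locked(struct region_table *t,
                           uint64_t start, uint64_t end)
{
    if (t->count == MAX_REGIONS || intersects_locked(t, start, end))
        return false;
    t->busy[t->count++] = (struct region){ start, end };
    return true;
}

/*
 * Search top-down and reserve the first free range. The lock is taken
 * once and held across the check and the reservation, so the result of
 * the check cannot go stale before the range is claimed.
 */
static bool alloc_free_region(struct region_table *t, uint64_t limit_end,
                              uint64_t size, uint64_t *out)
{
    bool found = false;

    if (size == 0 || size - 1 > limit_end)
        return false;

    pthread_mutex_lock(&t->lock);
    for (uint64_t addr = limit_end - size + 1; ; addr -= size) {
        if (reserve_locked(t, addr, addr + size - 1)) {
            *out = addr;
            found = true;
            break;
        }
        if (addr < size)
            break; /* next step would underflow */
    }
    pthread_mutex_unlock(&t->lock);
    return found;
}
```

If the lock were instead dropped after the check and retaken for the reservation, two threads could both see the same range as free, and the second reservation would fail even though free space remains lower down.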

I did originally consider just allowing the caller to retry, but in the end it
didn't seem any simpler. Callers would have to differentiate between transient
and permanent failures and figure out how often to retry and no doubt each
caller would do this differently. There is also the issue of starvation if one

Right, you would want to return -EBUSY, -ENOMEM,... from __request_region() - which somehow seems like the right thing to do considering that we can have both types of errors already.

thread constantly loses the race to allocate after the search. Overall it
seems simpler to me to just have a call that allocates a region (or fails due
to lack of free space).

Fair enough, but I doubt the starvation is a real issue ...
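
For comparison, the alternative raised in this exchange (distinct error codes, with the search continuing on a conflict) might look roughly like the following user-space sketch. Everything here is hypothetical: try_reserve() and reserve_top_down() are illustrative names, not the real __request_region() interface, and the fixed-capacity array merely models running out of bookkeeping memory. -EBUSY is treated as "this candidate is taken, try the next one" and any other error as final.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct span {
    uint64_t start;
    uint64_t end; /* inclusive */
};

/*
 * Hypothetical reservation primitive: 0 on success, -EBUSY if the range
 * overlaps an existing entry, -ENOMEM if no bookkeeping slot is left.
 */
static int try_reserve(struct span *busy, size_t *count, size_t cap,
                       uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < *count; i++)
        if (busy[i].start <= end && start <= busy[i].end)
            return -EBUSY;
    if (*count == cap)
        return -ENOMEM;
    busy[(*count)++] = (struct span){ start, end };
    return 0;
}

/*
 * Top-down search that keeps going on -EBUSY (a transient conflict for
 * this candidate) and gives up immediately on any other error.
 */
static int reserve_top_down(struct span *busy, size_t *count, size_t cap,
                            uint64_t limit_end, uint64_t size,
                            uint64_t *out)
{
    if (size == 0 || size - 1 > limit_end)
        return -EINVAL;

    uint64_t addr = limit_end - size + 1;
    for (;;) {
        int ret = try_reserve(busy, count, cap, addr, addr + size - 1);

        if (ret == 0) {
            *out = addr;
            return 0;
        }
        if (ret != -EBUSY)
            return ret; /* permanent failure, e.g. -ENOMEM */
        if (addr < size)
            return -EBUSY; /* address space exhausted */
        addr -= size;
    }
}
```

With this shape the caller never has to decide how often to retry: a conflict simply moves the search to the next candidate, and only a non-conflict error ends it.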


I also don't think what I am proposing is particularly complex. I agree the

Well, it adds another 42 LOC to kernel/resource.c for a rather special case that just needs a better return value from __request_region() to make a decision.

diff makes it look complex, but at a high level all I'm doing is moving the
locking to the outer function calls. It ends up looking more complex because
there are some memory allocations which need reordering, but if things had
originally been written this way I don't think it would be considered complex.

- Alistair

--
Thanks,

David / dhildenb






