Re: [PATCH v5 00/27] Memory Folios
From: Matthew Wilcox
Date: Mon Mar 29 2021 - 13:57:27 EST
On Mon, Mar 29, 2021 at 05:58:32PM +0100, Matthew Wilcox wrote:
> In broad strokes, I think that having a Power Of Two Allocator
> with Descriptor (POTAD) is a useful foundational allocator to have.
> The specific allocator that we call the buddy allocator is very clever for
> the 1990s, but touches too many cachelines to be good with today's CPUs.
> The generalisation of the buddy allocator to the POTAD lets us allocate
> smaller quantities (eg a 512 byte block) and allocate descriptors which
> differ in size from a struct page. For an extreme example, see xfs_buf
> which is 360 bytes and is the descriptor for an allocation between 512
> and 65536 bytes.
>
> There are times when we need to get from the physical address to
> the descriptor, eg memory-failure.c or get_user_pages(). This is the
> equivalent of phys_to_page(), and it's going to have to be a lookup tree.
> I think this is a role for the Maple Tree, but it's not ready yet.
> I don't know if it'll be fast enough for this case. There's also the
> need (particularly for memory-failure) to determine exactly what kind
> of descriptor we're dealing with, and also its size. Even its owner,
> so we can notify them of memory failure.
A couple of things I forgot to mention ...
I'd like the POTAD to be not necessarily tied to allocating memory.
For example, I think it could be used to allocate swap space. eg the swap
code could register the space in a swap file as allocatable through the
POTAD, and then later ask the POTAD to allocate a POT from the swap space.
The POTAD wouldn't need to be limited to MAX_ORDER. It should be
perfectly capable of allocating 1TB if your machine has 1.5TB of RAM
in it (... and things haven't got too fragmented)
I think the POTAD can be used to replace the CMA. The CMA supports
weirdo things like "Allocate 8MB of memory at a 1MB alignment", and I
think that's doable within the data structures that I'm thinking about
for the POTAD. It'd first try to allocate an 8MB chunk at 8MB alignment,
and then if that's not possible, try to allocate two adjacent 4MB chunks;
continuing down until it finds that there aren't 8x1MB chunks, at which
point it can give up.