On Wed, Sep 06, 2023 at 12:23:21PM +0100, Alexandru Elisei wrote:
On Thu, Aug 24, 2023 at 04:24:30PM +0100, Catalin Marinas wrote:
[...]
On Thu, Aug 24, 2023 at 01:25:41PM +0200, David Hildenbrand wrote:
On 24.08.23 13:06, David Hildenbrand wrote:
Regarding one complication: "The kernel needs to know where to allocate
a PROT_MTE page from or migrate a current page if it becomes PROT_MTE
(mprotect()) and the range it is in does not support tagging.",
simplified handling would be if it's in a MIGRATE_CMA pageblock, it
doesn't support tagging. You have to migrate to a !CMA page (for
example, not specifying GFP_MOVABLE as a quick way to achieve that).
Okay, I now realize that this patch set effectively duplicates some CMA
behavior using a new migrate-type.
I considered mixing the tag storage memory with normal memory and
adding it to MIGRATE_CMA. But since tag storage memory cannot be tagged,
this means that it's not enough anymore to have a __GFP_MOVABLE allocation
request to use MIGRATE_CMA.
I considered two solutions to this problem:
1. Only allocate from MIGRATE_CMA if the requested memory is not tagged =>
this effectively means transforming all memory from MIGRATE_CMA into the
MIGRATE_METADATA migratetype that the series introduces. Not very
appealing, because that means treating normal memory that is also on the
MIGRATE_CMA lists as tagged memory.
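
For illustration only, the rule in option 1 above can be written as a small
predicate. This is a minimal userspace sketch, not kernel code: struct
alloc_req and may_use_cma() are hypothetical names invented here, and the two
booleans stand in for "the request is __GFP_MOVABLE" and "the request is for
tagged (PROT_MTE) memory".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of an allocation request (not a kernel type). */
struct alloc_req {
	bool movable;	/* stands in for a __GFP_MOVABLE request */
	bool tagged;	/* stands in for a request for PROT_MTE memory */
};

/*
 * Option 1 from the thread: being movable is no longer enough to
 * allocate from MIGRATE_CMA; the request must also be untagged,
 * because tag storage memory cannot itself be tagged.
 */
static bool may_use_cma(const struct alloc_req *req)
{
	return req->movable && !req->tagged;
}

/* Small self-check covering the four combinations. */
static void may_use_cma_selfcheck(void)
{
	struct alloc_req movable_untagged = { .movable = true,  .tagged = false };
	struct alloc_req movable_tagged   = { .movable = true,  .tagged = true  };
	struct alloc_req pinned_untagged  = { .movable = false, .tagged = false };
	struct alloc_req pinned_tagged    = { .movable = false, .tagged = true  };

	assert(may_use_cma(&movable_untagged));
	assert(!may_use_cma(&movable_tagged));
	assert(!may_use_cma(&pinned_untagged));
	assert(!may_use_cma(&pinned_tagged));
}
```

The drawback described above shows up directly in the predicate: it applies to
every page on the MIGRATE_CMA lists, including ordinary CMA memory that could
have been tagged.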
That's indeed not ideal. We could try this if it makes the patches
significantly simpler, though I'm not so sure.
Allocating metadata is the easier part as we know the correspondence
from the tagged pages (32 PROT_MTE pages) to the metadata page (1 tag
storage page), so alloc_contig_range() does this for us. Just adding it
to the CMA range is sufficient.
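
As a rough illustration of the 32:1 correspondence mentioned above, here is a
minimal userspace sketch. It assumes a simple linear layout in which
consecutive groups of 32 data pages map to consecutive tag storage pages; that
layout, struct tag_region, and both helper names are assumptions made for this
example and are not taken from the patch series.

```c
#include <assert.h>
#include <stdint.h>

/* From the thread: 32 PROT_MTE pages correspond to 1 tag storage page. */
#define DATA_PAGES_PER_TAG_PAGE 32u

/* Hypothetical description of one tag storage region (linear layout assumed). */
struct tag_region {
	uint64_t data_start_pfn;	/* first data page covered by the region */
	uint64_t data_nr_pages;		/* number of data pages covered */
	uint64_t tag_start_pfn;		/* first tag storage page */
};

/* Return the pfn of the tag storage page that backs data_pfn. */
static uint64_t tag_storage_pfn(const struct tag_region *r, uint64_t data_pfn)
{
	assert(data_pfn >= r->data_start_pfn);
	assert(data_pfn - r->data_start_pfn < r->data_nr_pages);

	return r->tag_start_pfn +
	       (data_pfn - r->data_start_pfn) / DATA_PAGES_PER_TAG_PAGE;
}

/* Number of tag storage pages needed to cover the region, rounded up. */
static uint64_t tag_storage_nr_pages(const struct tag_region *r)
{
	return (r->data_nr_pages + DATA_PAGES_PER_TAG_PAGE - 1) /
	       DATA_PAGES_PER_TAG_PAGE;
}
```

With a region of 64 data pages starting at pfn 0x1000 and tag storage starting
at pfn 0x9000, data pfns 0x1000 through 0x101f map to tag page 0x9000 and
0x1020 through 0x103f map to 0x9001. Because the mapping is a fixed function of
the data pfn, the tag page for any newly tagged page is known up front and can
be passed to something like alloc_contig_range().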
However, making sure that we don't allocate PROT_MTE pages from the
metadata range is what led us to another migrate type. I guess we could
achieve something similar with a new zone or a CPU-less NUMA node,
though the latter is not guaranteed not to allocate memory from the
range, only make it less likely. Both these options are less flexible in
terms of size/alignment/placement.
Maybe as a quick hack - only allow PROT_MTE from ZONE_NORMAL and
configure the metadata range in ZONE_MOVABLE but at some point I'd
expect some CXL-attached memory to support MTE with additional carveout
reserved.