Re: [PATCH v3 12/14] arm64: realm: Support nonsecure ITS emulation shared

From: Steven Price
Date: Fri Jun 07 2024 - 11:45:37 EST


On 06/06/2024 19:38, Catalin Marinas wrote:
> On Thu, Jun 06, 2024 at 11:17:36AM +0100, Marc Zyngier wrote:
>> On Wed, 05 Jun 2024 16:08:49 +0100,
>> Steven Price <steven.price@xxxxxxx> wrote:
>>> 2. Use a special (global) memory allocator that does the
>>> set_memory_decrypted() dance on the pages that it allocates but allows
>>> packing the allocations. I'm not aware of an existing kernel API for
>>> this, so it's potentially quite a bit of code. The benefit is that it
>>> reduces memory consumption in a realm guest, although fragmentation
>>> still means we're likely to see a (small) growth.
>>>
>>> Any thoughts on what you think would be best?
>>
>> I would expect that something similar to kmem_cache could be of help,
>> only with the ability to deal with variable object sizes (in this
>> case: minimum of 256 bytes, in increments defined by the
>> implementation, and with a 256 byte alignment).
>
> Hmm, that's doable but not that easy to make generic. We'd need a new
> class of kmalloc-* caches (e.g. kmalloc-decrypted-*) which use only
> decrypted pages together with a GFP_DECRYPTED flag or something to get
> the slab allocator to go for these pages and avoid merging with other
> caches. It would actually be the page allocator parsing this gfp flag,
> probably in post_alloc_hook() to set the page decrypted and re-encrypt
> it in free_pages_prepare(). A slight problem here is that free_pages()
> doesn't get the gfp flag, so we'd need to store some bit in the page
> flags. Maybe the flag is not that bad, do we have something like for
> page_to_phys() to give us the high IPA address for decrypted pages?
>
> Similarly if we go for a kmem_cache (or a few for multiple sizes). One
> can specify a constructor which could set the memory decrypted but
> there's no destructor (and also the constructor is per object, not per
> page, so we'd need some refcounting).
>
> Another approach contained within the driver is to use mempool_create()
> with our own _alloc_fn/_free_fn that sets the memory decrypted/encrypted
> accordingly, though sub-page allocations need additional tracking. Also
> that's fairly similar to kmem_cache, fixed size.
>
> Yet another option would be to wire it somehow in the DMA API but the
> minimum allocation is already a page size, so we don't gain anything.
>
> What gets somewhat closer to what we need is gen_pool. It can track
> different sizes, we just need to populate the chunks as needed. I don't
> think this would work as a generic allocator but may be good enough
> within the ITS code.
>
> If there's a need for such generic allocations in other parts of the
> kernel, my preference would be something around kmalloc caches and a new
> GFP flag (first option; subject to the selling it to the mm folk). But
> that's more of a separate prototyping effort that may or may not
> succeed.
>
> Anyway, we could do some hacking around gen_pool as a temporary solution
> (maybe as a set of patches on top of this series to be easier to revert)
> and start investigating a proper decrypted page allocator in parallel.
> We just need to find a victim that has the page allocator fresh in mind
> (Ryan or Alexandru ;)).

Thanks for the suggestions Catalin. I had a go at implementing something
with gen_pool - the below (very lightly tested) hack seems to work. This
is on top of the current series.

I *think* it should also be safe to drop the whole alignment part with
this custom allocator, which could actually save memory. But I haven't
quite got my head around that yet.

Steve

----8<-----
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index ca72f830f4cc..e78634d4d22c 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -12,6 +12,7 @@
#include <linux/crash_dump.h>
#include <linux/delay.h>
#include <linux/efi.h>
+#include <linux/genalloc.h>
#include <linux/interrupt.h>
#include <linux/iommu.h>
#include <linux/iopoll.h>
@@ -165,7 +166,7 @@ struct its_device {
struct its_node *its;
struct event_lpi_map event_map;
void *itt;
- u32 itt_order;
+ u32 itt_sz;
u32 nr_ites;
u32 device_id;
bool shared;
@@ -225,6 +226,50 @@ static void its_free_pages(void *addr, unsigned int order)
free_pages((unsigned long)addr, order);
}

+static struct gen_pool *itt_pool;
+
+static void *itt_alloc_pool(int node, int size)
+{
+ unsigned long addr;
+ struct page *page;
+
+ if (size >= PAGE_SIZE) {
+ page = its_alloc_pages_node(node,
+ GFP_KERNEL | __GFP_ZERO,
+ get_order(size));
+
+ return page_address(page);
+ }
+
+ do {
+ addr = gen_pool_alloc(itt_pool, size);
+ if (addr)
+ break;
+
+ page = its_alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 1);
+ if (!page)
+ break;
+
+ gen_pool_add(itt_pool, (unsigned long)page_address(page),
+ PAGE_SIZE, node);
+ } while (!addr);
+
+ return (void*)addr;
+}
+
+static void itt_free_pool(void *addr, int size)
+{
+ if (!addr)
+ return;
+
+ if (size >= PAGE_SIZE) {
+ its_free_pages(addr, get_order(size));
+ return;
+ }
+
+ gen_pool_free(itt_pool, (unsigned long)addr, size);
+}
+
/*
* Skip ITSs that have no vLPIs mapped, unless we're on GICv4.1, as we
* always have vSGIs mapped.
@@ -3450,9 +3495,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
unsigned long *lpi_map = NULL;
unsigned long flags;
u16 *col_map = NULL;
- struct page *page;
void *itt;
- int itt_order;
int lpi_base;
int nr_lpis;
int nr_ites;
@@ -3471,13 +3514,8 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
nr_ites = max(2, nvecs);
sz = nr_ites * (FIELD_GET(GITS_TYPER_ITT_ENTRY_SIZE, its->typer) + 1);
sz = max(sz, ITS_ITT_ALIGN) + ITS_ITT_ALIGN - 1;
- itt_order = get_order(sz);
- page = its_alloc_pages_node(its->numa_node,
- GFP_KERNEL | __GFP_ZERO,
- itt_order);
- if (!page)
- return NULL;
- itt = page_address(page);
+
+ itt = itt_alloc_pool(its->numa_node, sz);

dev = kzalloc(sizeof(*dev), GFP_KERNEL);

@@ -3492,9 +3530,9 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
lpi_base = 0;
}

- if (!dev || !col_map || (!lpi_map && alloc_lpis)) {
+ if (!dev || !itt || !col_map || (!lpi_map && alloc_lpis)) {
kfree(dev);
- its_free_pages(itt, itt_order);
+ itt_free_pool(itt, sz);
bitmap_free(lpi_map);
kfree(col_map);
return NULL;
@@ -3504,7 +3542,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,

dev->its = its;
dev->itt = itt;
- dev->itt_order = itt_order;
+ dev->itt_sz = sz;
dev->nr_ites = nr_ites;
dev->event_map.lpi_map = lpi_map;
dev->event_map.col_map = col_map;
@@ -3532,7 +3570,7 @@ static void its_free_device(struct its_device *its_dev)
list_del(&its_dev->entry);
raw_spin_unlock_irqrestore(&its_dev->its->lock, flags);
kfree(its_dev->event_map.col_map);
- its_free_pages(its_dev->itt, its_dev->itt_order);
+ itt_free_pool(its_dev->itt, its_dev->itt_sz);
kfree(its_dev);
}

@@ -5722,6 +5760,10 @@ int __init its_init(struct fwnode_handle *handle, struct rdists *rdists,
bool has_v4_1 = false;
int err;

+ itt_pool = gen_pool_create(get_order(ITS_ITT_ALIGN), -1);
+ if (!itt_pool)
+ return -ENOMEM;
+
gic_rdists = rdists;

its_parent = parent_domain;