RE: [PATCH] nitro_enclaves: merge contiguous physical memory regions
From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Date: Wed Sep 15 2021 - 03:28:54 EST
> -----Original Message-----
> From: Paraschiv, Andra-Irina [mailto:andraprs@xxxxxxxxxx]
> Sent: Wednesday, September 15, 2021 1:45 AM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@xxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; Gonglei (Arei) <arei.gonglei@xxxxxxxxxx>; Greg
> KH <gregkh@xxxxxxxxxxxxxxxxxxx>; Kamal Mostafa <kamal@xxxxxxxxxxxxx>;
> Alexandru Vasile <lexnv@xxxxxxxxxx>; Alexandru Ciobotaru
> <alcioa@xxxxxxxxxx>; Paolo Bonzini <pbonzini@xxxxxxxxxx>; Stefano Garzarella
> <sgarzare@xxxxxxxxxx>; Stefan Hajnoczi <stefanha@xxxxxxxxxx>; Vitaly
> Kuznetsov <vkuznets@xxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx;
> ne-devel-upstream@xxxxxxxxxx
> Subject: Re: [PATCH] nitro_enclaves: merge contiguous physical memory regions
>
>
>
> On 14/09/2021 06:14, Longpeng(Mike) wrote:
> >
>
> Thanks for the patch.
>
> > There maybe too many physical memory regions if the memory regions
> > backend with 2M hugetlb pages.
>
> Could update something along these lines:
>
> There can be cases when there are more memory regions that need to be
> set for an enclave than the maximum supported number of memory regions
> per enclave. One example can be when the memory regions are backed by 2
> MiB hugepages (the minimum supported hugepage size).
>
> >
> > Let's merge the adjacent regions if they are physical contiguous.
>
> Let's merge the adjacent regions if they are physically contiguous. This
> way the final number of memory regions is smaller than before merging and
> we could potentially avoid reaching the maximum.
>
Okay, it looks much better, thanks.
> >
> > Signed-off-by: Longpeng(Mike) <longpeng2@xxxxxxxxxx>
> > ---
> > drivers/virt/nitro_enclaves/ne_misc_dev.c | 64 +++++++++++++++++--------------
> > 1 file changed, 35 insertions(+), 29 deletions(-)
>
> Let's see the current state of the NE user space tooling and how it
> interacts with the NE kernel driver.
>
> The Nitro CLI tooling sets one hugepage per memory region [1]. The NE
> sample from the upstream Linux kernel does the same [2].
>
> We need to have a way to exercise this code path of merging memory
> regions and validate it. Let's have, for example, an update for the NE
> sample, if you can please add as an additional commit in the next revision.
>
> There could be an option e.g. to have both 2 MiB and 8 MiB memory
> regions, backed by 2 MiB hugepages. For example:
>
>
>  /**
> - * NE_MIN_MEM_REGION_SIZE - Minimum size of a memory region - 2 MiB.
> + * NE_2MB_MEM_REGION_SIZE - Size of a memory region - 2 MiB.
>   */
> -#define NE_MIN_MEM_REGION_SIZE (2 * 1024 * 1024)
> +#define NE_2MB_MEM_REGION_SIZE (2 * 1024 * 1024)
>
>  /**
> - * NE_DEFAULT_NR_MEM_REGIONS - Default number of memory regions of 2 MiB set for
> + * NE_8MB_MEM_REGION_SIZE - Size of a memory region - 8 MiB.
> + */
> +#define NE_8MB_MEM_REGION_SIZE (8 * 1024 * 1024)
> +
> +/**
> + * NE_DEFAULT_NR_2MB_MEM_REGIONS - Default number of memory regions of 2 MiB set for
> + *                                 an enclave.
> + */
> +#define NE_DEFAULT_NR_2MB_MEM_REGIONS (128)
> +
> +/**
> + * NE_DEFAULT_NR_8MB_MEM_REGIONS - Default number of memory regions of 8 MiB set for
> + *                                 an enclave.
> + */
> +#define NE_DEFAULT_NR_8MB_MEM_REGIONS (32)
> +
> +/**
> + * NE_DEFAULT_NR_MEM_REGIONS - Default number of memory regions of 2 MiB and 8 MiB set for
>   *                             an enclave.
>   */
> -#define NE_DEFAULT_NR_MEM_REGIONS (256)
> +#define NE_DEFAULT_NR_MEM_REGIONS (NE_DEFAULT_NR_2MB_MEM_REGIONS + NE_DEFAULT_NR_8MB_MEM_REGIONS)
>
>
>
> Then the logic from [2] can be updated to allocate the first part, the 2
> MiB memory regions, then the second part, the 8 MiB memory regions.
>
> Open for other options as well to cover the validation for the regions
> merging case.
>
Okay, will do in the next version.
By the way, I have another question here.
The nitro_enclaves.ko module consists of two parts: the misc part and the
pci_dev part. The main logic is in the misc part; the pci_dev part is
responsible for device management and command routing.
So how about decoupling these two parts, e.g. splitting nitro_enclaves.ko into
ne_ioctl.ko and ne_pdev.ko? That way, other modules could connect with
ne_ioctl.ko:

                 ne_ioctl.ko
                /     |     \
               /      |      \
       ne_pdev.ko test_pdev.ko others...

With test_pdev.ko, we could validate the core logic of ne_ioctl.ko locally.
What's more, other vendors' devices could also support the software enclave
feature if they follow the specification.
> >
> > diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
> > index e21e1e8..2920f26 100644
> > --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
> > +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
> > @@ -824,6 +824,11 @@ static int ne_sanity_check_user_mem_region_page(struct ne_enclave *ne_enclave,
> >  	return 0;
> >  }
> >
> > +struct phys_contig_mem_region {
> > + u64 paddr;
> > + u64 size;
> > +};
> > +
> > /**
> >  * ne_set_user_memory_region_ioctl() - Add user space memory region to the slot
> >  *                                     associated with the current enclave.
> > @@ -843,9 +848,10 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
> >  	unsigned long max_nr_pages = 0;
> >  	unsigned long memory_size = 0;
> >  	struct ne_mem_region *ne_mem_region = NULL;
> > -	unsigned long nr_phys_contig_mem_regions = 0;
> >  	struct pci_dev *pdev = ne_devs.ne_pci_dev->pdev;
> > -	struct page **phys_contig_mem_regions = NULL;
> > +	struct phys_contig_mem_region *phys_regions = NULL;
> > +	unsigned long nr_phys_regions = 0;
> > +	u64 prev_phys_region_end;
>
> It's indeed initialized later on and it's not used till then, but let's
> just have an init to 0 here as well.
>
Okay.
> >  	int rc = -EINVAL;
> >
> >  	rc = ne_sanity_check_user_mem_region(ne_enclave, mem_region);
> > @@ -866,9 +872,8 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
> >  		goto free_mem_region;
> >  	}
> >
> > -	phys_contig_mem_regions = kcalloc(max_nr_pages, sizeof(*phys_contig_mem_regions),
> > -					  GFP_KERNEL);
> > -	if (!phys_contig_mem_regions) {
> > +	phys_regions = kcalloc(max_nr_pages, sizeof(*phys_regions), GFP_KERNEL);
> > +	if (!phys_regions) {
> >  		rc = -ENOMEM;
> >
> >  		goto free_mem_region;
> > @@ -903,25 +908,29 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
> >
> >  		/*
> >  		 * TODO: Update once handled non-contiguous memory regions
> > -		 * received from user space or contiguous physical memory regions
> > -		 * larger than 2 MiB e.g. 8 MiB.
> > +		 * received from user space.
> >  		 */
>
> Can remove this TODO completely, similar as below.
>
Okay.
> > -		phys_contig_mem_regions[i] = ne_mem_region->pages[i];
> > +		if (nr_phys_regions &&
> > +		    prev_phys_region_end == page_to_phys(ne_mem_region->pages[i]))
> > +			phys_regions[nr_phys_regions - 1].size +=
> > +				page_size(ne_mem_region->pages[i]);
>
> Please add braces here as well.
>
> if ... {
>
>
> } else {
>
> }
>
> > +		else {
> > +			phys_regions[nr_phys_regions].paddr =
> > +				page_to_phys(ne_mem_region->pages[i]);
> > +			phys_regions[nr_phys_regions].size =
> > +				page_size(ne_mem_region->pages[i]);
> > +			nr_phys_regions++;
> > +		}
> > +
> > +		prev_phys_region_end = phys_regions[nr_phys_regions - 1].paddr +
> > +				       phys_regions[nr_phys_regions - 1].size;
> >
> >  		memory_size += page_size(ne_mem_region->pages[i]);
> >
> >  		ne_mem_region->nr_pages++;
> >  	} while (memory_size < mem_region.memory_size);
> >
> > -	/*
> > -	 * TODO: Update once handled non-contiguous memory regions received
> > -	 * from user space or contiguous physical memory regions larger than
> > -	 * 2 MiB e.g. 8 MiB.
> > -	 */
> > -	nr_phys_contig_mem_regions = ne_mem_region->nr_pages;
> > -
> > -	if ((ne_enclave->nr_mem_regions + nr_phys_contig_mem_regions) >
> > -	    ne_enclave->max_mem_regions) {
> > +	if ((ne_enclave->nr_mem_regions + nr_phys_regions) > ne_enclave->max_mem_regions) {
> >  		dev_err_ratelimited(ne_misc_dev.this_device,
> >  				    "Reached max memory regions %lld\n",
> >  				    ne_enclave->max_mem_regions);
> > @@ -931,11 +940,8 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
> >  		goto put_pages;
> >  	}
> >
> > -	for (i = 0; i < nr_phys_contig_mem_regions; i++) {
> > -		u64 phys_region_addr = page_to_phys(phys_contig_mem_regions[i]);
> > -		u64 phys_region_size = page_size(phys_contig_mem_regions[i]);
> > -
> > -		if (phys_region_size & (NE_MIN_MEM_REGION_SIZE - 1)) {
> > +	for (i = 0; i < nr_phys_regions; i++) {
> > +		if (phys_regions[i].size & (NE_MIN_MEM_REGION_SIZE - 1)) {
> >  			dev_err_ratelimited(ne_misc_dev.this_device,
> >  					    "Physical mem region size is not multiple of 2 MiB\n");
> >
> > @@ -944,7 +950,7 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
> >  			goto put_pages;
> >  		}
> >
> > -		if (!IS_ALIGNED(phys_region_addr, NE_MIN_MEM_REGION_SIZE)) {
> > +		if (!IS_ALIGNED(phys_regions[i].paddr, NE_MIN_MEM_REGION_SIZE)) {
> >  			dev_err_ratelimited(ne_misc_dev.this_device,
> >  					    "Physical mem region address is not 2 MiB aligned\n");
> >
> > @@ -959,13 +965,13 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
> >
> >  	list_add(&ne_mem_region->mem_region_list_entry, &ne_enclave->mem_regions_list);
> >
> > -	for (i = 0; i < nr_phys_contig_mem_regions; i++) {
> > +	for (i = 0; i < nr_phys_regions; i++) {
> >  		struct ne_pci_dev_cmd_reply cmd_reply = {};
> >  		struct slot_add_mem_req slot_add_mem_req = {};
> >
> >  		slot_add_mem_req.slot_uid = ne_enclave->slot_uid;
> > -		slot_add_mem_req.paddr = page_to_phys(phys_contig_mem_regions[i]);
> > -		slot_add_mem_req.size = page_size(phys_contig_mem_regions[i]);
> > +		slot_add_mem_req.paddr = phys_regions[i].paddr;
> > +		slot_add_mem_req.size = phys_regions[i].size;
> >
> >  		rc = ne_do_request(pdev, SLOT_ADD_MEM,
> >  				   &slot_add_mem_req, sizeof(slot_add_mem_req),
> > @@ -974,7 +980,7 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
> >  			dev_err_ratelimited(ne_misc_dev.this_device,
> >  					    "Error in slot add mem [rc=%d]\n", rc);
> >
> > -			kfree(phys_contig_mem_regions);
> > +			kfree(phys_regions);
> >
> >  			/*
> >  			 * Exit here without put pages as memory regions may
> > @@ -987,7 +993,7 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
> >  		ne_enclave->nr_mem_regions++;
> >  	}
> >
> > -	kfree(phys_contig_mem_regions);
> > +	kfree(phys_regions);
> >
> >  	return 0;
> >
> > @@ -995,7 +1001,7 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
> >  	for (i = 0; i < ne_mem_region->nr_pages; i++)
> >  		put_page(ne_mem_region->pages[i]);
> > free_mem_region:
> > -	kfree(phys_contig_mem_regions);
> > +	kfree(phys_regions);
> >  	kfree(ne_mem_region->pages);
> >  	kfree(ne_mem_region);
> >
> > --
> > 1.8.3.1
> >
>
> Please also add the changelog after the "---" line in the patches so
> that we can track changes between revisions.
>
Got it, thanks.
> Thanks,
> Andra
>
> [1] https://github.com/aws/aws-nitro-enclaves-cli/blob/main/src/enclave_proc/resource_manager.rs#L211
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/samples/nitro_enclaves/ne_ioctl_sample.c#n817
>
>
>