Re: [PATCH] vfio/iommu_type1: report the IOMMU aperture info

From: Alex Williamson
Date: Fri Dec 01 2017 - 11:22:37 EST


On Fri, 1 Dec 2017 10:38:07 +0100
Pierre Morel <pmorel@xxxxxxxxxxxxxxxxxx> wrote:

> On 30/11/2017 19:30, Alex Williamson wrote:
> > On Thu, 30 Nov 2017 16:11:35 +0100
> > Pierre Morel <pmorel@xxxxxxxxxxxxxxxxxx> wrote:
> >
> >> On 30/11/2017 15:08, Alex Williamson wrote:
> >>> On Thu, 30 Nov 2017 12:34:38 +0100
> >>> Pierre Morel <pmorel@xxxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>>> When userland VFIO defines a new IOMMU for a guest it may
> >>>> want to specify to the guest the physical limits of
> >>>> the underlying host IOMMU to avoid access to forbidden
> >>>> memory ranges.
> >>>>
> >>>> Currently, the vfio_iommu_type1 driver does not report this
> >>>> information to userland.
> >>>>
> >>>> Let's extend the vfio_iommu_type1_info structure reported
> >>>> by the ioctl VFIO_IOMMU_GET_INFO command to report the
> >>>> IOMMU limits as new uint64_t entries aperture_start and
> >>>> aperture_end.
> >>>>
> >>>> Let's also extend the flags bit map to add a flag specifying
> >>>> if this extension of the info structure is reported or not.
> >>>>
> >>>> Signed-off-by: Pierre Morel <pmorel@xxxxxxxxxxxxxxxxxx>
> >>>> ---
> >>>> drivers/vfio/vfio_iommu_type1.c | 42 +++++++++++++++++++++++++++++++++++++++++
> >>>> include/uapi/linux/vfio.h | 3 +++
> >>>> 2 files changed, 45 insertions(+)
> >>>>
> >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >>>> index 8549cb1..7da5fe0 100644
> >>>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>>> @@ -1526,6 +1526,40 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> >>>> return ret;
> >>>> }
> >>>>
> >>>> +/**
> >>>> + * vfio_get_aperture - report minimal aperture of a vfio_iommu
> >>>> + * @iommu: the current vfio_iommu
> >>>> + * @start: a pointer to the aperture start
> >>>> + * @end : a pointer to the aperture end
> >>>> + *
> >>>> + * This function iterates over the domains using the given vfio_iommu
> >>>> + * and restricts the aperture to the minimal aperture common
> >>>> + * to all domains sharing this vfio_iommu.
> >>>> + */
> >>>> +static void vfio_get_aperture(struct vfio_iommu *iommu, uint64_t *start,
> >>>> + uint64_t *end)
> >>>> +{
> >>>> + struct iommu_domain_geometry geometry;
> >>>> + struct vfio_domain *domain;
> >>>> +
> >>>> + *start = 0;
> >>>> + *end = U64_MAX;
> >>>> +
> >>>> + mutex_lock(&iommu->lock);
> >>>> + /* loop on all domains using this vfio_iommu */
> >>>> + list_for_each_entry(domain, &iommu->domain_list, next) {
> >>>> + iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY,
> >>>> + &geometry);
> >>>> + if (geometry.force_aperture) {
> >>>> + if (geometry.aperture_start > *start)
> >>>> + *start = geometry.aperture_start;
> >>>> + if (geometry.aperture_end < *end)
> >>>> + *end = geometry.aperture_end;
> >>>> + }
> >>>> + }
> >>>> + mutex_unlock(&iommu->lock);
> >>>> +}
> >>>> +
> >>>> static long vfio_iommu_type1_ioctl(void *iommu_data,
> >>>> unsigned int cmd, unsigned long arg)
> >>>> {
> >>>> @@ -1560,6 +1594,14 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> >>>>
> >>>> info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
> >>>>
> >>>> + minsz = min_t(size_t, info.argsz, sizeof(info));
> >>>> + if (minsz >= offsetofend(struct vfio_iommu_type1_info,
> >>>> + aperture_end)) {
> >>>> + info.flags |= VFIO_IOMMU_INFO_APERTURE;
> >>>> + vfio_get_aperture(iommu, &info.aperture_start,
> >>>> + &info.aperture_end);
> >>>> + }
> >>>> +
> >>>> return copy_to_user((void __user *)arg, &info, minsz) ?
> >>>> -EFAULT : 0;
> >>>>
> >>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>>> index 0fb25fb..780d909 100644
> >>>> --- a/include/uapi/linux/vfio.h
> >>>> +++ b/include/uapi/linux/vfio.h
> >>>> @@ -519,6 +519,9 @@ struct vfio_iommu_type1_info {
> >>>> __u32 flags;
> >>>> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
> >>>> __u64 iova_pgsizes; /* Bitmap of supported page sizes */
> >>>> +#define VFIO_IOMMU_INFO_APERTURE (1 << 1) /* supported aperture info */
> >>>> + __u64 aperture_start; /* start of DMA aperture */
> >>>> + __u64 aperture_end; /* end of DMA aperture */
> >>>> };
> >>>>
> >>>> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> >>>
> >>> This only supports the simplest topology. Even x86 cannot claim to
> >>> have a single contiguous aperture; it is typically bisected by an MSI
> >>> window. I think we need an API that supports one or more apertures
> >>> out of the box. Also, as Eric indicates, a capability is probably the
> >>> better option for creating a flexible structure. Thanks,
> >>>
> >>> Alex
> >>>
> >>
> >>
> >> Yes, I understand that a capability is a must here; I will go that way.
> >>
> >> As for multiple apertures and MSI protection, my understanding was that
> >> this is handled using windows and reserved regions.
> >> Can you point out where I am wrong?
> >
> > See the thread from Huawei; I don't think that's a solved problem:
> >
> > https://lists.gnu.org/archive/html/qemu-arm/2017-11/msg00237.html
> >
> > If you want sysfs to be consumed separately by the user and fed into
> > new QEMU command line options for creating a VM layout, perhaps that's
> > sufficient, but I think the vfio api for the iommu should encompass
> > describing available ranges of mappable iova space without cobbling
> > together arbitrary info from sysfs. Thanks,
> >
> > Alex
> >
>
> Hi Alex,
>
> Let me summarize to check that I understood you correctly:
>
> We may have physical IOMMUs with more complex access rules that cannot
> be specified by defining only the start and end of a read/write region.
>
> Windows can be used to reserve regions for the VM, but that is not what
> we want. What we want is to know what the host can offer, which is a mix
> of aperture and windows.
>
> To report this we can use capabilities in a positive way, describing
> what the host offers rather than what it cannot provide.
>
> To achieve this we have to use two interfaces:
> - VFIO user interface with VFIO_IOMMU_GET_INFO and capabilities
> - Physical IOMMU interface with both geometry and window iommu_ops
> callbacks.
>
> If this is sufficiently close to what you had in mind, I will provide a
> new version in this direction.

I believe so. VFIO would construct a set of mappable iova
regions/windows from information the IOMMU API provides through
iommu_ops, and expose that set through the VFIO_IOMMU_GET_INFO ioctl as
a new capability that can describe multiple such regions. The ioctl
would be extended to support capabilities in the same way we've done
for other vfio ioctls. Thanks,

Alex
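
To make the "one or more apertures" point concrete, here is a rough
sketch of how a single aperture bisected by a reserved window becomes a
list of mappable iova ranges, and how a consumer might check a mapping
against that list. This is illustrative only and is not the proposed
uapi: struct iova_range, iova_exclude() and iova_is_mappable() are
hypothetical names, the bounds are assumed to be inclusive, and the code
is plain userspace C rather than kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one mappable iova range, both bounds inclusive. */
struct iova_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical helper: remove the reserved window [res_start, res_end]
 * from the aperture *ap and store the surviving pieces in out[].
 * Returns the number of ranges written (0, 1 or 2).
 */
static size_t iova_exclude(const struct iova_range *ap,
			   uint64_t res_start, uint64_t res_end,
			   struct iova_range out[2])
{
	size_t n = 0;

	assert(ap->start <= ap->end && res_start <= res_end);

	/* No overlap: the aperture survives unchanged. */
	if (res_end < ap->start || res_start > ap->end) {
		out[n++] = *ap;
		return n;
	}
	/* Piece below the reserved window, if any. */
	if (res_start > ap->start)
		out[n++] = (struct iova_range){ ap->start, res_start - 1 };
	/* Piece above the reserved window, if any. */
	if (res_end < ap->end)
		out[n++] = (struct iova_range){ res_end + 1, ap->end };
	return n;
}

/* Returns 1 if [iova, iova + size - 1] fits entirely inside one range. */
static int iova_is_mappable(const struct iova_range *r, size_t n,
			    uint64_t iova, uint64_t size)
{
	uint64_t last;
	size_t i;

	/* Reject empty requests and requests that wrap around. */
	if (size == 0 || iova + size - 1 < iova)
		return 0;
	last = iova + size - 1;
	for (i = 0; i < n; i++)
		if (iova >= r[i].start && last <= r[i].end)
			return 1;
	return 0;
}
```

For example, excluding the x86 MSI window 0xFEE00000-0xFEEFFFFF from a
32-bit aperture leaves two ranges, one below and one above the window,
which a single aperture_start/aperture_end pair cannot describe. A
mapping that straddles the window is rejected because it does not fit
inside either range.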