Re: [RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

From: Robin Murphy
Date: Thu Dec 08 2016 - 10:48:01 EST


On 08/12/16 13:36, Auger Eric wrote:
> Hi Robin,
>
> On 08/12/2016 14:14, Robin Murphy wrote:
>> On 08/12/16 09:36, Auger Eric wrote:
>>> Hi,
>>>
>>> On 15/11/2016 14:09, Eric Auger wrote:
>>>> Following LPC discussions, we now report reserved regions through
>>>> iommu-group sysfs reserved_regions attribute file.
>>>
>>>
>>> While I am respinning this series into v4, here is a tentative summary
>>> of technical topics for which no consensus was reached at this point.
>>>
>>> 1) Shall we report the usable IOVA range instead of reserved IOVA
>>> ranges. Not discussed at/after LPC.
>>> x I currently report reserved regions. Alex expressed the need to
>>> report the full usable IOVA range instead (x86 min-max range
>>> minus MSI APIC window). I think this is meaningful for ARM
>>> too where arm-smmu might not support the full 64b range.
>>> x Any objection we report the usable IOVA regions instead?
>>
>> The issue with that is that we can't actually report "the usable
>> regions" at the moment, as that involves pulling together disjoint
>> properties of arbitrary hardware unrelated to the IOMMU. We'd be
>> reporting "the not-definitely-unusable regions, which may have some
>> unusable holes in them still". That seems like an ABI nightmare - I'd
>> still much rather say "here are some, but not necessarily all, regions
>> you definitely can't use", because saying "here are some regions which
>> you might be able to use most of, probably" is what we're already doing
>> today, via a single implicit region from 0 to ULONG_MAX ;)
>>
>> The address space limits are definitely useful to know, but I think it
>> would be better to expose them separately to avoid the ambiguity. At
>> worst, I guess it would be reasonable to express the limits via an
>> "out-of-range" reserved region type for 0 to $base and $top to
>> ULONG-MAX. To *safely* expose usable regions, we'd have to start out
>> with a very conservative assumption (e.g. only IOVAs matching physical
>> RAM), and only expand them once we're sure we can detect every possible
>> bit of problematic hardware in the system - that's just too limiting to
>> be useful. And if we expose something knowingly inaccurate, we risk
>> having another "bogoMIPS in /proc/cpuinfo" ABI burden on our hands, and
>> nobody wants that...
> Makes sense to me. "out-of-range reserved region type for 0 to $base and
> $top to ULONG-MAX" can be an alternative to fulfill the requirement.
>>
>>> 2) Shall the kernel check collision with MSI window* when userspace
>>> calls VFIO_IOMMU_MAP_DMA?
>>> Joerg/Will No; Alex yes
>>> *for IOVA regions consumed downstream to the IOMMU: everyone says NO
>>
>> If we're starting off by having the SMMU drivers expose it as a fake
>> fixed region, I don't think we need to worry about this yet. We all seem
>> to agree that as long as we communicate the fixed regions to userspace,
>> it's then userspace's job to work around them. Let's come back to this
>> one once we actually get to the point of dynamically sizing and
>> allocating 'real' MSI remapping region(s).
>>
>> Ultimately, the kernel *will* police collisions either way, because an
>> underlying iommu_map() is going to fail if overlapping IOVAs are ever
>> actually used, so it's really just a question of whether to have a more
>> user-friendly failure mode.
> That's true on ARM but not on x86 where the APIC MSI region is not
> mapped I think.

Yes, but the APIC interrupt region is fixed, i.e. it falls under "IOVA
regions consumed downstream to the IOMMU" since the DMAR units are
physically incapable of remapping those addresses. I take "MSI window"
to mean specifically the thing we have to set aside and get a magic
remapping cookie for, which is also why it warrants its own special
internal type - we definitely *don't* want VFIO trying to set up a
remapping cookie on x86. We just don't let userspace know about the
difference, yet (if ever).

Robin.

>>> 3) RMRR reporting in the iommu group sysfs? Joerg: yes; Don: no
>>> My current series does not expose them in iommu group sysfs.
>>> I understand we can expose the RMRR regions in the iomm group sysfs
>>> without necessarily supporting RMRR requiring device assignment.
>>> We can also add this support later.
>>
>> As you say, reporting them doesn't necessitate allowing device
>> assignment, and it's information which can already be easily grovelled
>> out of dmesg (for intel-iommu at least) - there doesn't seem to be any
>> need to hide them, but the x86 folks can have the final word on that.
> agreed
>
> Thanks
>
> Eric
>>
>> Robin.
>>
>>> Thanks
>>>
>>> Eric
>>>
>>>
>>>>
>>>> Reserved regions are populated through the IOMMU get_resv_region callback
>>>> (former get_dm_regions), now implemented by amd-iommu, intel-iommu and
>>>> arm-smmu.
>>>>
>>>> The intel-iommu reports the [FEE0_0000h - FEF0_000h] MSI window as an
>>>> IOMMU_RESV_NOMAP reserved region.
>>>>
>>>> arm-smmu reports the MSI window (arbitrarily located at 0x8000000 and
>>>> 1MB large) and the PCI host bridge windows.
>>>>
>>>> The series integrates a not officially posted patch from Robin:
>>>> "iommu/dma: Allow MSI-only cookies".
>>>>
>>>> This series currently does not address IRQ safety assessment.
>>>>
>>>> Best Regards
>>>>
>>>> Eric
>>>>
>>>> Git: complete series available at
>>>> https://github.com/eauger/linux/tree/v4.9-rc5-reserved-rfc-v3
>>>>
>>>> History:
>>>> RFC v2 -> v3:
>>>> - switch to an iommu-group sysfs API
>>>> - use new dummy allocator provided by Robin
>>>> - dummy allocator initialized by vfio-iommu-type1 after enumerating
>>>> the reserved regions
>>>> - at the moment ARM MSI base address/size is left unchanged compared
>>>> to v2
>>>> - we currently report reserved regions and not usable IOVA regions as
>>>> requested by Alex
>>>>
>>>> RFC v1 -> v2:
>>>> - fix intel_add_reserved_regions
>>>> - add mutex lock/unlock in vfio_iommu_type1
>>>>
>>>>
>>>> Eric Auger (10):
>>>> iommu/dma: Allow MSI-only cookies
>>>> iommu: Rename iommu_dm_regions into iommu_resv_regions
>>>> iommu: Add new reserved IOMMU attributes
>>>> iommu: iommu_alloc_resv_region
>>>> iommu: Do not map reserved regions
>>>> iommu: iommu_get_group_resv_regions
>>>> iommu: Implement reserved_regions iommu-group sysfs file
>>>> iommu/vt-d: Implement reserved region get/put callbacks
>>>> iommu/arm-smmu: Implement reserved region get/put callbacks
>>>> vfio/type1: Get MSI cookie
>>>>
>>>> drivers/iommu/amd_iommu.c | 20 +++---
>>>> drivers/iommu/arm-smmu.c | 52 +++++++++++++++
>>>> drivers/iommu/dma-iommu.c | 116 ++++++++++++++++++++++++++-------
>>>> drivers/iommu/intel-iommu.c | 50 ++++++++++----
>>>> drivers/iommu/iommu.c | 141 ++++++++++++++++++++++++++++++++++++----
>>>> drivers/vfio/vfio_iommu_type1.c | 26 ++++++++
>>>> include/linux/dma-iommu.h | 7 ++
>>>> include/linux/iommu.h | 49 ++++++++++----
>>>> 8 files changed, 391 insertions(+), 70 deletions(-)
>>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>