Re: [RFC PATCH] iommu/arm-smmu: Add module parameter to set msi iova address

From: Auger Eric
Date: Thu May 28 2020 - 09:11:39 EST


Hi Shameer,

On 5/28/20 2:09 PM, Shameerali Kolothum Thodi wrote:
>
>
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.auger@xxxxxxxxxx]
>> Sent: 28 May 2020 12:48
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>;
>> Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
>> Cc: Robin Murphy <robin.murphy@xxxxxxx>; Joerg Roedel
>> <joro@xxxxxxxxxx>; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; Linux Kernel Mailing
>> List <linux-kernel@xxxxxxxxxxxxxxx>; Alex Williamson
>> <alex.williamson@xxxxxxxxxx>; Srinath Mannam
>> <srinath.mannam@xxxxxxxxxxxx>; BCM Kernel Feedback
>> <bcm-kernel-feedback-list@xxxxxxxxxxxx>; Will Deacon <will@xxxxxxxxxx>;
>> Linux ARM <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>
>> Subject: Re: [RFC PATCH] iommu/arm-smmu: Add module parameter to set msi
>> iova address
>>
>>
>>
>> On 5/28/20 11:15 AM, Shameerali Kolothum Thodi wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Auger Eric [mailto:eric.auger@xxxxxxxxxx]
>>>> Sent: 28 May 2020 09:54
>>>> To: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
>>>> Cc: Will Deacon <will@xxxxxxxxxx>; Joerg Roedel <joro@xxxxxxxxxx>;
>>>> iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; Shameerali Kolothum Thodi
>>>> <shameerali.kolothum.thodi@xxxxxxxxxx>; Linux Kernel Mailing List
>>>> <linux-kernel@xxxxxxxxxxxxxxx>; Alex Williamson
>>>> <alex.williamson@xxxxxxxxxx>; Srinath Mannam
>>>> <srinath.mannam@xxxxxxxxxxxx>; BCM Kernel Feedback
>>>> <bcm-kernel-feedback-list@xxxxxxxxxxxx>; Robin Murphy
>>>> <robin.murphy@xxxxxxx>; Linux ARM
>> <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>
>>>> Subject: Re: [RFC PATCH] iommu/arm-smmu: Add module parameter to set
>> msi
>>>> iova address
>>>>
>>>> Hi,
>>>>
>>>> On 5/28/20 10:38 AM, Jean-Philippe Brucker wrote:
>>>>> [+ Shameer]
>>>>>
>>>>> On Thu, May 28, 2020 at 09:43:46AM +0200, Auger Eric wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 5/28/20 9:23 AM, Jean-Philippe Brucker wrote:
>>>>>>> On Thu, May 28, 2020 at 10:45:14AM +0530, Srinath Mannam wrote:
>>>>>>>> On Wed, May 27, 2020 at 11:00 PM Robin Murphy
>>>> <robin.murphy@xxxxxxx> wrote:
>>>>>>>>>
>>>>>>>> Thanks Robin for your quick response.
>>>>>>>>> On 2020-05-27 17:03, Srinath Mannam wrote:
>>>>>>>>>> This patch gives the provision to change default value of MSI IOVA
>> base
>>>>>>>>>> to platform's suitable IOVA using module parameter. The present
>>>>>>>>>> hardcoded MSI IOVA base may not be the accessible IOVA ranges of
>>>> platform.
>>>>>>>>>
>>>>>>>>> That in itself doesn't seem entirely unreasonable; IIRC the current
>>>>>>>>> address is just an arbitrary choice to fit nicely into Qemu's memory
>>>>>>>>> map, and there was always the possibility that it wouldn't suit
>>>> everything.
>>>>>>>>>
>>>>>>>>>> Since commit aadad097cd46 ("iommu/dma: Reserve IOVA for PCIe
>>>> inaccessible
>>>>>>>>>> DMA address"), inaccessible IOVA address ranges parsed from
>>>> dma-ranges
>>>>>>>>>> property are reserved.
>>>>>>>
>>>>>>> I don't understand why we only reserve the PCIe windows for DMA
>>>> domains.
>>>>>>> Shouldn't VFIO also prevent userspace from mapping them?
>>>>>>
>>>>>> VFIO prevents userspace from DMA mapping iovas within reserved
>> regions:
>>>>>> 9b77e5c79840 vfio/type1: check dma map request is within a valid iova
>>>> range
>>>>>
>>>>> Right but I was asking specifically about the IOVA reservation introduced
>>>>> by commit aadad097cd46. They are not registered as reserved regions
>> within
>>>>> the IOMMU core, they are only taken into account by dma-iommu.c when
>>>>> creating a DMA domain. As VFIO uses UNMANAGED domains, it isn't
>> aware
>>>> of
>>>>> those regions and they won't be seen by vfio_iommu_resv_exclude().
>>>>>
>>>>> It looks like the PCIe regions used to be common until cd2c9fcf5c66
>>>>> ("iommu/dma: Move PCI window region reservation back into dma specific
>>>>> path.") But I couldn't find the justification for this commit.
>>>>
>>>> Yes I noticed that as well when debugging the above mentioned case
>>>> before and after cd2c9fcf5c66. I do not remember about the rationale of
>>>> removing the DMA host brige windows from the resv regions. Did it break
>>>> a legacy case?
>>>>>
>>>
>>> I think yes. And going through the ML discussions, this was done so because
>> with the
>>> " vfio/type1: Add support for valid iova list management" series you reported
>>> an issue with Seattle platform. See the full discussion here,
>>>
>>> https://lore.kernel.org/patchwork/patch/889012/
>>
>> Hey thank you for reminding me of the Seattle case :-) Now I also recall
>> that, if I am not wrong, this also caused some trouble on some x86
>> platforms as well, reported by Alex?
>
> True, Alex reported that VT-d RMRR ranges were causing issues[1] as well.
> And then you came with IOMMU_RESV_DIRECT_RELAXABLE regions
> to exclude those[2]
I thought we also had the case of RESERVED regions but anyway.
>
> Maybe we should still report PCI
>> host bridge windows in the reserved regions, if possible/feasible tag
>> them differently from other reserved regions and not reject any VFIO
>> DMA_MAP colliding with them?
>
> I guess that is possible. But current interface is to report the regions that are safe
> from a IOMMU transaction point of view and I am not sure PCI window regions
> comes under that.
yes only the sysfs interface could expose them at the moment.

Thanks

Eric
>
> Thanks,
> Shameer
>
> 1. https://lkml.org/lkml/2018/6/5/760
> 2. https://lore.kernel.org/patchwork/cover/1083072/
>
>> Thanks
>>
>> Eric
>>>
>>> Cheers,
>>> Shameer
>>>
>>>>> The thing is, if VFIO isn't aware of the reserved PCIe windows, then
>>>>> allowing VFIO or userspace to choose MSI_IOVA_BASE won't solve the
>>>> problem
>>>>> reported by Srinath, because they could well choose an IOVA within the
>>>>> PCIe window...
>>>> I agree with you
>>>>
>>>> Thanks
>>>>
>>>> Eric
>>>>>
>>>>> Thanks,
>>>>> Jean
>>>>>
>>>>>> but it does not prevent the SW MSI region chosen by the kernel from
>>>>>> colliding with other reserved regions (esp. PCIe host bridge windows).
>>>>>>
>>>>>> If they were
>>>>>>> part of the common reserved regions then we could have VFIO choose a
>>>>>>> SW_MSI region among the remaining free space.
>>>>>> As Robin said this was the initial chosen approach
>>>>>> [PATCH 10/10] vfio: allow the user to register reserved iova range for
>>>>>> MSI mapping
>>>>>> https://patchwork.kernel.org/patch/8121641/
>>>>>>
>>>>>> Some additional background about why the static SW MSI region chosen
>> by
>>>>>> the kernel was later chosen:
>>>>>> Summary of LPC guest MSI discussion in Santa Fe (was: Re: [RFC 0/8] KVM
>>>>>> PCIe/MSI passthrough on ARM/ARM64 (Alt II))
>>>>>>
>>>>
>> https://lists.linuxfoundation.org/pipermail/iommu/2016-November/019060.ht
>>>> ml
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Eric
>>>>>>
>>>>>>
>>>>>> It would just need a
>>>>>>> different way of asking the IOMMU driver if a SW_MSI is needed, for
>>>>>>> example with a domain attribute.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jean
>>>>>>>
>>>>>>>>>
>>>>>>>>> That, however, doesn't seem to fit here; iommu-dma maps MSI
>>>> doorbells
>>>>>>>>> dynamically, so they aren't affected by reserved regions any more
>> than
>>>>>>>>> regular DMA pages are. In fact, it explicitly ignores the software MSI
>>>>>>>>> region, since as the comment says, it *is* the software that manages
>>>> those.
>>>>>>>> Yes you are right, we don't see any issues with kernel drivers(PCI EP)
>>>> because
>>>>>>>> MSI IOVA allocated dynamically by honouring reserved regions same as
>>>> DMA pages.
>>>>>>>>>
>>>>>>>>> The MSI_IOVA_BASE region exists for VFIO, precisely because in that
>>>> case
>>>>>>>>> the kernel *doesn't* control the address space, but still needs some
>> way
>>>>>>>>> to steal a bit of it for MSIs that the guest doesn't necessarily know
>>>>>>>>> about, and give userspace a fighting chance of knowing what it's
>> taken.
>>>>>>>>> I think at the time we discussed the idea of adding something to the
>>>>>>>>> VFIO uapi such that userspace could move this around if it wanted or
>>>>>>>>> needed to, but decided we could live without that initially. Perhaps
>> now
>>>>>>>>> the time has come?
>>>>>>>> Yes, we see issues only with user-space drivers(DPDK) in which
>>>> MSI_IOVA_BASE
>>>>>>>> region is considered to map MSI registers. This patch helps us to fix the
>>>> issue.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Srinath.
>>>>>>>>>
>>>>>>>>> Robin.
>>>>>>>>>
>>>>>>>>>> If any platform has the limitaion to access default MSI IOVA, then it
>> can
>>>>>>>>>> be changed using "arm-smmu.msi_iova_base=0xa0000000"
>> command
>>>> line argument.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Srinath Mannam <srinath.mannam@xxxxxxxxxxxx>
>>>>>>>>>> ---
>>>>>>>>>> drivers/iommu/arm-smmu.c | 5 ++++-
>>>>>>>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/iommu/arm-smmu.c
>> b/drivers/iommu/arm-smmu.c
>>>>>>>>>> index 4f1a350..5e59c9d 100644
>>>>>>>>>> --- a/drivers/iommu/arm-smmu.c
>>>>>>>>>> +++ b/drivers/iommu/arm-smmu.c
>>>>>>>>>> @@ -72,6 +72,9 @@ static bool disable_bypass =
>>>>>>>>>> module_param(disable_bypass, bool, S_IRUGO);
>>>>>>>>>> MODULE_PARM_DESC(disable_bypass,
>>>>>>>>>> "Disable bypass streams such that incoming transactions
>> from
>>>> devices that are not attached to an iommu domain will report an abort back
>> to
>>>> the device and will not be allowed to pass through the SMMU.");
>>>>>>>>>> +static unsigned long msi_iova_base = MSI_IOVA_BASE;
>>>>>>>>>> +module_param(msi_iova_base, ulong, S_IRUGO);
>>>>>>>>>> +MODULE_PARM_DESC(msi_iova_base, "msi iova base address.");
>>>>>>>>>>
>>>>>>>>>> struct arm_smmu_s2cr {
>>>>>>>>>> struct iommu_group *group;
>>>>>>>>>> @@ -1566,7 +1569,7 @@ static void
>>>> arm_smmu_get_resv_regions(struct device *dev,
>>>>>>>>>> struct iommu_resv_region *region;
>>>>>>>>>> int prot = IOMMU_WRITE | IOMMU_NOEXEC |
>>>> IOMMU_MMIO;
>>>>>>>>>>
>>>>>>>>>> - region = iommu_alloc_resv_region(MSI_IOVA_BASE,
>>>> MSI_IOVA_LENGTH,
>>>>>>>>>> + region = iommu_alloc_resv_region(msi_iova_base,
>>>> MSI_IOVA_LENGTH,
>>>>>>>>>> prot,
>>>> IOMMU_RESV_SW_MSI);
>>>>>>>>>> if (!region)
>>>>>>>>>> return;
>>>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> linux-arm-kernel mailing list
>>>>>>> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>>>>>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> linux-arm-kernel mailing list
>>>>> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>>>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>>>>
>>>
>>> _______________________________________________
>>> linux-arm-kernel mailing list
>>> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>