Re: [PATCH 2/4] vhost-vdpa: reset vendor specific mapping to initial state in .release
From: Eugenio Perez Martin
Date: Tue Oct 17 2023 - 10:00:13 EST
On Tue, Oct 17, 2023 at 4:35 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
>
> On Tue, Oct 17, 2023 at 4:30 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
> >
> >
> >
> > On 10/16/2023 4:28 AM, Eugenio Perez Martin wrote:
> > > On Mon, Oct 16, 2023 at 8:33 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > >> On Fri, Oct 13, 2023 at 3:36 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
> > >>>
> > >>>
> > >>> On 10/12/2023 8:01 PM, Jason Wang wrote:
> > >>>> On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
> > >>>>> Devices with on-chip IOMMU or vendor specific IOTLB implementation
> > >>>>> may need to restore iotlb mapping to the initial or default state
> > >>>>> using the .reset_map op, as it's desirable for some parent devices
> > >>>>> to solely manipulate mappings by its own, independent of virtio device
> > >>>>> state. For instance, device reset does not cause mapping go away on
> > >>>>> such IOTLB model in need of persistent mapping. Before vhost-vdpa
> > >>>>> is going away, give them a chance to reset iotlb back to the initial
> > >>>>> state in vhost_vdpa_cleanup().
> > >>>>>
> > >>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@xxxxxxxxxx>
> > >>>>> ---
> > >>>>> drivers/vhost/vdpa.c | 16 ++++++++++++++++
> > >>>>> 1 file changed, 16 insertions(+)
> > >>>>>
> > >>>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > >>>>> index 851535f..a3f8160 100644
> > >>>>> --- a/drivers/vhost/vdpa.c
> > >>>>> +++ b/drivers/vhost/vdpa.c
> > >>>>> @@ -131,6 +131,15 @@ static struct vhost_vdpa_as *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v,
> > >>>>> return vhost_vdpa_alloc_as(v, asid);
> > >>>>> }
> > >>>>>
> > >>>>> +static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid)
> > >>>>> +{
> > >>>>> + struct vdpa_device *vdpa = v->vdpa;
> > >>>>> + const struct vdpa_config_ops *ops = vdpa->config;
> > >>>>> +
> > >>>>> + if (ops->reset_map)
> > >>>>> + ops->reset_map(vdpa, asid);
> > >>>>> +}
> > >>>>> +
> > >>>>> static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
> > >>>>> {
> > >>>>> struct vhost_vdpa_as *as = asid_to_as(v, asid);
> > >>>>> @@ -140,6 +149,13 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
> > >>>>>
> > >>>>> hlist_del(&as->hash_link);
> > >>>>> vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid);
> > >>>>> + /*
> > >>>>> + * Devices with vendor specific IOMMU may need to restore
> > >>>>> + * iotlb to the initial or default state which is not done
> > >>>>> + * through device reset, as the IOTLB mapping manipulation
> > >>>>> + * could be decoupled from the virtio device life cycle.
> > >>>>> + */
> > >>>> Should we do this according to whether IOTLB_PRESIST is set?
> > >>> Well, in theory this seems like so but it's unnecessary code change
> > >>> actually, as that is the way how vDPA parent behind platform IOMMU works
> > >>> today, and userspace doesn't break as of today. :)
> > >> Well, this is one question I've ever asked before. You have explained
> > >> that one of the reason that we don't break userspace is that they may
> > >> couple IOTLB reset with vDPA reset as well. One example is the Qemu.
> > >>
> > >>> As explained in previous threads [1][2], when IOTLB_PERSIST is not set
> > >>> it doesn't necessarily mean the iotlb will definitely be destroyed
> > >>> across reset (think about the platform IOMMU case), so userspace today
> > >>> is already tolerating enough with either good or bad IOMMU.
>
> I'm confused, how to define tolerating here? For example, if it has
> tolerance, why bother?
>
> > >>This code of
> > >>> not checking IOTLB_PERSIST being set is intentional, there's no point to
> > >>> emulate bad IOMMU behavior even for older userspace (with improper
> > >>> emulation to be done it would result in even worse performance).
>
> I can easily imagine a case:
>
> The old Qemu that works only with a setup like mlx5_vdpa.
I think it is a fair point, but QEMU in particular already unmapped
everything before set_status(0). Other userspace apps that have
trusted vdpa_sim and/or mlx5 behavior could fail, yes.
> If we do
> this without a negotiation, IOTLB will not be clear but the Qemu will
> try to re-program the IOTLB after reset. Which will break?
>
> 1) stick the exact old behaviour with just one line of check
> 2) audit all the possible cases to avoid a one line of code
>
> 1) seems much easier than 2)
>
> > >> For two reasons:
> > >>
> > >> 1) backend features need acked by userspace this is by design
> > >> 2) keep the odd behaviour seems to be more safe as we can't audit
> > >> every userspace program
> > >>
> > > The old behavior (without flag ack) cannot be trusted already, as:
>
> Possibly but the point is to unbreak userspace no matter how weird the
> behaviour we've ever had.
>
> > > * Devices using platform IOMMU (in other words, not implementing
> > > neither .set_map nor .dma_map) does not unmap memory at virtio reset.
> > > * Devices that implement .set_map or .dma_map (vdpa_sim, mlx5) do
> > > reset IOTLB, but in their parent ops (vdpasim_do_reset, prune_iotlb
> > > called from mlx5_vdpa_reset). With vdpa_sim patch removing the reset,
> > > now all backends work the same as far as I know., which was (and is)
> > > the way devices using the platform IOMMU works.
> > >
> > > The difference in behavior did not matter as QEMU unmaps all the
> > > memory unregistering the memory listener at vhost_vdpa_dev_start(...,
> > > started = false),
> > Exactly. It's not just QEMU, but any (older) userspace manipulates
> > mappings through the vhost-vdpa iotlb interface has to unmap all
> > mappings to workaround the vdpa parent driver bug.
>
> Just to clarify, from userspace, it's the (odd) behaviour of the current uAPI.
>
> > If they don't do
> > explicit unmap, it would cause state inconsistency between vhost-vdpa
> > and parent driver, then old mappings can't be restored, and new mapping
> > can be added to iotlb after vDPA reset. There's no point to preserve
> > this broken and inconsistent behavior between vhost-vdpa and parent
> > driver, as userspace doesn't care at all!
>
> It's a userspace notice change so we can't fix it silently:
>
> https://lkml.org/lkml/2012/12/23/75
>
> Another example which is related to vhost-vDPA:
>
> https://lore.kernel.org/netdev/20230927140544.205088-1-eric.auger@xxxxxxxxxx/T/
>
So let's say it's just a matter of checking if IOTLB_PERSIST has been
acked and then call vhost_vdpa_reset_map in set_status(0) as long as
the backend uses .set_map or .dma_map. Both mlx5 and vdpa_sim will
have old behavior, but future parent drivers that use (.set_map ||
.dma_map) will also reset map with old qemu.
I think it is acceptable. Am I missing something?
Thanks!
> Thanks
>
> >
> > > but the backend acknowledging this feature flag
> > > allows QEMU to make sure it is safe to skip this unmap & map in the
> > > case of vhost stop & start cycle.
> > >
> > > In that sense, this feature flag is actually a signal for userspace to
> > > know that the bug has been solved.
> > Right, I couldn't say it better than you do, thanks! The feature flag is
> > more of an unusual means to indicating kernel bug having been fixed,
> > rather than introduce a new feature or new kernel behavior ending up in
> > change of userspace's expectation.
> >
> > > Not offering it indicates that
> > > userspace cannot trust the kernel will retain the maps.
> > >
> > > Si-Wei or Dragos, please correct me if I've missed something. Feel
> > > free to use the text in case you find more clear in doc or patch log.
> > Sure, will do, thank you! Will post v2 adding these to the log.
> >
> > Thanks,
> > -Siwei
> >
> >
> >
> > >
> > > Thanks!
> > >
> > >> Thanks
> > >>
> > >>> I think
> > >>> the purpose of the IOTLB_PERSIST flag is just to give userspace 100%
> > >>> certainty of persistent iotlb mapping not getting lost across vdpa reset.
> > >>>
> > >>> Thanks,
> > >>> -Siwei
> > >>>
> > >>> [1]
> > >>> https://lore.kernel.org/virtualization/9f118fc9-4f6f-dd67-a291-be78152e47fd@xxxxxxxxxx/
> > >>> [2]
> > >>> https://lore.kernel.org/virtualization/3364adfd-1eb7-8bce-41f9-bfe5473f1f2e@xxxxxxxxxx/
> > >>>> Otherwise
> > >>>> we may break old userspace.
> > >>>>
> > >>>> Thanks
> > >>>>
> > >>>>> + vhost_vdpa_reset_map(v, asid);
> > >>>>> kfree(as);
> > >>>>>
> > >>>>> return 0;
> > >>>>> --
> > >>>>> 1.8.3.1
> > >>>>>
> >
>