On Mon, Oct 16, 2023 at 8:33 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:Exactly. It's not just QEMU, but any (older) userspace manipulates mappings through the vhost-vdpa iotlb interface has to unmap all mappings to workaround the vdpa parent driver bug. If they don't do explicit unmap, it would cause state inconsistency between vhost-vdpa and parent driver, then old mappings can't be restored, and new mapping can be added to iotlb after vDPA reset. There's no point to preserve this broken and inconsistent behavior between vhost-vdpa and parent driver, as userspace doesn't care at all!
On Fri, Oct 13, 2023 at 3:36 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:The old behavior (without flag ack) cannot be trusted already, as:
Well, this is one question I've ever asked before. You have explained
On 10/12/2023 8:01 PM, Jason Wang wrote:
On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:Well, in theory this seems like so but it's unnecessary code change
Devices with on-chip IOMMU or vendor specific IOTLB implementationShould we do this according to whether IOTLB_PRESIST is set?
may need to restore iotlb mapping to the initial or default state
using the .reset_map op, as it's desirable for some parent devices
to solely manipulate mappings by its own, independent of virtio device
state. For instance, device reset does not cause mapping go away on
such IOTLB model in need of persistent mapping. Before vhost-vdpa
is going away, give them a chance to reset iotlb back to the initial
state in vhost_vdpa_cleanup().
Signed-off-by: Si-Wei Liu <si-wei.liu@xxxxxxxxxx>
---
drivers/vhost/vdpa.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 851535f..a3f8160 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -131,6 +131,15 @@ static struct vhost_vdpa_as *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v,
return vhost_vdpa_alloc_as(v, asid);
}
+static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ if (ops->reset_map)
+ ops->reset_map(vdpa, asid);
+}
+
static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
{
struct vhost_vdpa_as *as = asid_to_as(v, asid);
@@ -140,6 +149,13 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
hlist_del(&as->hash_link);
vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid);
+ /*
+ * Devices with vendor specific IOMMU may need to restore
+ * iotlb to the initial or default state which is not done
+ * through device reset, as the IOTLB mapping manipulation
+ * could be decoupled from the virtio device life cycle.
+ */
actually, as that is the way how vDPA parent behind platform IOMMU works
today, and userspace doesn't break as of today. :)
that one of the reason that we don't break userspace is that they may
couple IOTLB reset with vDPA reset as well. One example is the Qemu.
As explained in previous threads [1][2], when IOTLB_PERSIST is not setFor two reasons:
it doesn't necessarily mean the iotlb will definitely be destroyed
across reset (think about the platform IOMMU case), so userspace today
is already tolerating enough with either good or bad IOMMU. This code of
not checking IOTLB_PERSIST being set is intentional, there's no point to
emulate bad IOMMU behavior even for older userspace (with improper
emulation to be done it would result in even worse performance).
1) backend features need acked by userspace this is by design
2) keep the odd behaviour seems to be more safe as we can't audit
every userspace program
* Devices using platform IOMMU (in other words, not implementing
neither .set_map nor .dma_map) does not unmap memory at virtio reset.
* Devices that implement .set_map or .dma_map (vdpa_sim, mlx5) do
reset IOTLB, but in their parent ops (vdpasim_do_reset, prune_iotlb
called from mlx5_vdpa_reset). With vdpa_sim patch removing the reset,
now all backends work the same as far as I know., which was (and is)
the way devices using the platform IOMMU works.
The difference in behavior did not matter as QEMU unmaps all the
memory unregistering the memory listener at vhost_vdpa_dev_start(...,
started = false),
but the backend acknowledging this feature flagRight, I couldn't say it better than you do, thanks! The feature flag is more of an unusual means to indicating kernel bug having been fixed, rather than introduce a new feature or new kernel behavior ending up in change of userspace's expectation.
allows QEMU to make sure it is safe to skip this unmap & map in the
case of vhost stop & start cycle.
In that sense, this feature flag is actually a signal for userspace to
know that the bug has been solved.
Not offering it indicates thatSure, will do, thank you! Will post v2 adding these to the log.
userspace cannot trust the kernel will retain the maps.
Si-Wei or Dragos, please correct me if I've missed something. Feel
free to use the text in case you find more clear in doc or patch log.
Thanks!
Thanks
I think
the purpose of the IOTLB_PERSIST flag is just to give userspace 100%
certainty of persistent iotlb mapping not getting lost across vdpa reset.
Thanks,
-Siwei
[1]
https://lore.kernel.org/virtualization/9f118fc9-4f6f-dd67-a291-be78152e47fd@xxxxxxxxxx/
[2]
https://lore.kernel.org/virtualization/3364adfd-1eb7-8bce-41f9-bfe5473f1f2e@xxxxxxxxxx/
Otherwise
we may break old userspace.
Thanks
+ vhost_vdpa_reset_map(v, asid);
kfree(as);
return 0;
--
1.8.3.1