On Wed, Oct 18, 2023 at 12:36 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
Right.
Ok, so what you actually mean is that userspace can tolerate the "bug"
On 10/16/2023 7:35 PM, Jason Wang wrote:
On Tue, Oct 17, 2023 at 4:30 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
Tolerating is defined as QEMU having to proactively unmap before reset just to
I'm confused, how to define tolerating here?
On 10/16/2023 4:28 AM, Eugenio Perez Martin wrote:
On Mon, Oct 16, 2023 at 8:33 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
On Fri, Oct 13, 2023 at 3:36 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
On 10/12/2023 8:01 PM, Jason Wang wrote:
Well, this is one question I've asked before. You have explained
On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
Well, in theory this seems so, but it's an unnecessary code change
Should we do this according to whether IOTLB_PERSIST is set?
Devices with on-chip IOMMU or vendor specific IOTLB implementation
may need to restore iotlb mapping to the initial or default state
using the .reset_map op, as it's desirable for some parent devices
to manipulate mappings entirely on their own, independent of virtio device
state. For instance, device reset does not cause mappings to go away on
an IOTLB model that needs persistent mappings. Before vhost-vdpa
goes away, give them a chance to reset the iotlb back to the initial
state in vhost_vdpa_cleanup().
Signed-off-by: Si-Wei Liu <si-wei.liu@xxxxxxxxxx>
---
drivers/vhost/vdpa.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 851535f..a3f8160 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -131,6 +131,15 @@ static struct vhost_vdpa_as *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v,
 	return vhost_vdpa_alloc_as(v, asid);
 }

+static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid)
+{
+	struct vdpa_device *vdpa = v->vdpa;
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	if (ops->reset_map)
+		ops->reset_map(vdpa, asid);
+}
+
 static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
 {
 	struct vhost_vdpa_as *as = asid_to_as(v, asid);
@@ -140,6 +149,13 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)

 	hlist_del(&as->hash_link);
 	vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid);
+	/*
+	 * Devices with vendor specific IOMMU may need to restore
+	 * iotlb to the initial or default state which is not done
+	 * through device reset, as the IOTLB mapping manipulation
+	 * could be decoupled from the virtio device life cycle.
+	 */
actually, as that is how a vDPA parent behind a platform IOMMU works
today, and userspace doesn't break as of today. :)
that one of the reasons we don't break userspace is that it may
couple IOTLB reset with vDPA reset as well. One example is Qemu.
As explained in previous threads [1][2], when IOTLB_PERSIST is not set
it doesn't necessarily mean the iotlb will definitely be destroyed
across reset (think about the platform IOMMU case), so userspace today
is already tolerant enough of either a good or a bad IOMMU.
work around the driver bug (on-chip maps out of sync), unconditionally
for platform or on-chip, even though we all know it doesn't have to do so for
platform IOMMU; userspace simply has no means to distinguish. That said,
userspace is sacrificing reset-time performance on platform IOMMU setups
just to work around a buggy implementation in the other setup.
with the performance penalty.
For example, if it has tolerance, why bother?
I'm not sure I get the question. But I think that userspace compromising
because of a buggy implementation in a few drivers doesn't mean we should
uniformly enforce such behavior for all set_map/dma_map implementations.
This is not my point. I meant, we can fix it, but we need a negotiation in
order to let some "buggy" old userspace survive the changes.
Userspace is not buggy today; how do we define "buggy"? Userspace with tolerance can survive just fine whether or not this negotiation or buggy driver behavior emulation is around. If some userspace doesn't tolerate it, it can still work fine on a good on-chip IOMMU or platform IOMMU, whether or not the negotiation is around.
I get it, but my question was: even if the author had noticed that the IOTLB is cleared during reset, do they care about making the IOTLB work again? My point is that, if this old setup is supposed to "work" on mlx5_vdpa, then the developer must come up with some sort of "quirk" to recover the IOTLB and bring it back to a working state after the reset. It would be more justified to come up with a proper fix for compatibility/emulation only once we know what is expected to work and through which means it can be made to work again, rather than blindly emulating the buggy behavior based solely on a few drivers' own implementations. I'm pretty sure there are multiple ways to implement the buggy reset behavior in a driver; does that mean we have to emulate the various corrupted mapping states of each individual on-chip IOMMU? How does it help the developer or user if we replicate the same corrupted mapping state in the on-chip IOMMU after reset? Does any real-life user care only about the mapping being corrupted in the same way, rather than about the quirk sequence or workaround needed to get the iotlb maps out of the broken state?
This code of not checking IOTLB_PERSIST being set is intentional; there's no point in
emulating bad IOMMU behavior even for older userspace (with improper
emulation it would result in even worse performance).
It's not hard to think of a case where:
The old Qemu that works only with a setup like mlx5_vdpa.
Noted, seems to me there's no such case of a userspace implementation
that only works with mlx5_vdpa or its friends, but doesn't work with the
others, e.g. platform IOMMU or well-behaved on-chip IOMMU
implementations.
I can easily imagine a case:
1) the environment has mlx5_vdpa only
2) kernel doc can't have endless details, so when developing
an application, the author notices the IOTLB is cleared during reset
We don't have to, as userspace here has no bug at all. The bug exists in the driver not in userspace. Real life userspace app only cares about making things work not asserting something must be broken.
Yes for sure, but we can't audit every user space, no?
The unmap+remap trick around vdpa reset works totally
fine for platform IOMMU, except with sub-optimal performance. Other than
this trick, I cannot easily think of other means or iotlb message
sequences for userspace to recover from the bogus state and make the iotlb
work again after reset.
Please note that the old (broken) behavior can vary between different parent driver implementations. It's the driver's own specific problem; if there are N ways for a driver to implement a buggy .reset, do we have to emulate N flavors of vdpa reset behavior?
Are we talking about a hypothesis that has no real basis to exist in the real world?
Instead of trying to answer these hard questions, I would go another
way. That is, stick to the old behaviour when IOTLB_PERSIST is not
set by the backend. This is much easier.
First, the ideal fix would be to leave this reset_vendor_mappings() emulation code on the individual driver itself, which already has the broken behavior. But today there's no backend feature negotiation between vhost-vdpa and the parent driver. Do we want to send down the acked_backend_features to parent drivers?
If we do this without a negotiation, the IOTLB will not be cleared but Qemu will
try to re-program the IOTLB after reset. Which will break?
1) stick to the exact old behaviour with just one line of check
It's not just one line of check here; the old behavior emulation has to
be done as Eugenio illustrated in the other email.
For vhost-vDPA it's just
    if (IOTLB_PERSIST is acked by userspace)
        reset_map()
For parent, it's somewhat similar:
    during .reset()
    if (IOTLB_PERSIST is not acked by userspace)
        reset_vendor_mappings()
Anything I missed here?
How does this brokenness in mlx5_vdpa become ABI of any sort for future on-chip IOMMU drivers? They might not even exist yet. Even if it concerns ABI, it's limited to mlx5_vdpa and the existing drivers, right?
Unfortunately, it's a must to stick to ABI.
In addition, the
emulation has to be limited to those buggy drivers, as I don't feel this
emulation should apply uniformly to all future set_map/dma_map
implementations.
I agree it's a mess but we don't have a better choice.
Well, it's your call. I can implement it as you wish, but the unwarranted code has to be maintained forever, particularly without knowing if there's really such a use case in real life, and no one in the future might dare to remove the code without knowing what it is used for.
Or we can fail the probe if userspace doesn't ack this feature.
Failing the probe is an even worse choice, as it introduces intrusive breakage to userspace.
We don't know of a use case that shows how to make things work rather than break; that is the problem. We have no way to test whether the old-behaviour-preserving code really works as expected. If there's no such user in practice, it ends up as dead code that no one dares to remove.
2) audit all the possible cases to avoid one line of code
1) seems much easier than 2)
You see, it's more than just one line of code, and I'm uncertain if the
additional complexity is warranted or necessary, particularly since, once added,
this piece of compatibility code will linger for quite a long time.
This is a must as long as it can be noticed by userspace. Doing
something conservative makes more sense to me.
Instead of adding a hypothetical code change for no specific good reason
and no real use case,
I'd like to add the code when we find a
specific use case that may get impacted or is already being affected;
then
we will have a good understanding of how to code up the fix and emulate
properly for compatibility, while not affecting other good implementations.
It's not adding something new or new behaviours, it's just making the
IOTLB reset conditional based on vDPA reset.
It doesn't conflict with what you proposed here. Old behaviours have
their users, no?
The issue is, even if we can't find such a userspace now, it doesn't mean
we can't have one in the future. Then it might be too late or too
tricky to fix them. We had a lot of lessons in the past.
I am not sure the same situation "too late to fix" or "too tricky to fix" applies here. Usually this means there's some well-established pattern, e.g. an API, ABI or long-standing de facto behavior that can't be broken or adjusted when trying to fix something up. But here we're guarded by a flag (IOTLB_PERSIST), and without it the behavior is entirely determined by the implementation.
Thanks
Thanks,
-Siwei
For two reasons:
1) backend features need to be acked by userspace; this is by design
2) keeping the odd behaviour seems safer as we can't audit
every userspace program
Possibly, but the point is to unbreak userspace no matter how weird the
behaviour we've ever had.
The old behavior (without flag ack) cannot be trusted already, as:
* Devices using platform IOMMU (in other words, not implementing
either .set_map or .dma_map) do not unmap memory at virtio reset.
* Devices that implement .set_map or .dma_map (vdpa_sim, mlx5) do
reset IOTLB, but in their parent ops (vdpasim_do_reset, prune_iotlb
called from mlx5_vdpa_reset). With the vdpa_sim patch removing the reset,
now all backends work the same as far as I know, which was (and is)
the way devices using the platform IOMMU work.
The difference in behavior did not matter as QEMU unmaps all the
memory unregistering the memory listener at vhost_vdpa_dev_start(...,
started = false),
Just to clarify, from userspace, it's the (odd) behaviour of the current uAPI.
Exactly. It's not just QEMU, but any (older) userspace that manipulates
mappings through the vhost-vdpa iotlb interface has to unmap all
mappings to work around the vdpa parent driver bug.
If they don't do
explicit unmap, it would cause state inconsistency between vhost-vdpa
and parent driver, then old mappings can't be restored, and new mapping
can be added to iotlb after vDPA reset. There's no point in preserving
this broken and inconsistent behavior between vhost-vdpa and parent
driver, as userspace doesn't care at all!
It's a userspace-noticeable change so we can't fix it silently:
https://lkml.org/lkml/2012/12/23/75
Another example which is related to vhost-vDPA:
https://lore.kernel.org/netdev/20230927140544.205088-1-eric.auger@xxxxxxxxxx/T/
Thanks
but the backend acknowledging this feature flag
allows QEMU to make sure it is safe to skip this unmap & map in the
case of vhost stop & start cycle.
In that sense, this feature flag is actually a signal for userspace to
know that the bug has been solved.
Not offering it indicates that
userspace cannot trust the kernel will retain the maps.
Right, I couldn't say it better than you do, thanks! The feature flag is
more of an unusual means of indicating that a kernel bug has been fixed,
rather than introducing a new feature or new kernel behavior that ends up
changing userspace's expectation.
Si-Wei or Dragos, please correct me if I've missed something. Feel
free to use the text in case you find it clearer in the doc or patch log.
Sure, will do, thank you! Will post v2 adding these to the log.
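As an illustration of the userspace side of this, the sketch below shows how a VMM might decide whether it can skip the unmap & map around a vhost stop & start cycle. `toy_must_remap_across_reset()` and `TOY_F_IOTLB_PERSIST` are hypothetical names, not QEMU or uAPI symbols; the only assumption is that the offered and acked backend feature sets are available as bitmasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit; stands in for the IOTLB_PERSIST backend feature. */
#define TOY_F_IOTLB_PERSIST (1ULL << 0)

/*
 * Returns true when userspace must keep doing the unmap before reset and
 * the remap afterwards, i.e. when it cannot trust the kernel to retain
 * the maps across reset.
 */
static bool toy_must_remap_across_reset(uint64_t offered, uint64_t acked)
{
	/* userspace may only ack what the backend offered */
	assert((acked & ~offered) == 0);
	return !(acked & TOY_F_IOTLB_PERSIST);
}
```

A backend that does not offer the bit, or a userspace that does not ack it, both end up on the conservative path, which matches the reading of the flag as a signal that the maps are retained.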
Thanks,
-Siwei
Thanks!
Thanks
I think the purpose of the IOTLB_PERSIST flag is just to give userspace 100%
certainty of persistent iotlb mapping not getting lost across vdpa reset.
Thanks,
-Siwei
[1] https://lore.kernel.org/virtualization/9f118fc9-4f6f-dd67-a291-be78152e47fd@xxxxxxxxxx/
[2] https://lore.kernel.org/virtualization/3364adfd-1eb7-8bce-41f9-bfe5473f1f2e@xxxxxxxxxx/
Otherwise
we may break old userspace.
Thanks
+	vhost_vdpa_reset_map(v, asid);
 	kfree(as);

 	return 0;
--
1.8.3.1