On Thu, Oct 19, 2023 at 10:27 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:I am afraid it's not just new parent drivers, but any well behaved driver today may well break userspace if go with this forced emulation code, if they have to implement reset_map for some reason (e.g. restored to 1:1 passthrough mapping or other default state in mapping). For new userspace and user driver we can guard against it using the IOTLB_PERSIST flag, but the above code would get a big chance to break setup with good driver and older userspace in practice.
On Thu, Oct 19, 2023 at 2:47 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:To avoid new parent drivers
I'm not saying it's not an issue, what I'm saying is, if the fix
On 10/18/2023 7:53 PM, Jason Wang wrote:
On Wed, Oct 18, 2023 at 4:49 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:Hold on, I thought earlier we all agreed upon that the existing behavior
So the point is, not about whether the existing behavior is "broken"
On 10/18/2023 12:00 AM, Jason Wang wrote:
Please see my earlier response in the other email, thanks.Unfortunately, it's a must to stick to ABI. I agree it's a mess but weAntoher idea we can just do the following in vhost_vdpa reset?
don't have a better choice. Or we can fail the probe if userspace
doesn't ack this feature.
config->reset()
if (IOTLB_PERSIST is not set) {
config->reset_map()
}
Then we don't have the burden to maintain them in the parent?
Thanks
----------------%<----------------%<----------------
First, the ideal fix would be to leave this reset_vendor_mappings()
emulation code on the individual driver itself, which already has the
broken behavior.
or not.
of vendor driver self-clearing maps during .reset violates the vhost
iotlb abstraction and also breaks the .set_map/.dma_map API. This is
100% buggy driver implementation itself that we should discourage or
eliminate as much as possible (that's part of the goal for this series),
breaks another userspace, it's a new bug in the kernel. See what Linus
said in [1]
"If a change results in user programs breaking, it's a bug in the kernel."
but here you seem to go existentialism and suggests the very oppositeSuch "emulation" is not done at the parent level. New parents just
that every .set_map/.dma_map driver implementation, regardless being the
current or the new/upcoming, should unconditionally try to emulate the
broken reset behavior for the sake of not breaking older userspace.
need to implement reset_map() or not. everything could be done inside
vhost-vDPA as pseudo code that is shown above.
SetI'm not sure I get the point of this question. I'm not saying we don't
aside the criteria and definition for how userspace can be broken, can
we step back to the original question why we think it's broken, and what
we can do to promote good driver implementation instead of discuss the
implementation details?
need to fix, what I am saying is that such a fix must be done in a
negotiable way. And it's better if parents won't get any burden. It
can just decide to implement reset_map() or not.
Reading the below response I found my majorI try my best to not ignore any important things, but I can't promise
points are not heard even if written for quite a few times.
I will not miss any. I hope the above clarifies my points.
It's notWell let me do it once again, reset_map() is not mandated:
that I don't understand the importance of not breaking old userspace, I
appreciate your questions and extra patience, however I do feel the
"broken" part is very relevant to our discussion here.
If it's broken (in the sense of vhost IOTLB API) that you agree, I think
we should at least allow good driver implementations; and when you think
about the possibility of those valid good driver cases
(.set_map/.dma_map implementations that do not clear maps in .reset),
you might be able to see why it's coded the way as it is now.
It's about whether we could stick to the old behaviour withoutImplementation issue: this implies reset_map() has to be there for every
too much cost. And I believe we could.
And just to clarify here, reset_vendor_mappings() = config->reset_map()
But today there's no backend feature negotiationThere's no need to do that with the above code, or anything I missed here?
between vhost-vdpa and the parent driver. Do we want to send down the
acked_backend_features to parent drivers?
config->reset()
if (IOTLB_PERSIST is not set) {
config->reset_map()
}
.set_map implementations, but vendor driver implementation for custom
IOMMU could well implement DMA ops by itself instead of .reset_map. This
won't work for every set_map driver (think about the vduse case).
config->reset()
if (IOTLB_PERSIST is not set) {
if (config->reset_map)
config->reset_map()
to have this behavior if they need toRight, separating out the need for old behavior emulation from .reset_map is much cleaner, and this is what individual broken driver has to maintain without penalizing other good drivers. Good to see what I said earlier is heard.
implement reset_map,
What if we add a new callback like "config->buggy_virtio_reset_map",
different from regular reset_map callback at cleanup?
Only mlx5 andLet's see what Jason will say about it and try to converge on this first, I think he seemed to imply that this is part of ABI that every driver has to make compromise for. I'd better get it ack'ed before proceeding to the rest.
vdpa_sim need to implement it, with a big warning, and new parent
drivers can trust they'll never have the old bad behavior.
}
Did you see any issue with VDUSE in this case?
But this is not the the point I was making. I think if you agree this isAs I pointed out, if it is not noticeable in the userspace, that's
purely buggy driver implementation of its own, we should try to isolate
this buggy behavior to individual driver rather than overload vhost-vdpa
or vdpa core's role to help implement the emulation of broken driver
behavior.
fine but it's not.
I don't get why .reset is special here, the abuse of .reset toWho can abuse reset here? It is totally under the control of
manipulate mapping could also happen in other IOMMU unrelated driver
entries like in .suspend, or in queue_reset.
vhost-vDPA and it's not visible to uAPI. And we can fully control the
behaviour of vhost-vDPA.
If someday userspace isI think so, have you seen the links I give you? If you want to go
found coded around similar buggy driver implementation in other driver
ops, do we want to follow and duplicate the same emulation in vdpa core
as the precedent is already set here around .reset?
through the one from Linus thread[1], you can see the one that unbreak
virtio-IOMMU[2]:
1) Someday, we spot invalidate with size 0 is a bug
2) We fix this bug by not allowing this
3) But virtio-IOMMU userspace find that size 0 actually clean all the
IOTLB so it depends on the behaviour
4) So the virtio-IOMMU userspace find it can't work after 2)
5) Then we recover the behaviour before 2) via [2]
Another example is the IOTLB_MSG_V2, V1 suffers from in-stable ABI in
32bit archs, most of the userspace survives since it never runs on
32bit archs. The fix is to introduce a V2 but we will stick to V1 by
default if V2 is not acknowledged by the userspace.
I think the above 2 examples are sufficient for us to understand the
case. If not, I can help to clarify more since I'm involved in those 2
fixes.
The buggy driver can fail in a lot of other ways indefinitely duringWithout IOTLB_PRESIST it doesn't break. With IOTLB_PERSIST and if the
reset, if there's a buggy driver that's already broken the way as how it
is and happens to survive with all userspace apps, we just don't care
and let it be.
reset_map() is done unconditionally, it can break. That's my point.
There's no way we can enumerate all those buggy behaviorsIf it is not noticeable by userspace, we can do any fix at will. But
in .reset_map itself, it's overloading that driver API too much.
it is not, we don't have another choice. Especially considering the
cost is rather low.
I would like to know why this can't work in this case:Think about the vduse case, it can work with DMA ops directly so doesn'tSecond, IOTLB_PERSIST is needed but not sufficient. Due to lack ofThen we just don't offer IOTLB_PRESIST, isn't this by design?
backend feature negotiation in parent driver, if vhost-vdpa has to
provide the old-behaviour emulation for compatibility on driver's
behalf, it needs to be done per-driver basis. There could be good
on-chip or vendor IOMMU implementation which doesn't clear the IOTLB in
.reset, and vendor specific IOMMU doesn't have to provide .reset_map,
have to implement .reset_map, unless for some specific good reason.
Because it's a conforming and valid/good driver implementation, we may
still allow it to advertise IOTLB_PERSIST to userspace.
config->reset()
if (IOTLB_PERSIST is not set) {
if (config->reset_map)
config->reset_map()
}
Which belongs toPlease read the link [1], Linus has explained it.
the 3rd bullet below:
https://lore.kernel.org/virtualization/1696928580-7520-4-git-send-email-si-wei.liu@xxxxxxxxxx/
There are 3 cases that backend may claim this feature bit on:
- parent device that has to work with platform IOMMU
- parent device with on-chip IOMMU that has the expected
.reset_map support in driver
- parent device with vendor specific IOMMU implementation
that explicitly declares the specific backend feature
This is exactly what I was afraid of that broken behavior emulation mayweThen you can force reset_map() with set_map() that is what I suggest
should allow these good driver implementations rather than
unconditionally stick to some specific problematic behavior for every
other good driver.
in another thread, no?
become a dangerous slippery slope - in principle we should encourage
good driver implementation, as they can work totally fine with older
userspace. Why do they have to bother emulating broken behavior just
because some other driver's misbehaving?
And what's the boundary forThe boundary is whether the behaviour change could be noticed but
this hack, do drivers backed by platform IOMMU even have to emulate (if
not why not, and is there substantial difference in between)?
userspace. And I've shown you it's not a burden with the pseudo codes.
If not, please explain why.
AfterThe truth is that bugs exist everywhere. We can't promise there's no
getting through all of this, do you still believe everything is just as
easy and simple as what thought to be?
bug when developing an uAPI or subsystem. For kernel code, the bug
that touches uAPI might be fixed in a way that doesn't break existing
userspace. If you look at how downstream to maintain kABI, you will be
supersized furtherly.
Btw, I thought I was expecting but still haven't got the clear answersThe code is pretty simple. But yes, the time spent on justifying it
to what was the goal to do all this, we spent a lot of time trying to
unbreak userspace,
might take some time. That's the community. People need time to
understand each other's points.
but looks to me as if we were trying every possibleHow could my suggestions break a userspace?
way to break userspace
or try to approximate to the same brokennessSiwei, I'd really suggest you read the link I gave you. You may get
mlx5_vdpa may have caused to the userspace. What we will get eventually
from these lengthy discussions?
the answer. What's more, It doesn't cost too much then we know for
sure there would not be any issue, why not choose the hard way?
On the other hand, if you think it fromPatches are more than welcomed.
vhost-vdpa user perspective, you'll clearly see there's just a couple of
ways to unbreak userspace from the internal broken map which is out of
sync with vhost-vdpa iotlb after device reset.
If this brokenness wasSorry? I start with a switch in the driver, and then I try to avoid
something universally done from the vhost-vdpa layer itself, I'd feel
it's more of a shared problem, but this is not the case I see it here.
While the long standing mlx5_vdpa/vdpa_sim issue is 100% misuse of
.reset op in a wrong way per IOMMU API definition. Why leaving this
discrepancy to the individual driver is not even an option, I'm still
not sure?
that. And it seems you don't want a burden on the driver as well.
Where did you see I say we can't do that in the driver? What I
disagree with is to use a module parameter.
Even if I fail, it doesn't mean we can't do that in the driver code.
If you read the link[1] you can see the offending commit is a change
in uvcvideo driver.
Thanks
Thanks,
-Siwei
Then we need a set of device flags (backend_featuresSee above, for reset_vendor_mappings() I meant config->reset_map() exactly.
bit again?) to indicate the specific driver needs upper layer's help on
old-behaviour emulation.
Last but not least, I'm not sure how to properly emulate
reset_vendor_mappings() from vhost-vdpa layer. If a vendor driver has no
.reset_map op implemented, or if .reset_map has a slightly different
implementation than what it used to reset the iotlb in the .reset op,
Thanks
then this either becomes effectively dead code if no one ends up using,
or the vhost-vdpa emulation is helpless and limited in scope, unable to
cover all the cases.
----------------%<----------------%<----------------