On Fri, 11 Dec 2020 10:04:43 -0500
Matthew Rosato <mjrosato@xxxxxxxxxxxxx> wrote:
On 12/11/20 10:01 AM, Matthew Rosato wrote:
On 12/11/20 9:35 AM, Cornelia Huck wrote:
Let me summarize this to make sure I understand this new region
correctly:
- some devices may have relaxed alignment/length requirements for
pcistb (and friends?)
The relaxed alignment bit is really specific to PCISTB behavior, so the
"and friends" doesn't apply there.
Ok.
- some devices may actually require writes to be done in a large chunk
instead of being broken up (is that a strict subset of the devices
above?)
Yes, this is specific to ISM devices, which are always a relaxed
alignment/length device.
The inverse is an interesting question though (relaxed alignment devices
that are not ISM, which you've posed as a possible future extension for
emulated devices). I'm not sure that any (real devices) exist where
(relaxed_alignment && !ism), but 'what if' -- I guess the right approach
would mean additional code in QEMU to handle relaxed alignment for the
vfio mmio path as well (seen as pcistb_default in my qemu patchset) and
being very specific in QEMU to only enable the region for an ism device.
Let me be more precise there... It would be additional code to handle
relaxed alignment for the default pcistb path (pcistb_default) which
would include BOTH emulated devices (should we ever surface the relaxed
alignment CLP bit and the guest kernel honor it) as well as any s390x
vfio-pci device that doesn't use this new I/O region described here.
Understood. Not sure if it is useful, but the important part is that we
could extend it if we wanted.
- some devices do not support the new MIO instructions (is that a
subset of the relaxed alignment devices? I'm not familiar with the
MIO instructions)
The non-MIO requirement is again specific to ISM, which is a subset of
the relaxed alignment devices. In this case, the requirement is not
limited to PCISTB, and that's why PCILG is also included here. The ISM
driver does not use PCISTG, and the only PCISTG instructions coming from
the guest against an ISM device would be against the config space and
those are OK to go through vfio still; so what was provided via the
region is effectively the bare-minimum requirement to allow ISM to
function properly in the guest.
The patchsets introduce a new region that (a) is used by QEMU to submit
writes in one go, and (b) makes sure to call into the non-MIO
instructions directly; it's basically killing two birds with one stone
for ISM devices. Are these two requirements (large writes and non-MIO)
always going hand-in-hand, or is ISM just an odd device?
I would say that ISM is definitely a special-case device, even just
looking at the way it's implemented in the base kernel (interacting
directly with the s390 kernel PCI layer in order to avoid use of MIO
instructions -- no other driver does this). But that said, having the
two requirements hand-in-hand I think is not bad, though -- This
approach ensures the specific instruction the guest wanted (or in this
case, needed) is actually executed on the underlying host.
The basic question I have is whether it makes sense to specialcase the
ISM device (can we even find out that we're dealing with an ISM device
here?) to force the non-MIO instructions, as it is just a device with
some special requirements, or tie non-MIO to relaxed alignment. (Could
relaxed alignment devices in theory be served by MIO instructions as
well?)
Another thing that came to my mind is whether we consider the guest to
be using a pci device and needing weird instructions to do that because
it's on s390, or whether it is issuing instructions for a device that
happens to be a pci device (sorry if that sounds a bit meta :)
That said, the ability to re-use the large write for other devices would
be nice -- but as hinted in the QEMU cover letter, this approach only
works because ISM does not support MSI-X; using this approach for
MSI-X-enabled devices breaks the MSI-X masking that vfio-pci does in
QEMU (I tried an approach that used this region approach for all 3
instructions as a test, PCISTG/PCISTB/PCILG, and threw it against mlx --
any writes against an MSI-X enabled bar will miss the msi-x notifiers
since we aren't performing memory operations against the typical
vfio-pci bar).
Ugh. I wonder why ISM is so different from anything else.
If there's an expectation that the new region will always use the
non-MIO instructions (in addition to the changed write handling), it
should be noted in the description for the region as well.
Yes, this is indeed the expectation; I can clarify that.
Thanks!