Re: [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

From: Erik Skultety
Date: Tue May 14 2019 - 06:58:59 EST


On Tue, May 14, 2019 at 11:51:35AM +0200, Cornelia Huck wrote:
> On Tue, 14 May 2019 03:47:36 -0400
> Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
>
> > On Tue, May 14, 2019 at 03:43:44PM +0800, Erik Skultety wrote:
> > > On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:
> > > > On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:
>
> > > > > That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
> > > > > errors (I believe Alex has mentioned something similar in one of his responses
> > > > > in one of the threads):
> > > > > a) read error indicating that an mdev type doesn't support migration
> > > > > - I assume if one type doesn't support migration, none of the other
> > > > > types exposed on the parent device do, is that a fair assumption?
>
> Probably; but there might be cases where the migratability depends not
> on the device type, but how the partitioning has been done... or is
> that too contrived?

No, you have a point - once again I let my thoughts be carried away by the idea
of heterogeneous setups, which is a discussion for another time anyway; I was
just thinking out loud.

>
> > > > > b) write error indicating that the mdev types are incompatible for
> > > > > migration
> > > > >
> > > > > Regards,
> > > > > Erik
> > > > Thanks for this explanation.
> > > > so, can we arrive at below agreements?
> > > >
> > > > 1. "not to define the specific errno returned for a specific situation,
> > > > let the vendor driver decide, userspace simply needs to know that an errno on
> > > > read indicates the device does not support migration version comparison and
> > > > that an errno on write indicates the devices are incompatible or the target
> > > > doesn't support migration versions. "
> > > > 2. vendor driver should log detailed error reasons in kernel log.
> > >
> > > That would be my take on this, yes, but I'm open to hearing any other
> > > suggestions and ideas I couldn't think of as well.
>
> So, read to find out whether migration is supported at all, write to
> find out whether it is supported for that concrete pairing is
> reasonable for libvirt?

Yes, more specifically, in the prepare phase of migration, we'd retrieve the
string on the source (potentially reporting an error like: "Failed to query
migration support: <errno translation>"), put the string into the migration
cookie and do the check with a write on the destination. The only catch is that
if the write fails on the destination, the detailed error message lives only in
the destination's kernel log, which doesn't help libvirt users unless remote
logging is set up. For layered products this is not a problem, since those
already utilize central logging nodes.
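
To make the read-then-write flow concrete, here is a rough Python sketch of what
a consumer could do. It is only an illustration, not libvirt code: both helper
names are hypothetical, and the sysfs attribute path is passed in by the caller
because its exact location is an assumption here.

```python
def read_migration_version(attr_path):
    """Source side: read the opaque version string from the attribute.

    Any errno on read is treated the same way: migration is not supported.
    attr_path is whatever sysfs path the attribute ends up at (assumption).
    """
    try:
        with open(attr_path, "r") as f:
            return f.read().strip()
    except OSError as e:
        raise RuntimeError(
            f"Failed to query migration support: {e.strerror}"
        ) from e


def check_migration_compatible(attr_path, version):
    """Destination side: write the source's version string to the attribute.

    Any errno on write means "incompatible or unsupported"; the specific
    value is deliberately not interpreted.
    """
    try:
        # The buffered write reaches the kernel when the file is flushed on
        # leaving the "with" block, so the error is still raised inside "try".
        with open(attr_path, "w") as f:
            f.write(version)
    except OSError:
        return False
    return True
```

The string returned by the first helper is what would travel in the migration
cookie; the second helper is the check done on the destination.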

Then there are the libvirt-specific bits out of scope of this discussion:
whether we should only assume identical mdev type pairs, or whether we should
employ a best-effort approach and iterate over all the available types exposed
by the vendor and check whether any of the types would support this migration
(back to your note, Connie, partitioning would come into the picture here).
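
Just to illustrate the best-effort variant, a minimal Python sketch could look
like the following. The directory layout (one subdirectory per type, each
holding the attribute), the default attribute name and the function name are
all assumptions made for the example, not something the patch defines.

```python
import os


def find_compatible_types(types_dir, version, attr_name="version"):
    """Return the type names under types_dir that accept the version string.

    types_dir and attr_name are hypothetical: one subdirectory per mdev type,
    each with a writable attribute. Any errno on write counts as "no".
    """
    compatible = []
    for type_name in sorted(os.listdir(types_dir)):
        attr_path = os.path.join(types_dir, type_name, attr_name)
        try:
            with open(attr_path, "w") as f:
                f.write(version)
        except OSError:
            # Incompatible or unsupported; the errno itself is not examined.
            continue
        compatible.append(type_name)
    return compatible
```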


>
> > >
> > > Erik
> > got it. thanks a lot!
> >
> > hi Cornelia and Dave,
> > do you also agree on:
> > 1. "not to define the specific errno returned for a specific situation,
> > let the vendor driver decide, userspace simply needs to know that an errno on
> > read indicates the device does not support migration version comparison and
> > that an errno on write indicates the devices are incompatible or the target
> > doesn't support migration versions. "
> > 2. vendor driver should log detailed error reasons in kernel log.
>
> Two questions:
> - How reasonable is it to refer to the system log in order to find out
> what exactly went wrong?
> - If detailed error reporting is basically done to the syslog, do
> different error codes still provide useful information? Or should the
> vendor driver decide what it wants to do?

I'd leave anything beyond returning -1 on read/write from/to the sysfs to the
vendor driver, as user space has no control over it. Even if there were a
facility to interpret different return codes for us, I'm not sure (in this
migration-related case) how much userspace would be able to recover or fall
back anyway: you either can or cannot migrate smoothly.

Regards,
Erik