Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism

From: Leon Romanovsky
Date: Thu Mar 18 2021 - 13:36:16 EST


On Thu, Mar 18, 2021 at 10:31:43PM +0530, Amey Narkhede wrote:
> On 21/03/18 04:57PM, Leon Romanovsky wrote:
> > On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> > > On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > > > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > > > Leon Romanovsky <leon@xxxxxxxxxx> wrote:

<...>

> > > > I'm lost here: does vfio-pci use the sysfs interface or an in-kernel API?
> > > >
> > > > If it is the latter, then we don't really need sysfs; if not, we still need
> > > > some sort of DB to create a second policy, because "supported != working".
> > > > What am I missing?
> > > >
> > > > Thanks
> > > >
> > > Can you explain a bit more about why supported != working?
> >
> > It is written in the commit message of this patch.
> > https://lore.kernel.org/lkml/20210312173452.3855-1-ameynarkhede03@xxxxxxxxx/
> > "This feature aims to allow greater control of a device for use cases
> > as device assignment, where specific device or platform issues may
> > interact poorly with a given reset method, and for which device specific
> > quirks have not been developed."
> >
> > You wrote it and also repeated it a couple of times during the discussion.
> >
> > If a device can tell that a specific reset doesn't work, it won't
> > perform it in the first place.
> >
> > Thanks
> Is it possible for a device to know whether a specific reset will work
> before performing it, after the device has already indicated support for
> that reset method? Maybe there's a problem with that particular piece of
> hardware in that machine.
> How can a database be maintained if particular machines have a
> particular piece of faulty HW?

That is exactly the reason why I think the VM use case you presented
is not viable.

> If for some reason the reset doesn't work, it will just return -ENOTTY.
> This isn't any different from the existing behavior. Actually, it informs the
> user that the reset method didn't reset the device, so the user can choose a
> different reset method instead of one being used implicitly.
> If the user doesn't explicitly set a preferred reset method, then
> we go ahead with the existing implicit fall-through behavior, which tries all
> available reset methods until one of them works.
> If you have a device that doesn't support reset at all, then you have the
> option to disable it completely, unlike the existing reset attribute, where
> you cannot disable reset. So it gives greater control: you can
> disable the reset altogether when a quirk hasn't been developed yet.

I explicitly asked to hear a use case. So far, I have got an explanation from
Alex for a policy decision (which doesn't need sysfs) and from you about
overcoming HW bugs, with the expectation that the user will be a guru of PCI
reset methods.

>
> We can't expect to develop a quirk for every device in existence.

That doesn't give us an excuse not to try.

> For example, on my laptop the Elantech touchpad still doesn't work in 2021
> with a vanilla kernel; Arch Linux applies a patch that was reverted in
> the mainline kernel for some reason.

I see it as a good example of a cheap solution. The vendor won't fix your
touchpad because distros provide a workaround. The same will happen with reset.

Thanks

>
> Thanks,
> Amey