Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism

From: Amey Narkhede
Date: Fri Mar 19 2021 - 11:24:56 EST


On 21/03/19 03:05PM, Leon Romanovsky wrote:
> On Thu, Mar 18, 2021 at 11:13:44PM +0530, Amey Narkhede wrote:
> > On 21/03/18 07:35PM, Leon Romanovsky wrote:
> > > On Thu, Mar 18, 2021 at 10:31:43PM +0530, Amey Narkhede wrote:
> > > > On 21/03/18 04:57PM, Leon Romanovsky wrote:
> > > > > On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> > > > > > On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > > > > > > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > > > > > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > > > > > > Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > >
> > > <...>
> > >
> > > > > > > I'm lost here, does vfio-pci use the sysfs interface or an in-kernel API?
> > > > > > >
> > > > > > > If it is the latter, then we don't really need sysfs; if not, we still need
> > > > > > > some sort of DB to create a second policy, because "supported != working".
> > > > > > > What am I missing?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > Can you explain a bit more about why supported != working?
> > > > >
> > > > > It is written in the commit message of this patch.
> > > > > https://lore.kernel.org/lkml/20210312173452.3855-1-ameynarkhede03@xxxxxxxxx/
> > > > > "This feature aims to allow greater control of a device for use cases
> > > > > as device assignment, where specific device or platform issues may
> > > > > interact poorly with a given reset method, and for which device specific
> > > > > quirks have not been developed."
> > > > >
> > > > > You wrote it and also repeated it a couple of times during the discussion.
> > > > >
> > > > > If a device can understand that a specific reset doesn't work, it won't
> > > > > perform it in the first place.
> > > > >
> > > > > Thanks
> > > > Is it possible for a device to know whether a specific reset will work
> > > > before performing it, after it has already indicated support for that
> > > > reset method? Maybe there's a problem with that particular piece of
> > > > hardware in that machine.
> > > > How can a database be maintained if particular machines have a
> > > > particular piece of faulty HW?
> > >
> > > It was exactly the reason why I think that VM usecase presented by
> > > you is not viable.
> > >
> > Well, I didn't present it as a new use case. I just gave an existing
> > use case based on the existing reset attribute. Nothing new here.
> > Nothing really changes wrt that use case.
>
> Of course it is new, please see Alex's response, he said that vfio uses
> in-kernel API and not sysfs.
>
Still, this patch doesn't change the in-kernel API either.
> > > > If for some reason the reset doesn't work, it will just give -ENOTTY.
> > > > This isn't any different from existing behavior. Actually, it informs the
> > > > user that the chosen reset method didn't reset the device, so the user can
> > > > pick a different reset method instead of one being used implicitly.
> > > > If the user doesn't explicitly set a preferred reset method, then
> > > > we go ahead with the existing implicit fall-through behavior, which tries
> > > > all available reset methods until one of them works.
> > > > If you have a device that doesn't support reset at all, then you have
> > > > the option to completely disable it, unlike the existing reset attribute
> > > > where you cannot disable reset. So it gives greater control: you can
> > > > disable the reset altogether when a quirk isn't developed yet.
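
To make the behaviour described above concrete, here is a rough userspace
sketch in C. It is only a model and not the code in this patch: struct
reset_policy, do_reset(), RESET_AUTO, RESET_DISABLED and the two stub
methods are hypothetical names, and returning -ENOTTY when reset is
disabled is an assumption on my part.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the policy; not the kernel's data structures. */

/* A reset method returns 0 on success, -ENOTTY if it did not reset. */
typedef int (*reset_fn)(void);

#define RESET_AUTO     (-1)	/* no preference: try every method in order */
#define RESET_DISABLED (-2)	/* user disabled reset for this device */

struct reset_policy {
	const reset_fn *methods;	/* available methods, default order */
	size_t nr_methods;
	int preferred;			/* index into methods[], or one of the above */
};

/* Stub methods, for illustration only. */
static int method_fails(void) { return -ENOTTY; }
static int method_works(void) { return 0; }

static int do_reset(const struct reset_policy *p)
{
	size_t i;

	/* Assumption: a disabled reset reports -ENOTTY like a failed one. */
	if (p->preferred == RESET_DISABLED)
		return -ENOTTY;

	/* Explicit choice: use only that method and report its result. */
	if (p->preferred >= 0) {
		if ((size_t)p->preferred >= p->nr_methods)
			return -EINVAL;
		return p->methods[p->preferred]();
	}

	/* No preference: fall through the methods until one works. */
	for (i = 0; i < p->nr_methods; i++) {
		if (p->methods[i]() == 0)
			return 0;
	}
	return -ENOTTY;
}
```

With methods[] = { method_fails, method_works }, RESET_AUTO ends up using
the second method, while an explicit preference for the first one returns
-ENOTTY to the caller instead of silently falling through.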
> > >
> > > I explicitly asked to hear usecase, right now, I got an explanation from
> > > Alex for policy decision (which doesn't need sysfs) and from you about
> > > overcoming HW bugs with expectation that user will be guru of PCI reset
> > > methods.
> > >
> > > >
> > > > We can't expect to develop quirk for every device in existence.
> > >
> > > It doesn't give us an excuse not to try.
> > >
> > > > For example, on my laptop the Elantech touchpad still doesn't work in 2021
> > > > with the vanilla kernel; Arch Linux applies a patch which was reverted in
> > > > the mainline kernel for some reason.
> > >
> > > I see it as a good example of a cheap solution. The vendor won't fix your
> > > touchpad because distros provide a workaround. The same will happen with reset.
> > >
> > > Thanks
> > >
> > As mentioned earlier not all vendors care about Linux and not
> > all of the population can afford to buy new HW just to run Linux.
>
> Sorry, but you are not consistent. At the beginning, we talked about new HW
> that has bugs but doesn't have quirks yet. Here we are talking about old HW
> that still doesn't have quirks.
>
> Thanks
>
Does it really matter whether the HW is old or new?
If old HW doesn't have quirks yet, how can we expect
new HW to have them? What if the new HW is made by the same vendors
who don't have any interest in Linux?

Thanks,
Amey