Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism

From: Amey Narkhede
Date: Thu Mar 25 2021 - 12:28:23 EST


On 21/03/25 10:37AM, Leon Romanovsky wrote:
> On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > On Wed, 24 Mar 2021 17:13:56 +0200
> > Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> <...>
>
> > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > Everything else is waste of time.
> >
> > Sorry, this is nonsense. Allowing users to debug issues without a full
> > kernel rebuild is a good thing.
>
> It is far from debug, this interface doesn't give you any answers why
> the reset didn't work, it just helps you to find the one that works.
>
> Unless you believe that this information will be enough to understand
> the root cause, you will need to ask from the user to perform extra
> tests, maybe try some quirk. All of that requires from the users to
> rebuild their kernel.
>
> So no, it is not debug.
>
> >
> > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > regarding the scope of a PM "soft reset". This interface would allow a
> > > > > > system policy to do that same thing.
> > > > > >
> > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > identify broken methods, and set a policy. Sure, that policy might be
> > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > the risks.
> > > > >
> > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > >
> > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > by default.
> > > >
> > > > Of course we're proposing it to be accessible, it should also require
> > > > admin privileges to modify, sysfs has lots of such things. If it's
> > > > relegated to non-default accessibility, it won't be used for testing
> > > > and it won't be available for system policy and it's pointless.
> > >
> > > We probably have difference in view of what testing is. I expect from
> > > the users who experience issues with reset to do extra steps and one of
> > > them is to require from them to compile their kernel.
> >
> > I would define the ability to generate a CI test that can pick a
> > device, unbind it from its driver, and iterate reset methods as a
> > worthwhile improvement in testing.
>
> Who is going to run this CI? At least all kernel CIs (external and
> internal to HW vendors) that I'm familiar are building kernel themselves.
>
> Distro kernel is too bloat to be really usable for CI.
>
> >
> > > The root permissions doesn't protect from anything, SO lovers will use
> > > root without even thinking twice.
> >
> > Yes, with great power comes great responsibility. Many admins ignore
> > this. That's far beyond the scope of this series.
>
> <...>
>
> > > I'm trying to help you with your use case of providing reset policy
> > > mechanism, which can be without CONFIG_EXPERT. However if you want
> > > to continue path of having specific reset type only, please ensure
> > > that this is not taken to the "bypass kernel" direction.
> >
> > You've lost me, are you saying you'd be in favor of an interface that
> > allows an admin to specify an arbitrary list of reset methods because
> > that's somehow more in line with a policy choice than a userspace
> > workaround? This seems like unnecessary bloat because (a) it allows
> > the same bypass mechanism, and (b) a given device is only going to use
> > a single method anyway, so the functionality is unnecessary. Please
> > help me understand how this favors the policy use case. Thanks,
>
> The policy decision is global logic that is easier to grasp. At some
> point of our discussion, you presented the case where PM reset is not
> defined well and you prefer to do bus reset (something like that).
>
> I expect that QEMU sets same reset policy for all devices at the same
> time instead of trying per-device to guess which one works.
>
The current reset attribute already does internally what you described
at the end.
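
To illustrate the per-device fallback being discussed, here is a minimal
sketch in Python. It is a simplified model and not the kernel code: the
helper name try_resets, the method names, and the success/failure results
are all hypothetical, chosen only to show "try each method in a fixed
order and stop at the first one that works".

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a reset method returns True if the reset succeeded.
ResetFn = Callable[[], bool]


def try_resets(methods: Sequence[Tuple[str, ResetFn]]) -> Optional[str]:
    """Try each (name, fn) pair in order; return the first name that succeeds.

    Returns None when no method in the list works.
    """
    for name, fn in methods:
        if fn():
            return name
    return None


if __name__ == "__main__":
    # Illustrative ordering and outcomes only, not taken from real hardware.
    methods = [
        ("flr", lambda: False),
        ("pm", lambda: False),
        ("bus", lambda: True),
    ]
    print(try_resets(methods))  # prints "bus"
```

In this model, a per-device policy amounts to reordering or trimming the
list passed to try_resets for that one device.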
> And yes, you will be able to bypass kernel, but at least this interface
> will be broader than initial one that serves only SO and workarounds.
>
What do you mean by "bypassing" the kernel?
I don't see any problem with SO and workarounds if that is the only
way a user can use their device. Why do you expect every vendor to
develop a quirk? Also, I don't see the point of using a linked list to
complicate a simple thing unnecessarily.

Thanks,
Amey