Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism

From: Amey Narkhede
Date: Sun Mar 21 2021 - 10:59:12 EST


On 21/03/21 10:40AM, Leon Romanovsky wrote:
> On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:
> > On Sat, 20 Mar 2021 11:10:08 +0200
> > Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:
> > > >
> > > > What if we taint the kernel or pci_warn() for cases where either all
> > > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > > time a device specific method is disabled?
> > >
> > > What does it mean "none"? Does it mean nothing supported? If yes, I think that
> > > pci_warn() will be enough. At least for me, taint is usable during debug stages,
> > > probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.
> >
> > "none" as implemented in this patch, clearing the enabled function
> > reset methods.
>
> It is far from intuitive, the empty string will be easier to understand,
> because "none" means no reset at all.
>
> >
> > > > I'd almost go so far as to prevent disabling a device specific reset
> > > > altogether, but for example should a device specific reset that fixes
> > > > an aspect of FLR behavior prevent using a bus reset? I'd prefer in that
> > > > case if direct FLR were disabled via a device flag introduced with the
> > > > quirk and the remaining resets can still be selected by preference.
> > >
> > > I don't know enough to discuss the PCI details, but you raised good point.
> > > This sysfs is user visible API that is presented as is from device point
> > > of view. It can be easily run into problems if PCI/core doesn't work with
> > > user's choice.
> > >
> > > >
> > > > Theoretically all the other reset methods work and are available, it's
> > > > only a policy decision which to use, right?
> > >
> > > But this patch was presented as a way to overcome situations where
> > > supported != working and user magically knows which reset type to set.
> >
> > It's not magic, the new sysfs attributes expose which resets are
> > enabled and the order that they're used, the user can simply select the
> > next one. Being able to bypass a broken reset method is a helpful side
> > effect of getting to select a preferred reset method.
>
> Magic in a sense that user has no idea what those resets mean, the
> expectation is that he will blindly iterate till something works.
>
> >
> > > If you want to take this patch to be policy decision tool,
> > > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > > so fallback will work natively.
> >
> > I don't see that as a requirement. We have fall-through support in the
> > kernel, but for a given device we're really only ever going to make use
> > of one of those methods. If a user knows enough about a device to have
> > a preference, I think it can be singular. That also significantly
> > simplifies the interface and supporting code. Thanks,
>
> I'm struggling to get requirements from this thread. You talked about
> policy decision to overtake fallback mechanism, Amey wanted to avoid
> quirks.
Just to clarify I don't want to avoid quirks. I just want device
to be usable even if it doesn't have quirk as the quirk for that
particular device may not be developed at all for different reasons
mentioned earlier.
[...]

Thanks,
Amey