Re: [PATCH v7 4/8] PCI/sysfs: Allow userspace to query and set device reset mechanism

From: Amey Narkhede
Date: Thu Jun 24 2021 - 13:28:14 EST


On 21/06/24 11:56AM, Bjorn Helgaas wrote:
> On Thu, Jun 24, 2021 at 08:42:42PM +0530, Amey Narkhede wrote:
> > On 21/06/24 07:15AM, Bjorn Helgaas wrote:
> > > On Tue, Jun 08, 2021 at 11:18:53AM +0530, Amey Narkhede wrote:
> > > > Add reset_method sysfs attribute to enable user to
> > > > query and set user preferred device reset methods and
> > > > their ordering.
> > >
> > > > + Writing the name or comma separated list of names of any of
> > > > + the device supported reset methods to this file will set the
> > > > + reset methods and their ordering to be used when resetting
> > > > + the device.
> > >
> > > > + while ((name = strsep(&options, ",")) != NULL) {
> > > > + if (sysfs_streq(name, ""))
> > > > + continue;
> > > > +
> > > > + name = strim(name);
> > > > +
> > > > + for (i = 0; i < PCI_RESET_METHODS_NUM; i++) {
> > > > + if (reset_methods[i] &&
> > > > + sysfs_streq(name, pci_reset_fn_methods[i].name)) {
> > > > + reset_methods[i] = prio--;
> > > > + break;
> > > > + }
> > > > + }
> > > > +
> > > > + if (i == PCI_RESET_METHODS_NUM) {
> > > > + kfree(options);
> > > > + return -EINVAL;
> > > > + }
> > > > + }
> > >
> > > Asking again since we didn't get this clarified before. The above
> > > tells me that "reset_methods" allows the user to control the
> > > *order* in which we try reset methods.
> > >
> > > Consider the following two uses:
> > >
> > > (1) # echo bus,flr > reset_methods
> > >
> > > (2) # echo flr,bus > reset_methods
> > >
> > > Do these have the same effect or not?
> > >
> > They have different effect.
>
> I asked about this because Shanker's idea [1] of using two bitmaps
> only keeps track of which resets are *enabled*. It does not keep
> track of the *ordering*. Since you want to control the ordering, I
> think we need more state than just the supported/enabled bitmaps.
>
> > > If "reset_methods" allows control over the order, I expect them to
> > > be different: (1) would try a bus reset and, if the bus reset
> > > failed, an FLR, while (2) would try an FLR and, if the FLR failed,
> > > a bus reset.
> >
> > Exactly you are right.
> >
> > Now the point I was presenting was with new encoding we have to
> > write list of *all of the supported reset methods* in order for
> > example, in above example flr,bus or bus,flr. We can't just write
> > 'flr' or 'bus' then switch back to writing flr,bus/bus,flr (these
> > have different effect as mentioned earlier).
>
> It sounds like you're saying this sequence can't work:
>
> # echo flr > reset_methods
# dev->reset_methods = [3, 0, 0, ..]
> # echo bus,flr > reset_methods
# to get dev->reset_methods = [6, 3, 0, ...]
we'll need to probe reset methods here.
>
> But I'm afraid you'll have to walk me through the reasons why this
> can't be made to work.
I wrote incomplete description. It can work but we'll need to probe
everytime which involves reading different capabilities(PCI_CAP_ID_AF,
PCI_PM_CTRL etc) from device. With current encoding we just have to
probe at the begining.
>
> > Basically with new encoding an user can't write subset of reset
> > methods they have to write list of *all* supported methods
> > everytime.
>
> Why does the user have to write all supported methods? Is that to
> preserve the fact that "cat reset_methods" always shows all the
> supported methods so the user knows what's available?
>
> I'm wondering why we can't do something like this (pidgin code):
>
> if (option == "default") {
> pci_init_reset_methods(dev);
> return;
> }
>
> n = 0;
> foreach method in option {
> i = lookup_reset_method(method);
> if (pci_reset_methods[i].reset_fn(dev, PROBE) == 0)
Repeatedly calling probe might have some impact as it involves reading
device registers as explained earlier.
> dev->reset_methods[n++] = i; # method i supported
> }
> dev->reset_methods[n++] = 0; # end of supported methods
>
> If we did something like the above, the user could always find the
> list of all methods supported by a device by doing this:
>
> # echo default > reset_methods
> # cat reset_methods
>
This is one solution for current problem with new encoding.
> Yes, this does call the "probe" methods several times. I don't think
> that's necessarily a problem.
I thought this would be a problem because of your earlier suggestion
of caching flr capability to avoid probing multiple times. In this case
we'll need to read different device registers multiple times. With
current encoding we don't have to do that multiple times.

Thanks,
Amey
>
> Bjorn
>
> [1] https://lore.kernel.org/r/1fb0a184-908c-5f98-ef6d-74edc602c2e0@xxxxxxxxxx