RE: [PATCH] PCI/IOV: Expose error return to dmesg

From: Liming Wu
Date: Tue Dec 13 2022 - 21:42:58 EST




> -----Original Message-----
> From: Leon Romanovsky <leon@xxxxxxxxxx>
> Sent: Tuesday, December 13, 2022 7:43 PM
> To: Liming Wu <liming.wu@xxxxxxxxxxxxxxx>
> Cc: bhelgaas@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; alex.williamson@xxxxxxxxxx; 398776277@xxxxxx
> Subject: Re: [PATCH] PCI/IOV: Expose error return to dmesg
>
> On Tue, Dec 13, 2022 at 11:33:27AM +0000, Liming Wu wrote:
> > HI,
> >
> > Thanks for review it.
> >
> > > -----Original Message-----
> > > From: Leon Romanovsky <leon@xxxxxxxxxx>
> > > Sent: Tuesday, December 13, 2022 5:02 PM
> > > To: Liming Wu <liming.wu@xxxxxxxxxxxxxxx>
> > > Cc: bhelgaas@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; linux-
> > > kernel@xxxxxxxxxxxxxxx; alex.williamson@xxxxxxxxxx; 398776277@xxxxxx
> > > Subject: Re: [PATCH] PCI/IOV: Expose error return to dmesg
> > >
> > > On Tue, Dec 13, 2022 at 04:16:07PM +0800, Liming Wu wrote:
> > > > There are many errors returned during the initialization of sriov,
> > > > such as -EIO/-ENOMEM, but they are not exposed to dmesg.
> > > > Let's expose the real errors to the user.
> > >
> > > Please provide motivation. It is pretty easy to see what went wrong
> > > even without info print in dmesg.
> > The background is that we use our smat nic in the ARM64 architecture
> > server The following code in the sriov_init() function threw an
> > exception
> >
> > if (resource_size(res) & (PAGE_SIZE - 1)) {
> >
> > The resource size obtained from smat nic is 4096(it's incorrectly set to a fixed
> value in nic).
> > But the PAGE_SIZE is 65536,
> > so sriov_init() exits, but the relevant exception information is not found in
> dmesg.
>
> It is not motivation, but summarizing HW bug found during device bringup.
> Why should we (as users) see this print in upstream kernel?
Agreed.

> >
> > > >
> > > > In addition, -ENODEV doesn't make much sense and is not returned
> > > > just like any other capabilities.
> > > >
> > > > Signed-off-by: Liming Wu <liming.wu@xxxxxxxxxxxxxxx>
> > > > ---
> > > > drivers/pci/iov.c | 9 ++++++---
> > > > drivers/pci/pci.h | 2 --
> > > > drivers/pci/probe.c | 6 +++++-
> > > > 3 files changed, 11 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index
> > > > 952217572113..519aa2b48236 100644
> > > > --- a/drivers/pci/iov.c
> > > > +++ b/drivers/pci/iov.c
> > > > @@ -767,8 +767,11 @@ static int sriov_init(struct pci_dev *dev, int pos)
> > > > pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
> > > >
> > > > pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &total);
> > > > - if (!total)
> > > > + if (!total) {
> > > > + pci_info(dev, "SR-IOV capability is enabled but has %d VFs)\n",
> > > > + total);
> > >
> > > total is always 0 in this print.
> > Spec describe PCI_SRIOV_TOTAL_VF reg (Total Virtual Functions) as below:
> > Indicates the maximum number of Virtual Functions (VFs) that can be
> > associated With the Physical Function (PF).
> > This values is HWInit in Single Root mode and must contain the same
> > values as InitialVFs In Multi-Root mode, the Multi-Root PCI Manager(MR-PCIM)
> can change this values.
> >
> > I don't think total is always 0 in this print for it has been confirmed to have SR-
> IOV capability Enabled.
>
> You added print under if(!total) condition. The "SR-IOV capability is enabled but
> has %d VFs" will always print "SR-IOV capability is enabled but has 0 VFs"
Yes, I just want to show this to users. VFs of 0 is not normal.
A message is displayed indicating that the hardware Settings may be incorrect.

> >
> > My arm64 Server dmesg as follow:
> > # dmesg -T |grep -B 1 -i total_vf
> > [Tue Dec 13 04:02:34 2022] pci 0000:07:00.0: reg 0x18: [mem
> > 0x80001c00000-0x80001c00fff 64bit pref] [Tue Dec 13 04:02:34 2022]
> > drivers/pci/iov.c: 632 [info]: read PCI_SRIOV_TOTAL_VF 255
> > --
> > [Tue Dec 13 04:02:34 2022] pci 0000:08:00.0: reg 0x18: [mem
> > 0x80001a00000-0x80001a00fff 64bit pref] [Tue Dec 13 04:02:34 2022]
> > --
> > [Tue Dec 13 04:02:34 2022] pci 0000:dd:00.0: reg 0x18: [mem
> > 0x400120000000-0x4001200fffff 64bit pref] [Tue Dec 13 04:02:34 2022]
> > drivers/pci/iov.c: 632 [info]: read PCI_SRIOV_TOTAL_VF 0
> >
> > >
> > > > return 0;
> > > > + }
> > > >
> > > > pci_read_config_dword(dev, pos + PCI_SRIOV_SUP_PGSIZE, &pgsz);
> > > > i = PAGE_SHIFT > 12 ? PAGE_SHIFT - 12 : 0; @@ -899,13 +902,13 @@
> > > > int pci_iov_init(struct pci_dev *dev)
> > > > int pos;
> > > >
> > > > if (!pci_is_pcie(dev))
> > > > - return -ENODEV;
> > > > + return;
> > >
> > > Please at least compile patches before you send them.
> > OK, thanks.
> > How about return 0, or any other suggestions.
>
> Drop the patch and leave this code as is.
OK, Thanks