Re: [PATCH v2 2/8] cxl/mem: Find device capabilities

From: Ben Widawsky
Date: Thu Feb 11 2021 - 12:13:54 EST


On 21-02-11 10:01:52, Jonathan Cameron wrote:
> On Wed, 10 Feb 2021 11:54:29 -0800
> Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> > > > ...
> > > >
> > > > > +static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm,
> > > > > + struct mbox_cmd *mbox_cmd)
> > > > > +{
> > > > > + struct device *dev = &cxlm->pdev->dev;
> > > > > +
> > > > > + dev_dbg(dev, "Mailbox command (opcode: %#x size: %zub) timed out\n",
> > > > > + mbox_cmd->opcode, mbox_cmd->size_in);
> > > > > +
> > > > > + if (IS_ENABLED(CONFIG_CXL_MEM_INSECURE_DEBUG)) {
> > > >
> > > > Hmm. Whilst I can see the advantage of this for debug, I'm not sure we want
> > > > it upstream even under a rather evil looking CONFIG variable.
> > > >
> > > > Is there a bigger lock we can use to avoid chance of accidental enablement?
> > >
> > > Any suggestions? I'm told this functionality was extremely valuable for NVDIMM,
> > > though I haven't personally experienced it.
> >
> > Yeah, there was no problem with the identical mechanism in LIBNVDIMM
> > land. However, I notice that the useful feature for LIBNVDIMM is the
> > option to dump all payloads. This one only fires on timeouts which is
> > less useful. So I'd say fix it to dump all payloads on the argument
> > that the safety mechanism was proven with the LIBNVDIMM precedent, or
> > delete it altogether to maintain v5.12 momentum. Payload dumping can
> > be added later.
>
> I think I'd drop it for now - feels like a topic that needs more discussion.
>
> Also, dumping this data to the kernel log isn't exactly elegant - particularly
> if we dump a lot more of it. Perhaps tracepoints?
>

I'll drop it. It's also a small enough bit to add on for developers. When I post
v3, I will add that bit on top as an RFC. My personal preference FWIW is to use
debugfs to store the payload of the last executed command.

We went with this because of the mechanism's provenance (libnvdimm)

> >
> > [..]
> > > > > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> > > > > index e709ae8235e7..6267ca9ae683 100644
> > > > > --- a/include/uapi/linux/pci_regs.h
> > > > > +++ b/include/uapi/linux/pci_regs.h
> > > > > @@ -1080,6 +1080,7 @@
> > > > >
> > > > > /* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */
> > > > > #define PCI_DVSEC_HEADER1 0x4 /* Designated Vendor-Specific Header1 */
> > > > > +#define PCI_DVSEC_HEADER1_LENGTH_MASK 0xFFF00000
> > > >
> > > > Seems sensible to add the revision mask as well.
> > > > The vendor id currently read using a word read rather than dword, but perhaps
> > > > neater to add that as well for completeness?
> > > >
> > > > Having said that, given Bjorn's comment on clashes and the fact he'd rather see
> > > > this stuff defined in drivers and combined later (see review patch 1 and follow
> > > > the link) perhaps this series should not touch this header at all.
> > >
> > > I'm fine to move it back.
> >
> > Yeah, we're playing tennis now between Bjorn's and Christoph's
> > comments, but I like Bjorn's suggestion of "deduplicate post merge"
> > given the bloom of DVSEC infrastructure landing at the same time.
> I guess it may depend on timing of this. Personally I think 5.12 may be too aggressive.
>
> As long as Bjorn can take a DVSEC deduplication as an immutable branch then perhaps
> during 5.13 this tree can sit on top of that.
>
> Jonathan
>
>