Re: [PATCH V11 5/8] cxl/port: Read CDAT table
From: Dan Williams
Date: Tue Jun 21 2022 - 17:48:53 EST
Ira Weiny wrote:
> On Fri, Jun 17, 2022 at 05:43:34PM -0700, Dan Williams wrote:
> > ira.weiny@ wrote:
> > > From: Ira Weiny <ira.weiny@xxxxxxxxx>
> > >
>
> [snip]
>
> > > +
> > > +/**
> > > + * read_cdat_data - Read the CDAT data on this port
> > > + * @port: Port to read data from
> > > + *
> > > + * This call will sleep waiting for responses from the DOE mailbox.
> > > + */
> > > +void read_cdat_data(struct cxl_port *port)
> > > +{
> > > + static struct pci_doe_mb *cdat_mb;
> > > + struct device *dev = &port->dev;
> > > + struct device *uport = port->uport;
> > > + size_t cdat_length;
> > > + int ret;
> > > +
> > > + /*
> > > + * Ensure a reference on the underlying uport device which has the
> > > + * mailboxes in it
> > > + */
> > > + get_device(uport);
> >
> > I had written up a long description grumbling about "just in case"
> > acquistions of device references, but then I lost that draft.
> >
> > So I'll do the shorter response, but give you more homework in the
> > process. How / Why is that get_device() needed? What are the guarantees that
> > ensure you that the last reference has not been dropped just before that
> > call? Hint: what context is this code running?
>
> I'll check it out. I suspect there is some reference on uport already taken
> such that if port is valid uport must be valid.
Right, this routine is called from cxl_port_probe() which holds the
device_lock() and prevents the port from being unregistered before it
has gone through device_release_driver(). The port was registered by the
driver for @uport i.e. cxl_mem. That driver is guaranteed to unregister
the port before it is unregistered itself.
As long as you know that the code path is within the lifespan of
cxl_port_probe() and cxl_port_remove() you are guaranteed to not need to
manage reference counts on the associated cxl_memdev.
>
> >
> > > +
> > > + cdat_mb = find_cdat_mb(uport);
> > > + if (!cdat_mb) {
> > > + dev_dbg(dev, "No CDAT mailbox\n");
> > > + goto out;
> > > + }
> > > +
> > > + if (cxl_cdat_get_length(dev, cdat_mb, &cdat_length)) {
> > > + dev_dbg(dev, "No CDAT length\n");
> > > + goto out;
> > > + }
> > > +
> > > + port->cdat.table = devm_kzalloc(dev, cdat_length, GFP_KERNEL);
> > > + if (!port->cdat.table)
> > > + goto out;
> > > +
> > > + port->cdat.length = cdat_length;
> > > + ret = cxl_cdat_read_table(dev, cdat_mb, &port->cdat);
> > > + if (ret) {
> > > + /* Don't leave table data allocated on error */
> > > + devm_kfree(dev, port->cdat.table);
> > > + port->cdat.table = NULL;
> > > + port->cdat.length = 0;
> > > + dev_err(dev, "CDAT data read error\n");
> >
> > Rather than a chatty / ephemeral error message I think this wants some
> > indication in userspace, likely the 0-length CDAT binary attribute, so
> > that userspace can debug why the kernel is picking sub-optimal QTG ids
> > for newly provisioned CXL regions.
>
> I thought we agreed that 0-length or CDAT query failure would result in no
> sysfs entry?
Oh, I forgot about that, but some new rationale below...
>
> This message was to alert that a CDAT query was attempted but the read failed
> vs finding no mailbox with CDAT capabilities for example.
...right, but that's an error message buried in the kernel log. I was
hoping for something where tooling can query and say "oh, by the way,
the driver tried and failed to get CDAT from this device that claimed to
support CDAT, remedy that situation if you are seeing unexpected
performance / behavior".
>
> [snip]
>
> > >
> > > +static ssize_t cdat_read(struct file *filp, struct kobject *kobj,
> > > + struct bin_attribute *bin_attr, char *buf,
> > > + loff_t offset, size_t count)
> > > +{
> > > + struct device *dev = kobj_to_dev(kobj);
> > > + struct cxl_port *port = to_cxl_port(dev);
> > > +
> > > + if (!port->cdat.table)
> > > + return 0;
> > > +
> > > + return memory_read_from_buffer(buf, count, &offset,
> > > + port->cdat.table,
> > > + port->cdat.length);
> > > +}
> > > +
> > > +static BIN_ATTR_RO(cdat, 0);
> >
> > This should be BIN_ATTR_ADMIN_RO(), see:
> >
> > 3022c6a1b4b7 driver-core: Introduce DEVICE_ATTR_ADMIN_{RO,RW}
>
> Are you suggesting I add BIN_ATTR_ADMIN_* macros?
Yes.
>
> >
> > > +
> > > +static umode_t cxl_port_bin_attr_is_visible(struct kobject *kobj,
> > > + struct bin_attribute *attr, int i)
> > > +{
> > > + struct device *dev = kobj_to_dev(kobj);
> > > + struct cxl_port *port = to_cxl_port(dev);
> > > +
> > > + if ((attr == &bin_attr_cdat) && port->cdat.table)
> > > + return 0400;
> >
> > Per above change you only need to manage visibility and not permissions,
>
> But the permissions indicate visibility (In the kdoc for struct
> attribute_group).
>
>
> * ... Must
> * return 0 if a binary attribute is not visible. The returned
> * value will replace static permissions defined in
> * struct bin_attribute.
>
> And the value returned overrides the mode.
>
> fs/sysfs/group.c:
>
> create_files()
>
> 82 if (grp->is_bin_visible) {
> 83 mode = grp->is_bin_visible(kobj, *bin_attr, i);
> 84 if (!mode)
> 85 continue;
> 86 }
> 87
> 88 WARN(mode & ~(SYSFS_PREALLOC | 0664),
> 89 "Attribute %s: Invalid permissions 0%o\n",
> 90 (*bin_attr)->attr.name, mode);
> 91
> 92 mode &= SYSFS_PREALLOC | 0664;
>
>
> So I'm willing to add the macro but I'm not sure it is going to change anything
> in this case.
The change I was expecting is that with BIN_ATTR_ADMIN_RO() this
implementation changes from:
if ((attr == &bin_attr_cdat) && port->cdat.table)
return 0400;
...to:
if ((attr == &bin_attr_cdat) && port->cdat.table)
return attr->mode;
...i.e. this routine only modifies visibility, you do not also need it
to enforce the root-read-only permission change since that's already
statically defined at attribute creation time.
> I think to make those _ADMIN_ macros work with is_visible()
> create_files() needs to be changed. :-/ I'm not sure if the addition of
> DEVICE_ATTR_ADMIN_{RO,RW} intended for is_visible() to be able to override the
> mode?
The intent was that one only needs to look in one place to read the
permission, and is_visible() is (mostly*) only left to change the mode to
0.
* changes from read-only to/from writable would still need is_visble()
to manipulate permissions, but you get the idea.