Re: [PATCH V7 3/5] platform/x86: Intel PMT class driver

From: Alexander Duyck
Date: Thu Oct 01 2020 - 13:44:49 EST


On Thu, Oct 1, 2020 at 9:26 AM Andy Shevchenko
<andy.shevchenko@xxxxxxxxx> wrote:
>
> On Thu, Oct 1, 2020 at 4:43 AM David E. Box <david.e.box@xxxxxxxxxxxxxxx> wrote:
> >
> > From: Alexander Duyck <alexander.h.duyck@xxxxxxxxxxxxxxx>
> >
> > Intel Platform Monitoring Technology is meant to provide a common way to
> > access telemetry and system metrics.
> >
> > Register mappings are not provided by the driver. Instead, a GUID is read
> > from a header for each endpoint. The GUID identifies the device and is to
> > be used with an XML, provided by the vendor, to discover the available set
> > of metrics and their register mapping. This allows firmware updates to
> > modify the register space without needing to update the driver every time
> > with new mappings. Firmware writes a new GUID in this case to specify the
> > new mapping. Software tools with access to the associated XML file can
> > then interpret the changes.
>
> Where one may find a database of these reserved GUIDs / XMLs?
> How do you prevent a chaos which happens with other registries?

The database will be posted on intel.com eventually. Although I don't
believe the URL is public yet.

> > The module manages access to all Intel PMT endpoints on a system,
> > independent of the device exporting them. It creates an intel_pmt class to
> > manage the devices. For each telemetry endpoint, sysfs files provide GUID
> > and size information as well as a pointer to the parent device the
> > telemetry came from. Software may discover the association between
> > endpoints and devices by iterating through the list in sysfs, or by looking
> > for the existence of the class folder under the device of interest. A
> > binary sysfs attribute of the same name allows software to then read or map
> > the telemetry space for direct access.
>
> What are the security implications by direct access?

In this case minimal as it would really be no different than the read.
The registers in the memory regions themselves are read-only with no
read side effects.

> ...
>
> > +static const struct pci_device_id pmt_telem_early_client_pci_ids[] = {
> > + { PCI_VDEVICE(INTEL, 0x9a0d) }, /* TGL */
> > + { }
> > +};
> > +bool intel_pmt_is_early_client_hw(struct device *dev)
> > +{
> > + struct pci_dev *parent = to_pci_dev(dev->parent);
> > +
> > + return !!pci_match_id(pmt_telem_early_client_pci_ids, parent);
> > +}
> > +EXPORT_SYMBOL_GPL(intel_pmt_is_early_client_hw);
>
> What is this and why is it in the class driver?

I chose to use the class driver as a central place to store code
common to all of the instances of the class. In this case we have
quirks that are specific to Tiger Lake and so I chose to store the
function to test for the device here.

> > +static ssize_t
> > +intel_pmt_read(struct file *filp, struct kobject *kobj,
> > + struct bin_attribute *attr, char *buf, loff_t off,
> > + size_t count)
> > +{
> > + struct intel_pmt_entry *entry = container_of(attr,
> > + struct intel_pmt_entry,
> > + pmt_bin_attr);
>
> > + if (off < 0)
> > + return -EINVAL;
> Is this real or theoretical?

Not sure. I am not that familiar with the interface. It was something
I copied from read_bmof which is what I based this code on based on an
earlier suggestion.

> > + if (count)
>
> Useless.

I'm assuming that is because memcpy_fromio is assumed to handle this case?

> > + memcpy_fromio(buf, entry->base + off, count);
> > +
> > + return count;
> > +}
>
> ...
>
> > + psize = (PFN_UP(entry->base_addr + entry->size) - pfn) * PAGE_SIZE;
>
> PFN_PHYS(PFN_UP(...)) ?

I'm not sure how that would work. Basically what we are doing here is
determining the size of the mapping based on the number of pages that
will be needed. So we wake the pfn of the start of the region,
subtract that from the pfn for the end of the region and multiply by
the size of a page.

> > +static struct attribute *intel_pmt_attrs[] = {
> > + &dev_attr_guid.attr,
> > + &dev_attr_size.attr,
> > + &dev_attr_offset.attr,
> > + NULL
> > +};
>
> > +
>
> Unneeded blank line.
>
> > +ATTRIBUTE_GROUPS(intel_pmt);
>
> ...
>
> > + /* if size is 0 assume no data buffer, so no file needed */
> > + if (!entry->size)
> > + return 0;
>
> Hmm... But presence of the file is also an information that might be
> useful for user, no?

I'm not sure what you mean? If the size of the region is zero it means
there is no data there. There are cases in future devices where we may
have controls for a telemetry function that is actually streaming the
data elsewhere. That will be the use case for the entry size of 0 and
in that case it doesn't make any sense to have a file as there is no
data to present in it.

> ...
>
> > + entry->base = devm_ioremap_resource(dev, &res);
>
> (1)
>
> > + if (IS_ERR(entry->base)) {
>
> > + dev_err(dev, "Failed to ioremap device region\n");
>
> Duplicates core message.

I'll drop it since it is a redundant.

> > + ret = -EIO;
>
> Why shadowing real error code?

I will just convert it instead. I think there might have been a bug
here in an earlier version where it was testing for NULL instead.

> > + goto fail_ioremap;
> > + }
>
> > + iounmap(entry->base);
>
> This is interesting. How do you avoid double unmap with (1)?

I think I get what you are trying to say. This is redundant since we
used the devm_ioremap_resource it will already be freed when the
driver is detached, correct?

> > +#include <linux/platform_device.h>
> > +#include <linux/xarray.h>
> > +
> > +/* PMT access types */
> > +#define ACCESS_BARID 2
> > +#define ACCESS_LOCAL 3
> > +
> > +/* PMT discovery base address/offset register layout */
> > +#define GET_BIR(v) ((v) & GENMASK(2, 0))
> > +#define GET_ADDRESS(v) ((v) & GENMASK(31, 3))
>
> bits.h

That is already included from a few different sources.

> > +struct intel_pmt_entry {
> > + struct bin_attribute pmt_bin_attr;
> > + struct kobject *kobj;
> > + void __iomem *disc_tabl
> > + void __iomem *base;
> > + unsigned long base_addr;
> > + size_t size;
>
> > + u32 guid;
>
> types.h

Is included through xarray.h.

> > + int devid;
> > +};
>
> > +static inline int
> > +intel_pmt_ioremap_discovery_table(struct intel_pmt_entry *entry,
> > + struct platform_device *pdev, int i)
> > +{
>
> > + entry->disc_table = devm_platform_ioremap_resource(pdev, i);
>
> io.h ?

That one I will move from the class.c file.

> > +
> > + return PTR_ERR_OR_ZERO(entry->disc_table);
>
> err.h

That is included as a part of io.h.

> > +}
>
> The rule of thumb is to include all headers that you have direct users of.
> Then you may optimize by removing those which are guaranteed to be
> included by others, like bits.h always included by bitops.h.

Yeah, from what I can tell the only one I didn't have was io.h and
part of that is because this was something I had moved to the header
file in order to commonize it since it was being used in the other
drivers.