Re: [PATCH v7 10/14] platform/x86/intel/pmt: Register enumeration functions with resctrl

From: Reinette Chatre

Date: Tue Jun 23 2026 - 11:52:10 EST


Hi Tony,

On 6/22/26 4:00 PM, Luck, Tony wrote:
> On Mon, Jun 22, 2026 at 08:46:31AM -0700, Reinette Chatre wrote:
>> On 6/18/26 2:15 PM, Luck, Tony wrote:
>>> On Mon, Jun 08, 2026 at 04:22:27PM -0700, Reinette Chatre wrote:
>>>> On 6/1/26 12:56 PM, Tony Luck wrote:
>>>>> INTEL_PMT_TELEMETRY is a loadable module, but resctrl is built-in and cannot
>>>>> call PMT functions directly. Register the telemetry enumeration function
>>>>> pointers at pmt_telemetry module init, and unregister them at module exit.
>>>>
>>>> To ensure intel_pmt_get_regions_by_feature() has access to complete data, could
>>>> it be more accurate to register at the end of PMT's .probe() and similarly
>>>> unregister during .remove()?
>>>
>>> I agreed with this. But on further reflection I'm going to dissent.
>>>
>>> There are multiple devices (at least one per socket). So .probe() is
>>> called for each. Registering with resctrl at the end of .probe() sets
>>
>> Thanks for highlighting this.
>>
>>> up for a race with a mount of the resctrl file system:
>>>
>>> modprobe                                mount
>>> .init()
>>>  auxiliary_driver_register()
>>>   .probe() for socket 0 device          rdt_get_tree()
>>>    intel_aet_register_enumeration()      resctrl_arch_pre_mount()
>>>     mutex_lock(aet_register_lock)         intel_aet_pre_mount()
>>>     get_feature = get;                     mutex_lock(aet_register_lock)
>>>     ...                                    ... blocks ...
>>>     mutex_unlock(aet_register_lock)
>>>                                            ... runs ...
>>>   .probe() for socket 1 device          Does enumeration with socket 0 complete
>>>                                         but races with socket 1 .probe()
>>
>> Could you please elaborate on the details behind the "race with socket 1"? Wouldn't
>> moving registration to init() experience the same? That is, if registered during init() then
>> at the time of resctrl mount socket 0's probe could be complete but not socket 1's? The
>> move to .init() has an additional scenario where resctrl mounts when neither socket's
>> probe has completed.
>
> See '*' paragraph below. At end of pmt_telemetry .init() all probes have run and completed.
>
>> Are you referring to how the user needs to remount resctrl to obtain all of AET, as
>> the doc patch describes, or is the race more serious?
>>
>>> I may keep the unregister call in the .remove() because as soon as the first
>>> device goes away, resctrl can't usefully run. So it seems a good idea to
>>> handle that right away.
>>
>> The "resctrl can't usefully run" is not clear to me since resctrl mount seems to be
>> ok to let mount succeed without all devices probed (per above). So it is ok to mount
>> resctrl with partial telemetry enumeration but once all is enumerated this will not
>> be supported?
>
> No. Mount shouldn't run unless all devices have been probed.
>
>>>
>>> I will provide details on the reason for the asymmetric .init() vs. .remove()
>>> in the commit comment (and in code).
>>>
>>> Ok?
>>
>> I seem to be missing a few details to understand this solution.
>
> For AET telemetry to be useful all aggregators must be enumerated.
> Running with some subset would only provide data for some subset of
> the cores on a system.
>
> So now I'm trending back to registering in module .init() after
> all probes have run, and unregistering in .exit() before doing
> anything else.
>
> Some experimentation has shown that the asynchronous part of enumeration
> is just the intel_vsec driver kicking off auxiliary device probes.
>
> * Looking just at the pmt_telemetry .init() routine, the .probe()
> * calls for each device are run sequentially and synchronously. So
> * when .init() returns all of the AET enumeration is complete.

If I understand correctly, this is because pmt_telemetry relies on
device_driver::probe_type being initialized with the default (PROBE_DEFAULT_STRATEGY),
which can be overridden by user space wanting to "boot faster".
Should pmt_telemetry's probe_type be set to PROBE_FORCE_SYNCHRONOUS to support
this new requirement from resctrl?

>
> Today I dug into the problem that mount initiated automatically
> by systemd from an entry in /etc/fstab occurs before pmt_telemetry
> is loaded. I asked AI (Gemini) if there was a way to let systemd
> know it must wait for enumeration to complete before invoking
> mount(2). There are MANY options to do this. After cycling through
> several that either didn't work, or seemed overly complex or fragile,
> I've settled on this one. I can add to the resctrl.rst documentation.
>
> It uses a systemd service to mount resctrl triggered by a udev rule on the
> load of the pmt_telemetry module.
>
> The udev rule:
> $ cat /etc/udev/rules.d/99-rmid-telemetry.rules
> SUBSYSTEM=="module", KERNEL=="intel_pmt", ACTION=="add", RUN+="/usr/bin/systemctl start mount-resctrl.service"
>
> The systemd service descriptor:
> $ cat /etc/systemd/system/mount-resctrl.service
> [Unit]
> Description=Mount resctrl pseudo-filesystem after Intel PMT loads
> After=local-fs.target
>
> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/usr/bin/mount -t resctrl resctrl /sys/fs/resctrl
>
> [Install]
> WantedBy=multi-user.target
>
This looks very helpful. Thank you very much for investigating this.
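
One possible hardening, offered only as a sketch: since the requirement is
that pmt_telemetry's .init() has returned before mount(2) runs, the service
could check the module's initstate in sysfs before mounting. The helper names
module_is_live and wait_for_module below are hypothetical and not part of this
series. The optional sysfs root argument exists only to make the helpers easy
to exercise outside a real system. I am assuming here that
/sys/module/<name>/initstate reads "live" only once .init() has completed.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the patch series).

# module_is_live NAME [SYSFS_ROOT]
# Succeeds only if NAME's initstate file exists and reads "live".
module_is_live() {
    state_file="${2:-/sys}/module/$1/initstate"
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "live" ]
}

# wait_for_module NAME TRIES [SYSFS_ROOT]
# Checks up to TRIES times, sleeping one second between attempts.
# Returns 0 as soon as the module is live, 1 if it never becomes live.
wait_for_module() {
    tries="$2"
    while :; do
        module_is_live "$1" "$3" && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] || return 1
        sleep 1
    done
}
```

With these in a script, ExecStart could point at something like
"wait_for_module pmt_telemetry 30 && mount -t resctrl resctrl /sys/fs/resctrl"
so that the mount is attempted only after the module reports itself live.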

Reinette