Re: [PATCH v4 4/7] fs,x86/resctrl: Add architecture hooks for every mount/unmount

From: Luck, Tony

Date: Thu Apr 09 2026 - 16:35:20 EST

On Mon, Apr 06, 2026 at 02:16:46PM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 4/6/26 1:35 PM, Luck, Tony wrote:
> > On Fri, Apr 03, 2026 at 05:52:30PM -0700, Reinette Chatre wrote:
> >> On 3/30/26 2:43 PM, Tony Luck wrote:
> >>> Add hooks for every mount/unmount of the resctrl file system so that
> >>> architecture code can allocate on mount and free on unmount.
> >>
> >> Please use the changelog to describe and motivate all the other things
> >> that this patch does.
> >
> > OK. I will expand.
> >
> >>>
> >>> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> >>> ---
> >>>
> >>> Note this patch disables enumeration of AET monitor events because the
> >>> new mount/unmount hooks do not call intel_aet_get_events() (which is
> >>> not ready for the change from "just on first mount" to "called on
> >>> every mount"). That is resolved in the next patch.
> >>
> >> This could be part of the proper changelog.
> >>
> >> Could patches be re-ordered to support incremental changes?
> >
> > I'll look again because several things have changed since I ordered
> > the series this way. But some bits got overly complicated trying to
> > make AET ready to be called multiple times. If I can't solve elegantly
> > I'll move this into the proper changelog.
>
> Please mark patches as RFC when still working out details. When reviewing it
> helps to know whether something is being submitted for inclusion or not.

I thought I was close enough for minor tweaks. I was wrong. Especially
so after learning from Christoph that symbol_get() and symbol_request()
are quietly being deprecated. I'm going to use Christoph's suggestion
of a registration function in the next version.
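
For readers unfamiliar with the pattern, here is a minimal userspace sketch
of what a registration function looks like in place of a symbol lookup. It is
not the code planned for the next version. Every name in it (demo_telem_ops,
demo_telem_register(), demo_consumer_enumerate() and so on) is hypothetical,
and pthread mutexes stand in for kernel mutexes.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical callbacks that a telemetry provider hands to its consumer. */
struct demo_telem_ops {
	int (*count_regions)(void);
};

static pthread_mutex_t demo_ops_lock = PTHREAD_MUTEX_INITIALIZER;
static const struct demo_telem_ops *demo_ops;	/* NULL: no provider loaded */

/* Provider calls this when it loads. Only one provider at a time. */
int demo_telem_register(const struct demo_telem_ops *ops)
{
	int ret = 0;

	assert(ops && ops->count_regions);
	pthread_mutex_lock(&demo_ops_lock);
	if (demo_ops)
		ret = -EBUSY;
	else
		demo_ops = ops;
	pthread_mutex_unlock(&demo_ops_lock);
	return ret;
}

/* Provider calls this before it unloads. */
void demo_telem_unregister(const struct demo_telem_ops *ops)
{
	pthread_mutex_lock(&demo_ops_lock);
	if (demo_ops == ops)
		demo_ops = NULL;
	pthread_mutex_unlock(&demo_ops_lock);
}

/* Consumer side: no symbol lookup, just test whether a provider registered. */
int demo_consumer_enumerate(void)
{
	int ret;

	pthread_mutex_lock(&demo_ops_lock);
	ret = demo_ops ? demo_ops->count_regions() : -ENODEV;
	pthread_mutex_unlock(&demo_ops_lock);
	return ret;
}

/* Stand-in provider used only to exercise the sketch. */
static int demo_count_regions(void)
{
	return 2;
}

static const struct demo_telem_ops demo_provider = {
	.count_regions = demo_count_regions,
};
```

The consumer gets -ENODEV until a provider registers, and again after it
unregisters, so the "provider module not loaded" case needs no special
handling beyond the NULL check.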

I have successfully reordered these two patches to avoid breaking AET
enumeration for the next series. The minor caveat this time is that
architecture code is first updated to handle multiple mount/unmount
cycles, but file system code still uses DO_ONCE_SLEEPABLE() to only
call into it on the first mount attempt. So resctrl_arch_unmount() is
defined but not called until the next patch.

It seems nicer to have a "just architecture" patch, followed by a "just
file system" patch. But I could move the definition of resctrl_arch_unmount()
into the file system patch that first uses it.

> ...
>
> >>> @@ -2900,6 +2893,30 @@ static int rdt_get_tree(struct fs_context *fc)
> >>> return ret;
> >>> }
> >>>
> >>> +static int rdt_get_tree_wrapper(struct fs_context *fc)
> >>> +{
> >>> + int ret;
> >>> +
> >>> + mutex_lock(&resctrl_mount_lock);
> >>> +
> >>> + /*
> >>> + * resctrl file system can only be mounted once.
> >>> + */
> >>> + if (resctrl_mounted) {
> >>> + mutex_unlock(&resctrl_mount_lock);
> >>> + return -EBUSY;
> >>> + }
> >>> +
> >>
> >> This does not look right. Here too is resctrl_mounted accessed without rdtgroup_mutex
> >> held. This change implies that resctrl_mounted is now protected by resctrl_mount_lock
> >> but resctrl is not changed to respect this throughout resulting in unsafe access of
> >> resctrl_mounted.
> >>
> >> Does this new resctrl_mount_lock need to be in resctrl fs? It really seems as though the
> >> needed synchronization belongs in the architecture. Could this instead be accomplished
> >> with a private mutex within the AET code?
> >
> > If you dig in lore for the v3 of this patch, you'll see I had the mutex in the
> > AET code. But there were some complications.
> >
> > 1) Need to acquire in intel_aet_pre_mount() and release in intel_aet_mount_result()
> > which is legal, but makes code more complex when call chains need to be compared to
> > check that the mutex is being released correctly.
>
> Why was it needed to hold mutex for so long? I cannot find explanation here or in changelog
> of v3. I did not remember correctly and considered the AET code to be doing the domain
> addition. Even so, I do think a mutex internal to the arch code can be used to manage
> the synchronization. Could you please elaborate why this cannot be done?

I tried to move the locks into architecture code. But the main problem is
still handling the case where a user tries to mount an already mounted
resctrl file system and gets -EBUSY.

In that case the file system calls resctrl_arch_pre_mount() with the file
system already mounted. You suggested that the AET code could detect and
ignore a repeat enumeration by noting that the event_group "(*peg)->pfg" is
non-NULL, set by the original enumeration. But that fails in this scenario:

# rmmod pmt_telemetry
# mount -t resctrl resctrl /sys/fs/resctrl
... succeeds, but without AET present
# modprobe pmt_telemetry
# mount -t resctrl resctrl /sys/fs/resctrl
... enumeration success, but now calls resctrl_enable_mon_event()
... with the file system mounted

I think the best solution for this is to change the definition of
resctrl_arch_pre_mount() from "called on every mount attempt" to "called
only when resctrl is NOT mounted". This is because architecture code cannot
tell whether the file system is mounted.
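
A minimal userspace sketch of that contract, assuming the file system checks
its own mounted state under a lock before it calls into the architecture.
The sketch_* names are hypothetical stand-ins (not the resctrl functions),
and a pthread mutex stands in for resctrl_mount_lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t sketch_mount_lock = PTHREAD_MUTEX_INITIALIZER;
static bool sketch_mounted;		/* owned by the file system side */
static int sketch_pre_mount_calls;	/* counts architecture hook calls */

/* Stand-in for the architecture hook. It never sees a mounted fs. */
static void sketch_arch_pre_mount(void)
{
	sketch_pre_mount_calls++;
}

/* Mount attempt: reject a second mount before the hook can run. */
int sketch_mount(void)
{
	pthread_mutex_lock(&sketch_mount_lock);
	if (sketch_mounted) {
		pthread_mutex_unlock(&sketch_mount_lock);
		return -EBUSY;
	}
	sketch_arch_pre_mount();
	sketch_mounted = true;
	pthread_mutex_unlock(&sketch_mount_lock);
	return 0;
}

/* Unmount: the next mount attempt will call the hook again. */
void sketch_unmount(void)
{
	pthread_mutex_lock(&sketch_mount_lock);
	assert(sketch_mounted);
	sketch_mounted = false;
	pthread_mutex_unlock(&sketch_mount_lock);
}
```

With this ordering, the second mount in the scenario above returns -EBUSY
without reaching the hook, so late enumeration can never run against a
mounted file system.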

>
> > 2) The "only mounted once" case meant extra state (AET_PRESENT, which you note
> > in next patch may be redundant) because intel_aet_pre_mount() is called, but
> > needs to do nothing.
>
> Right, I do not see need for extra state. In fact, since it is not clear to me that
> PMT enumeration will be complete when intel_pmt_get_regions_by_feature() is called it
> seemed worthwhile to only rely on event_group::pfg - if PMT enumeration was not complete
> during mount N it may be complete on mount N+1? This creates a poor user interface
> though since user would need an alternate way to know if AET is supported and then
> a "remount until it works" approach.

The race remains, and is lost if resctrl is auto-mounted at boot from /etc/fstab.

The user can tell if AET is supported with:

$ grep ^ /sys/class/intel_pmt/*/guid

and checking if any of the RMID based event guids are present on the system.

Delta T for the race is small enough that delaying the mount to some other
startup script should be sufficient. Users are likely to have such a script
to create the CTRL_MON directories and configure schemata for their workload.
So annoying, but easily solved.

> >
> > Adding resctrl_mount_lock to the file system code made things simpler. The
>
> Adding complications to resctrl fs to make things simpler for x86?

I believe it is necessary, since architecture code cannot tell if the file
system is mounted.

> > pre-mount code can't be called with rdtgroup_mutex held because it needs to
> > build the domains. That needs cpus_read_lock() + mutex_lock(&domain_list_lock);
>
> ack. Can an arch-specific mutex be used instead?

See above.

> > I need to add more comments on locking. resctrl_mounted is only modified when both
> > resctrl_mount_lock AND rdtgroup_mutex are held. I believe that makes it safe to
> > read the value of resctrl_mounted with just rdtgroup_mutex held.
>
> ...but not to read it with only resctrl_mount_lock held as in snippet above.

Holding either of resctrl_mount_lock or rdtgroup_mutex makes it safe to
read the value of resctrl_mounted as it can only be modified when both
mutexes are held.
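
The rule can be illustrated with a small userspace sketch. Two pthread
mutexes stand in for resctrl_mount_lock and rdtgroup_mutex, and all names
(outer_lock, inner_lock, flag_write() and so on) are hypothetical. The
acquisition order shown, outer before inner, is an assumption of the sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inner_lock = PTHREAD_MUTEX_INITIALIZER;
static bool flag;

/* Writers must hold BOTH locks, always taken in the same order. */
void flag_write(bool val)
{
	pthread_mutex_lock(&outer_lock);
	pthread_mutex_lock(&inner_lock);
	flag = val;
	pthread_mutex_unlock(&inner_lock);
	pthread_mutex_unlock(&outer_lock);
}

/* Holding only the outer lock already excludes every writer. */
bool flag_read_outer(void)
{
	bool val;

	pthread_mutex_lock(&outer_lock);
	val = flag;
	pthread_mutex_unlock(&outer_lock);
	return val;
}

/* Holding only the inner lock excludes every writer as well. */
bool flag_read_inner(void)
{
	bool val;

	pthread_mutex_lock(&inner_lock);
	val = flag;
	pthread_mutex_unlock(&inner_lock);
	return val;
}

/* Single-threaded sanity check: both read paths see the same value. */
void flag_check_consistent(void)
{
	assert(flag_read_outer() == flag_read_inner());
}
```

A reader holding either lock blocks any writer from completing its second
acquisition (or from starting), so the value cannot change under the reader.
The guarantee only holds if every write path takes both locks.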

> Reinette

-Tony