Re: [EXTERNAL] Re: [PATCH v1 28/31] x86/resctrl: Drop __init/__exit on assorted symbols

From: Dave Martin
Date: Fri Apr 12 2024 - 12:21:51 EST


On Thu, Apr 11, 2024 at 09:21:38PM +0530, Amit Singh Tomar wrote:
> Hi Dave, Reinette,
>
> > On Mon, Apr 08, 2024 at 08:32:36PM -0700, Reinette Chatre wrote:
> > > Hi James,
> > >
> > > On 3/21/2024 9:51 AM, James Morse wrote:
> > > > Because ARM's MPAM controls are probed using MMIO, resctrl can't be
> > > > initialised until enough CPUs are online to have determined the
> > > > system-wide supported num_closid. Arm64 also supports 'late onlined
> > > > secondaries', where only a subset of CPUs are online during boot.
> > > >
> > > > These two combine to mean the MPAM driver may not be able to initialise
> > > > resctrl until user-space has brought 'enough' CPUs online.
> > > >
> > > > To allow MPAM to initialise resctrl after __init text has been free'd,
> > > > remove all the __init markings from resctrl.
> > > >
> > > > The existing __exit markings cause these functions to be removed by the
> > > > linker as it has never been possible to build resctrl as a module. MPAM
> > > > has an error interrupt which causes the driver to reset and disable
> > > > itself. Remove the __exit markings to allow the MPAM driver to tear down
> > > > resctrl when an error occurs.
> > >
> > > Obviously for the reasons you state this code has never been exercised.
> > > Were you able to test this error interrupt flow yet?
> > >
> > > Reinette
> > >
> >
> > I think this will have to wait for James to respond.
> >
> > There is code to tear down resctrl in response to an MPAM error interrupt,
> > but I don't know how it has been exercised so far (if at all).
>
> We managed to test the MPAM error interrupt (on a platform that
> supports MPAM interrupts on software errors), for instance by programming
> more resource control groups (part IDs) than are available, and it appears to
> correctly remove the "resctrl" mount point (though the mount command still shows
> resctrl on /sys/fs/resctrl type resctrl (rw,relatime)
> ), but

Thanks for trying this out!

Is it possible to unmount resctrl once the system is in this state?

> # mount -t resctrl resctrl /sys/fs/resctrl
> mount: /sys/fs/resctrl: mount point does not exist.

What if you now try to mount resctrl somewhere else, e.g.:

# mount -t resctrl resctrl /mnt

I'm guessing this _should_ fail if you weren't able to unmount resctrl,
since resctrl seems to forbid multiple mount instances.
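
In case it helps when collecting this, here is a rough sketch for checking
whether resctrl is still in the mount table and still registered with the
fs core. The helper names are made up for illustration, and it assumes
nothing beyond the usual /proc/mounts and /proc/filesystems layouts:

```shell
#!/bin/sh
# Hypothetical helpers (not part of resctrl or any existing tool).
# Each takes an optional file argument so it can be pointed at a saved
# copy of the table instead of the live /proc file.

# Print the mount point of every resctrl entry in a mount table.
# /proc/mounts fields: device, mount point, fs type, options, dump, pass.
resctrl_mounts() {
    awk '$3 == "resctrl" { print $2 }' "${1:-/proc/mounts}"
}

# Succeed (exit 0) if resctrl is still listed as a registered fs type.
# /proc/filesystems lines end with the fs name, optionally preceded
# by "nodev".
resctrl_registered() {
    awk '$NF == "resctrl" { found = 1 } END { exit !found }' \
        "${1:-/proc/filesystems}"
}
```

Running resctrl_mounts and resctrl_registered before and after triggering
the error interrupt should show whether the mount entry is stale and
whether the filesystem type is still known to the kernel.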

I'm not sure what the best behaviour is here. Leaving resctrl
"half-mounted" might be a good thing: at this point the system is in a
semi-bad state, and we want to make sure resctrl can't be remounted.
Unregistering the resctrl filesystem from the fs core feels cleaner if
feasible, though.

Leaving an impossible unmount operation for init to do during reboot/
shutdown feels unfortunate.

We might have to look at what other filesystems do in this area.

The mount machinery does provide other ways of getting into broken,
impossible situations from userspace, so this doesn't feel like an
entirely new problem.

>
> Additionally, a question regarding this: is a complete system restart
> necessary to regain the mount?
>
> Thanks
> -Amit

I think James will need to comment on this, but I think that yes, it
is probably appropriate to require a reboot. I think an MPAM error
interrupt should only happen if the software did something wrong, so
it's a bit like hitting a BUG(): we don't promise that everything works
100% properly until the system is restarted. Misbehaviour should be
contained to MPAM though.

Cheers
---Dave