Re: [PATCH v5 00/11] x86,fs/resctrl: Fix long-standing issues
From: Reinette Chatre
Date: Wed Jun 10 2026 - 13:49:02 EST
Hi Everybody,
On 6/9/26 2:02 PM, Reinette Chatre wrote:
> v4: https://lore.kernel.org/lkml/cover.1780456704.git.reinette.chatre@xxxxxxxxx/
> v3: https://lore.kernel.org/lkml/cover.1779476724.git.reinette.chatre@xxxxxxxxx/
> v2: https://lore.kernel.org/lkml/20260515193944.15114-1-tony.luck@xxxxxxxxx/
> v1: https://lore.kernel.org/all/20260508182143.14592-1-tony.luck@xxxxxxxxx/
>
> While reviewing the AET series [1] Sashiko reported a deadlock during mount,
> and a use-after-free when an L3 domain is removed during CPU offline. More issues
> were uncovered as fixes were developed and reviewed. While the goal is to
> fix all issues the races surrounding pseudo-locked regions are not yet
> solved and have been removed from this series (last appearance was in V3 of
> this series).
>
> Applies against tip/master to ensure it considers pending x86/cache changes
> as well as the lockdep_is_cpus_held() stubs available in smp/core.
>
> Changes since V4:
> - Add new fix to prevent out-of-bounds read when SNC is enabled and domain
> with busy RMID goes offline.
> - Add substitute for "is domain going offline" check to workers to avoid
> reading any event counters on soon-to-be-offline domain since its
> cpu_mask is empty and reading an event counter on an SNC enabled system
> depends on knowing a CPU associated with the domain.
>
> Changes since V3:
> - Drop majority of pseudo-locking fixes, only keep the double free/double
> list add fix.
> - Add patch to help document safe RCU list traversal.
> - See individual patches for detailed changes.
>
> [1] https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40intel.com
>
> Reinette Chatre (8):
> x86,fs/resctrl: Prevent out-of-bounds access while offlining CPU when
> SNC enabled
> x86,fs/resctrl: Document safe RCU list traversal
> fs/resctrl: Fix deadlock on errors during mount
> fs/resctrl: Prevent use-after-free in rdtgroup_kn_put()
> fs/resctrl: Fix double-add of pseudo-locked region's RMID to free list
> fs/resctrl: Prevent deadlock and use-after-free in info file handlers
> x86/resctrl: Ensure domain fully initialized before placed on RCU list
> fs/resctrl: Fix UAF from worker threads when domains are removed
>
> Tony Luck (3):
> fs/resctrl: Move functions to avoid forward references in subsequent
> fixes
> fs/resctrl: Free mon_data structures on rdt_get_tree() failure
> fs/resctrl: Fix use-after-free during unmount
Addressing Sashiko [2] review feedback here:
[PATCH v5 05/11] fs/resctrl: Fix use-after-free during unmount
Sashiko encountered a "Tool error" while reviewing this patch. It was able to
complete its review of the identical patch submitted as part of V4 [3].
Even though Sashiko encountered the "Tool error", it did report a new issue. To help
make progress with the existing fixes, I would like to propose that no new fixes be
added to this series. I will start a new series of fixes to address any new
reports.
[PATCH v5 07/11] fs/resctrl: Prevent use-after-free in rdtgroup_kn_put()
[PATCH v5 08/11] fs/resctrl: Fix double-add of pseudo-locked region's RMID to free list
The existing pseudo-locking related issues are known. Their fixes were dropped from this
series (last appearance was in V3) while a way to fix them is being determined.
[PATCH v5 11/11] fs/resctrl: Fix UAF from worker threads when domains are removed
Sashiko again reported (as it did for V3) that this would trigger a lockdep splat
on MPAM. As described in the response to the previous report [4], this is a false positive
since MPAM does not support the software controller.
Sashiko also asks: "Does this exact ID matching and premature break miss multiple
L3 monitor domains on Sub-NUMA Clustering (SNC) systems?"
This is also a false positive since the software controller cannot be used on SNC systems.
For confirmation, see:
commit ac20aa423052 ("x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster (SNC) systems")
After considering the latest Sashiko feedback and deciding to stop adding new fixes
to this series, I think this series can now be considered "settled".
If there are no concerns with the series itself, I would like it to be
considered for inclusion.
Thank you very much.
Reinette
[2] https://sashiko.dev/#/patchset/cover.1781029125.git.reinette.chatre%40intel.com
[3] https://sashiko.dev/#/patchset/cover.1780456704.git.reinette.chatre%40intel.com?part=4
[4] https://lore.kernel.org/lkml/9ea1986a-a88f-4224-b530-252e0f5cbfd0@xxxxxxxxx/