Re: [PATCH v3 6/9] fs/resctrl: Fix pseudo-locking lifetime handling

From: Reinette Chatre

Date: Thu May 28 2026 - 12:36:56 EST


Hi Ben,

On 5/28/26 3:56 AM, Ben Horgan wrote:
> On 5/22/26 20:15, Reinette Chatre wrote:

...

>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 48af75b9dc85..e7e415ee7766 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -234,6 +234,15 @@ struct rdtgroup {
>>
>> /* rdtgroup.flags */
>> #define RDT_DELETED 1
>> +/*
>> + * RDT_DELETED_PLR is set when the pseudo-locked group's infrastructure
>> + * (its associated device, debugfs files, etc.) has been deleted via
>> + * rdtgroup_pseudo_lock_remove(). This can be done while there are
>> + * references to the pseudo-locked region since the pseudo-locked region
>> + * self is freed separately via pseudo_lock_free() after there are no more
>> + * references.
>> + */
>> +#define RDT_DELETED_PLR 2
>
> I haven't had a proper look at this patch but there are a few places where
> 'flags = RDT_DELETED' is used rather than 'flags |= RDT_DELETED'. Are these all
> fine?

These "deleted" flags are not independent. The pseudo-locking infrastructure can/should
only be deleted if the resource group has already been deleted or is about to be
deleted in the same flow. This relationship is clear when looking at the RDT_DELETED_PLR
assignment in pseudo_lock_dev_release() and rdtgroup_kn_put() where RDT_DELETED_PLR
setting depends on RDT_DELETED already being set. There is one place in rmdir_all_sub()
where the order is swapped with pseudo-locking infrastructure is torn down first,
thus setting RDT_DELETED_PLR _before_ RDT_DELETED. This is why rmdir_all_sub() uses
flags |= RDT_DELETED instead of assignment.

Even so, the issue [1] reported by Sashiko is real and how to fix that one is
not obvious to me at this time. These issues uncovered so far are races that can be triggered
by user space stress usage of the pseudo-locking files that is only possible on some
very specific ten-year old hardware. Such usage is contrary to the pseudo-locking usage model
so I am currently considering pulling this fix from the series. To add to this Sashiko reported
another [2] issue in this area while reviewing the resubmission aimed to get review feedback
on the last three patches.

Reinette

[1] https://sashiko.dev/#/patchset/cover.1779476724.git.reinette.chatre%40intel.com?part=6
[2] https://sashiko.dev/#/patchset/cover.1779834897.git.reinette.chatre%40intel.com?part=2