Re: [PATCH V2] x86/intel_rdt: Ensure usage of CPUs are locked while needed

From: Reinette Chatre
Date: Tue Dec 11 2018 - 13:34:04 EST

Hi Boris,

On 12/11/2018 4:34 AM, Borislav Petkov wrote:
> On Mon, Dec 10, 2018 at 01:21:54PM -0800, Reinette Chatre wrote:
>> The user triggers the creation of a pseudo-locked region when writing
>> the requested schemata to the schemata resctrl file. The pseudo-locking
>> of a region is required to be done on a CPU that is associated with the
>> cache on which the pseudo-locked region will reside. In order to run the
>> locking code on a specific CPU the needed CPU has to be selected and
>> ensured to remain online during the entire locking sequence.
>> At this time the cpu_hotplug_lock is not taken during the pseudo-lock
>> region creation and it is thus possible for a CPU to be selected to run
>> the pseudo-locking code and then that CPU to go offline before the
>> thread is able to run on it.
>> Fix this by ensuring that the cpu_hotplug_lock is taken while the CPU on
>> which code has to run needs to be controlled. Since the cpu_hotplug_lock
>> is always taken before rdtgroup_mutex the lock order is maintained.
>> Fixes: e0bdfe8e36f3 ("x86/intel_rdt: Support creation/removal of pseudo-locked region")
>> Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
>> Cc: stable@xxxxxxxxxxxxxxx
>> ---
>> V2:
>> - Rebase against tip/x86/urgent
>> - Modify subject from x86/resctrl to x86/intel_rdt to match subject used
>> before the code reorganization.
>> arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 4 ++++
>> 1 file changed, 4 insertions(+)
> I took it but changed the subject to the more straight-forward:

Thank you very much.

> "x86/intel_rdt: Disable CPU hotplug while modifying schemata"

I am not sure that this is an issue when updating a schemata in the
general case. In the case when just CAT schemata (without
pseudo-locking) is updated then the cpu mask associated with the cache
instance is indeed used to determine which CPUs should have their
registers changed but only the current CPU is not checked for being
online, for the other CPUs smp_call_function_many() is used that
includes an online check.

> Now, your second patch:
> Subject: [PATCH V2] x86/resctrl: Fix rdt_find_domain() return value and checks
> Message-Id: <b88cd4ff6a75995bf8db9b0ea546908fe50f69f3.1544479852.git.reinette.chatre@xxxxxxxxx>
> has the new file paths, has Fixes: tags but no CC:stable.
> I'm guessing it needs to go in the next merge window with the rest of
> the new stuff and not now, with the urgent pile?
> I'm thinking that because it is not really a bug now, as the negative ID
> happens to work.

I had the same question in V1's notes to the maintainer :)

My initial concern was the lack of IS_ERR checking. Understanding the
flow better now it seems to me that this is indeed not a bug now. The
reasoning is that an ERR_PTR is only returned when a negative id is
provided in the parameters to rdt_find_domain(). There are currently
only two places where a negative id could be provided to
rdt_find_domain(), domain_add_cpu() and domain_remove_cpu(), and both
locations test the return value using IS_ERR.