Re: [PATCH v5 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep

From: James Morse
Date: Fri Sep 08 2023 - 11:58:32 EST


Hi Reinette,

On 8/25/23 00:02, Reinette Chatre wrote:
On 8/24/2023 9:56 AM, James Morse wrote:
On 09/08/2023 23:36, Reinette Chatre wrote:
On 7/28/2023 9:42 AM, James Morse wrote:
MPAM's cache occupancy counters can take a little while to settle once
the monitor has been configured. The maximum settling time is described
to the driver via a firmware table. The value could be large enough
that it makes sense to sleep. To avoid exposing this to resctrl, it
should be hidden behind MPAM's resctrl_arch_rmid_read().

resctrl_arch_rmid_read() may be called via IPI meaning it is unable
to sleep. In this case resctrl_arch_rmid_read() should return an error
if it needs to sleep. This will only affect MPAM platforms where
the cache occupancy counter isn't available immediately, nohz_full is
in use, and there are there are no housekeeping CPUs in the necessary
domain.

There are three callers of resctrl_arch_rmid_read():
__mon_event_count() and __check_limbo() are both called from a
non-migrateable context. mon_event_read() invokes __mon_event_count()
using smp_call_on_cpu(), which adds work to the target CPUs workqueue.
rdtgroup_mutex() is held, meaning this cannot race with the resctrl
cpuhp callback. __check_limbo() is invoked via schedule_delayed_work_on()
also adds work to a per-cpu workqueue.

The remaining call is add_rmid_to_limbo() which is called in response
to a user-space syscall that frees an RMID. This opportunistically
reads the LLC occupancy counter on the current domain to see if the
RMID is over the dirty threshold. This has to disable preemption to
avoid reading the wrong domain's value. Disabling pre-emption here
prevents resctrl_arch_rmid_read() from sleeping.

add_rmid_to_limbo() walks each domain, but only reads the counter
on one domain. If the system has more than one domain, the RMID will
always be added to the limbo list. If the RMIDs usage was not over the
threshold, it will be removed from the list when __check_limbo() runs.
Make this the default behaviour. Free RMIDs are always added to the
limbo list for each domain.

The user visible effect of this is that a clean RMID is not available
for re-allocation immediately after 'rmdir()' completes, this behaviour
was never portable as it never happened on a machine with multiple
domains.

Removing this path allows resctrl_arch_rmid_read() to sleep if its called
with interrupts unmasked. Document this is the expected behaviour, and
add a might_sleep() annotation to catch changes that won't work on arm64.


diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 660752406174..f7311102e94c 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -245,6 +250,17 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, u32 rmid, enum resctrl_event_id eventid,
u64 *val);
+/**
+ * resctrl_arch_rmid_read_context_check() - warn about invalid contexts
+ *
+ * When built with CONFIG_DEBUG_ATOMIC_SLEEP generate a warning when
+ * resctrl_arch_rmid_read() is called with preemption disabled.
+ */
+static inline void resctrl_arch_rmid_read_context_check(void)
+{
+ if (!irqs_disabled())
+ might_sleep();
+}

Apologies but even after rereading the patch as well as your response to
the previous patch version several times I am not able to understand why the
code is looking like above. If, like according to the comment above, a
warning should be generated with preemption disabled, then should it not
just be "might_sleep()" without the "!irqs_disabled()" check?

This would be simpler. But for NOHZ_FULL you wanted to keep the IPI, so the contract with
resctrl_arch_rmid_read() is that if interrupts are unmasked, it can sleep.

Thank you. This appears to be the key. Could you please add this
information to resctrl_arch_rmid_read_context_check()'s description?

That comment now reads:
* resctrl_arch_rmid_read_context_check() - warn about invalid contexts
*
* When built with CONFIG_DEBUG_ATOMIC_SLEEP generate a warning when
* resctrl_arch_rmid_read() is called with preemption disabled.
*
* The contract with resctrl_arch_rmid_read() is that if interrupts
* are unmasked, it can sleep. This allows NOHZ_FULL systems to use an
* IPI, (and fail if the call needed to sleep), while most of the time
* the work is scheduled, allowing the call to sleep.



Thanks,

James