Re: [RFC PATCH 0/4] Improve RT performance & latency w/ local_lock_t()

From: Leonardo Bras Soares Passos
Date: Tue Aug 01 2023 - 15:06:20 EST


CC: Peter Xu

On Sat, Jul 29, 2023 at 5:38 AM Leonardo Bras <leobras@xxxxxxxxxx> wrote:
>
> The problem:
> We have a few scenarios in mm that make use of local_locks() +
> schedule_work_on() due to reduced overhead on the most common local
> references. This scenario is not ideal for RT workloads though: even on
> isolated cpus, those tasks will schedule-out the sensitive RT workloads to
> perform those jobs, and usually cause missing deadlines with this.
>
> The idea:
> In PREEMPT_RT, local_locks() end up becoming spinlocks(), so there should
> be no harm in just getting another cpu's spinlock to perform the work
> on the per-cpu structure: the cacheline invalidation will already happen
> due to the usage by schedule_work_on(), and on local functions the locking
> already happens anyway.
>
> This will avoid schedule_work_on(), and thus avoid scheduling-out an
> RT workload. Given the usually brief nature of those scheduled tasks, their
> execution end up being faster than doing their scheduling.
>
> ======
>
> While I really belive the solution, there are problems with this patchset,
> which I need your suggestions for improvement:
>
> 1) Naming is horrible: I could not think on a good name for the new lock
> functions, so I lazely named it local_lock_n().
> The naming should have been making clear that we are in fact dealing
> with a local_lock but it can in some scenarios be addressing another cpu's
> local_lock, and thus we need the extra cpu parameter.
>
> Dealing with this local & remote duality, all I thought was:
> mostly_local_lock(), (or local_mostly_lock())
> local_maybe_remote_lock() <- LOL
> remote_local_lock()
> per_cpu_local_lock()
> local_lock_cpu()
>
> Please help !
>
>
> 2) Maybe naming is this hard because the interface is not the best.
> My idea was to create a simple interface to easily replace functions
> already in use, but maybe there is something better I could not see.
>
> Please contribute!
>
> 3) I am lazely setting work->data without atomic operations, which may
> be bad in some scenario. If so, even thought it can be costly, since it
> happens outside of the hotpath (local per-cpu areas) it should be no
> problem adding the atomic operation for non-RT kernels.
>
> For RT kernels, since the whole operation hapens locally, there should be
> no need of using the atomic set.
>
> Please let me know of any idea, or suggestion that can improve this RFC.
>
> Thanks a lot!
> Leo
>
> Leonardo Bras (4):
> Introducing local_lock_n() and local queue & flush
> swap: apply new local_schedule_work_on() interface
> memcontrol: apply new local_schedule_work_on() interface
> slub: apply new local_schedule_work_on() interface
>
> include/linux/local_lock.h | 18 ++++++++++
> include/linux/local_lock_internal.h | 52 +++++++++++++++++++++++++++++
> mm/memcontrol.c | 17 ++++++----
> mm/slub.c | 17 ++++++----
> mm/swap.c | 18 +++++-----
> 5 files changed, 100 insertions(+), 22 deletions(-)
>
> --
> 2.41.0
>