Re: [PATCH 3/3] mm/mmu_gather: send tlb_remove_table_smp_sync IPI only to CPUs in kernel mode

From: David Hildenbrand
Date: Tue Apr 04 2023 - 10:05:27 EST


On 04.04.23 15:42, Yair Podemsky wrote:
The tlb_remove_table_smp_sync IPI is used to ensure the outdated tlb page
is not currently being accessed and can be cleared.
This occurs once all CPUs have left the lockless gup code section.
If they reenter the page table walk, the pointers will be to the new
pages.
Therefore the IPI is only needed for CPUs in kernel mode.
By preventing the IPI from being sent to CPUs not in kernel mode,
Latencies are reduced.

Race conditions considerations:
The context state check is vulnerable to race conditions between the
moment the context state is read to when the IPI is sent (or not).

Here are these scenarios.
case 1:
CPU-A CPU-B

state == CONTEXT_KERNEL
int state = atomic_read(&ct->state);
Kernel-exit:
state == CONTEXT_USER
if (state & CT_STATE_MASK == CONTEXT_KERNEL)

In this case, the IPI will be sent to CPU-B despite it is no longer in
the kernel. The consequence of which would be an unnecessary IPI being
handled by CPU-B, causing a reduction in latency.
This would have been the case every time without this patch.

case 2:
CPU-A CPU-B

modify pagetables
tlb_flush (memory barrier)
state == CONTEXT_USER
int state = atomic_read(&ct->state);
Kernel-enter:
state == CONTEXT_KERNEL
READ(pagetable values)
if (state & CT_STATE_MASK == CONTEXT_USER)

In this case, the IPI will not be sent to CPU-B despite it returning to
the kernel and even reading the pagetable.
However since this CPU-B has entered the pagetable after the
modification it is reading the new, safe values.

The only case when this IPI is truly necessary is when CPU-B has entered
the lockless gup code section before the pagetable modifications and
has yet to exit them, in which case it is still in the kernel.

Signed-off-by: Yair Podemsky <ypodemsk@xxxxxxxxxx>
---
mm/mmu_gather.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 5ea9be6fb87c..731d955e152d 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -9,6 +9,7 @@
#include <linux/smp.h>
#include <linux/swap.h>
#include <linux/rmap.h>
+#include <linux/context_tracking_state.h>
#include <asm/pgalloc.h>
#include <asm/tlb.h>
@@ -191,6 +192,20 @@ static void tlb_remove_table_smp_sync(void *arg)
/* Simply deliver the interrupt */
}
+
+#ifdef CONFIG_CONTEXT_TRACKING
+static bool cpu_in_kernel(int cpu, void *info)
+{
+ struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
+ int state = atomic_read(&ct->state);
+ /* will return true only for cpus in kernel space */
+ return state & CT_STATE_MASK == CONTEXT_KERNEL;
+}
+#define CONTEXT_PREDICATE cpu_in_kernel
+#else
+#define CONTEXT_PREDICATE NULL
+#endif /* CONFIG_CONTEXT_TRACKING */
+
#ifdef CONFIG_ARCH_HAS_CPUMASK_BITS
#define REMOVE_TABLE_IPI_MASK mm_cpumask(mm)
#else
@@ -206,8 +221,8 @@ void tlb_remove_table_sync_one(struct mm_struct *mm)
* It is however sufficient for software page-table walkers that rely on
* IRQ disabling.
*/
- on_each_cpu_mask(REMOVE_TABLE_IPI_MASK, tlb_remove_table_smp_sync,
- NULL, true);
+ on_each_cpu_cond_mask(CONTEXT_PREDICATE, tlb_remove_table_smp_sync,
+ NULL, true, REMOVE_TABLE_IPI_MASK);
}
static void tlb_remove_table_rcu(struct rcu_head *head)


Maybe a bit cleaner by avoiding CONTEXT_PREDICATE, still not completely nice
(an empty dummy function "cpu_maybe_in_kernel" might be cleanest but would
be slightly slower for !CONFIG_CONTEXT_TRACKING):

#ifdef CONFIG_CONTEXT_TRACKING
static bool cpu_in_kernel(int cpu, void *info)
{
struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
int state = atomic_read(&ct->state);
/* will return true only for cpus in kernel space */
return state & CT_STATE_MASK == CONTEXT_KERNEL;
}
#endif /* CONFIG_CONTEXT_TRACKING */


...
#ifdef CONFIG_CONTEXT_TRACKING
on_each_cpu_mask(REMOVE_TABLE_IPI_MASK, tlb_remove_table_smp_sync,
NULL, true);
#else /* CONFIG_CONTEXT_TRACKING */
on_each_cpu_cond_mask(cpu_in_kernel, tlb_remove_table_smp_sync,
NULL, true, REMOVE_TABLE_IPI_MASK);
#endif /* CONFIG_CONTEXT_TRACKING */


--
Thanks,

David / dhildenb