Re: [PATCH v3 11/11] arm64/mm: Batch barriers when updating kernel mappings
From: Catalin Marinas
Date: Tue Apr 15 2025 - 06:55:42 EST
On Mon, Apr 14, 2025 at 07:28:46PM +0100, Ryan Roberts wrote:
> On 14/04/2025 18:38, Catalin Marinas wrote:
> > On Tue, Mar 04, 2025 at 03:04:41PM +0000, Ryan Roberts wrote:
> >> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> >> index 1898c3069c43..149df945c1ab 100644
> >> --- a/arch/arm64/include/asm/pgtable.h
> >> +++ b/arch/arm64/include/asm/pgtable.h
> >> @@ -40,6 +40,55 @@
> >> #include <linux/sched.h>
> >> #include <linux/page_table_check.h>
> >>
> >> +static inline void emit_pte_barriers(void)
> >> +{
> >> +        /*
> >> +         * These barriers are emitted under certain conditions after a pte entry
> >> +         * was modified (see e.g. __set_pte_complete()). The dsb makes the store
> >> +         * visible to the table walker. The isb ensures that any previous
> >> +         * speculative "invalid translation" marker that is in the CPU's
> >> +         * pipeline gets cleared, so that any access to that address after
> >> +         * setting the pte to valid won't cause a spurious fault. If the thread
> >> +         * gets preempted after storing to the pgtable but before emitting these
> >> +         * barriers, __switch_to() emits a dsb which ensures the walker gets to
> >> +         * see the store. There is no guarantee of an isb being issued though.
> >> +         * This is safe because it will still get issued (albeit on a
> >> +         * potentially different CPU) when the thread starts running again,
> >> +         * before any access to the address.
> >> +         */
> >> +        dsb(ishst);
> >> +        isb();
> >> +}
> >> +
> >> +static inline void queue_pte_barriers(void)
> >> +{
> >> +        if (test_thread_flag(TIF_LAZY_MMU))
> >> +                set_thread_flag(TIF_LAZY_MMU_PENDING);
> >
> > As we can have lots of calls here, it might be slightly cheaper to test
> > TIF_LAZY_MMU_PENDING and avoid setting it unnecessarily.
>
> Yes, good point.
>
> > I haven't checked - does the compiler generate multiple mrs from sp_el0
> > for subsequent test_thread_flag()?
>
> It emits a single mrs but it loads from the pointer twice.
It's not that bad if we only do the set_thread_flag() once.
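
FWIW, with the generic thread_info helpers each test_thread_flag() call
boils down to roughly the below (illustration only, the helper name is
made up), so the flags word is re-read on every call even though sp_el0
only needs to be read once:

        /* Sketch of the generic test_thread_flag() path, illustration only. */
        static inline int example_test_thread_flag(int flag)
        {
                /* re-reads current_thread_info()->flags on each call */
                return test_bit(flag,
                                (unsigned long *)&current_thread_info()->flags);
        }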
> I think v3 is the version we want?
>
>
> void TEST_queue_pte_barriers_v1(void)
> {
>         if (test_thread_flag(TIF_LAZY_MMU))
>                 set_thread_flag(TIF_LAZY_MMU_PENDING);
>         else
>                 emit_pte_barriers();
> }
>
> void TEST_queue_pte_barriers_v2(void)
> {
>         if (test_thread_flag(TIF_LAZY_MMU) &&
>             !test_thread_flag(TIF_LAZY_MMU_PENDING))
>                 set_thread_flag(TIF_LAZY_MMU_PENDING);
>         else
>                 emit_pte_barriers();
> }
>
> void TEST_queue_pte_barriers_v3(void)
> {
>         unsigned long flags = read_thread_flags();
>
>         if ((flags & (_TIF_LAZY_MMU | _TIF_LAZY_MMU_PENDING)) == _TIF_LAZY_MMU)
>                 set_thread_flag(TIF_LAZY_MMU_PENDING);
>         else
>                 emit_pte_barriers();
> }
Doesn't v3 emit barriers once _TIF_LAZY_MMU_PENDING has been set? We
need something like:
        if (flags & _TIF_LAZY_MMU) {
                if (!(flags & _TIF_LAZY_MMU_PENDING))
                        set_thread_flag(TIF_LAZY_MMU_PENDING);
        } else {
                emit_pte_barriers();
        }
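
Folded back into the helper, an untested sketch (reusing the
read_thread_flags() approach from your v3):

        static inline void queue_pte_barriers(void)
        {
                unsigned long flags = read_thread_flags();

                if (flags & _TIF_LAZY_MMU) {
                        /* already pending, skip the redundant flag update */
                        if (!(flags & _TIF_LAZY_MMU_PENDING))
                                set_thread_flag(TIF_LAZY_MMU_PENDING);
                } else {
                        emit_pte_barriers();
                }
        }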
--
Catalin