Re: [PATCH v2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu
From: Will Deacon
Date: Fri Jun 19 2026 - 11:38:40 EST
On Mon, Jun 15, 2026 at 04:41:04PM +0100, Ryan Roberts wrote:
> On 15/06/2026 15:43, Will Deacon wrote:
> > On Mon, Jun 15, 2026 at 12:21:19PM +0100, Ryan Roberts wrote:
> >>>> + self = smp_processor_id();
> >>>> +
> >>>> + /*
> >>>> + * The load of mm->context.active_cpu must not be reordered before the
> >>>> + * store to the pgtable that necessitated this flush. This ensures that
> >>>> + * if the value read is our cpu id, then no other cpu can have seen the
> >>>> + * old pgtable value and therefore does not need this old value to be
> >>>> + * flushed from its tlb. But we don't want to upgrade the dsb(ishst),
> >>>> + * needed to make the pgtable updates visible to the walker, to a
> >>>> + * dsb(ish) by default. So speculatively load without a barrier and if
> >>>> + * it indicates our cpu id, then upgrade the barrier and re-load.
> >>>> + */
> >>>> + active = READ_ONCE(mm->context.active_cpu);
> >>>> + if (active == self) {
> >>>> + dsb(ish);
> >>>> + active = READ_ONCE(mm->context.active_cpu);
> >>>> + } else {
> >>>> + dsb(ishst);
> >>>> + }
> >>>
> >>> Why can't you just do:
> >>>
> >>> dsb(ishst);
> >>> active = READ_ONCE(mm->context.active_cpu);
> >>>
> >>> ?
> >>
> >> Prior to this optimization, we always issued a dsb(ishst) here. Catalin
> >> suggested the same simplification against the RFC. I believe Linu tried it but
> >> saw regressions; Hopefully Linu can provide the details.
> >
> > I don't follow...
> >
> > The old code always did dsb(ishst). The proposed code here does either
> > dsb(ish) or dsb(ishst). How can that possibly be faster?
>
> Ugh, sorry - I read your suggestion as unconditionally issuing a dsb(ish).
>
> Ignore my previous answer, and now I'll demonstrate my total lack of
> understanding of barriers instead...
>
> As the comment says, "The load of mm->context.active_cpu must not be reordered
> before the store to the pgtable that necessitated this flush". I thought that a
> dsb(ishst) would only provide ordering between stores. Don't we need the
> dsb(ish) to prevent the load from being reordered before the store?
dsb(ishst) orders prior stores -> everything later. That's why it works
today for ordered a PTE write before a TLBI (which isn't a store).
Will