Re: [PATCH 0/2] fix vma->anon_vma check for per-VMA locking; fix anon_vma memory ordering

From: Matthew Wilcox
Date: Thu Jul 27 2023 - 11:08:17 EST

Next message: Claudio Imbrenda: "Re: [PATCH 3/3] KVM: s390: pv: Allow AP-instructions for pv guests"
Previous message: Olivier Moysan: "[RFC v2 11/11] ARM: dts: stm32: add dfsdm iio support on stm32mp157c-ev"
In reply to: Nadav Amit: "Re: [PATCH 0/2] fix vma->anon_vma check for per-VMA locking; fix anon_vma memory ordering"
Next in thread: Jann Horn: "Re: [PATCH 0/2] fix vma->anon_vma check for per-VMA locking; fix anon_vma memory ordering"

On Thu, Jul 27, 2023 at 04:39:34PM +0200, Jann Horn wrote:
> Assume that we are holding some kind of lock that ensures that the
> only possible concurrent update to "vma->anon_vma" is that it changes
> from a NULL pointer to a non-NULL pointer (using smp_store_release()).
>
>
> if (READ_ONCE(vma->anon_vma) != NULL) {
>         // we now know that vma->anon_vma cannot change anymore
>
>         // access the same memory location again with a plain load
>         struct anon_vma *a = vma->anon_vma;
>
>         // this needs to be address-dependency-ordered against one of
>         // the loads from vma->anon_vma
>         struct anon_vma *root = a->root;
> }
>
>
> Is this fine? If it is not fine just because the compiler might
> reorder the plain load of vma->anon_vma before the READ_ONCE() load,
> would it be fine after adding a barrier() directly after the
> READ_ONCE()?
>
> I initially suggested using READ_ONCE() for this, and then Linus and
> me tried to reason it out and Linus suggested (if I understood him
> correctly) that you could make the ugly argument that this works
> because loads from the same location will not be reordered by the
> hardware. So on anything other than alpha, we'd still have the
> required address-dependency ordering because that happens for all
> loads, even plain loads, while on alpha, the READ_ONCE() includes a
> memory barrier. But that argument is weirdly reliant on
> architecture-specific implementation details.
>
> The other option is to replace the READ_ONCE() with a
> smp_load_acquire(), at which point it becomes a lot simpler to show
> that the code is correct.

Aren't we straining at gnats here? The context of this is handling a
page fault, and we used to take an entire rwsem for read. I'm having
a hard time caring about "the extra expense" of an unnecessarily broad
barrier.

Cost of an L3 cacheline miss is in the thousands of cycles. Cost of a
barrier is ... tens?
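
For concreteness, here is a minimal userspace sketch of the acquire-load
option from the quoted text. It is an illustration under stated assumptions,
not the kernel patch: it uses C11 atomics, with memory_order_release and
memory_order_acquire standing in as rough analogues of smp_store_release()
and smp_load_acquire(), and the names vma_sketch, anon_vma_sketch,
publish_anon_vma() and read_root() are hypothetical. The reader does one
acquire load and dereferences the pointer it got back, so it never re-reads
the field with a plain load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not the real layout). */
struct anon_vma_sketch {
	struct anon_vma_sketch *root;
};

struct vma_sketch {
	/* Only ever changes from NULL to non-NULL, as assumed in the thread. */
	_Atomic(struct anon_vma_sketch *) anon_vma;
};

/*
 * Writer side: finish initialising the object, then publish the pointer
 * with release semantics so the initialisation is visible to any reader
 * that observes the non-NULL pointer via an acquire load.
 */
static void publish_anon_vma(struct vma_sketch *vma,
			     struct anon_vma_sketch *av)
{
	assert(av != NULL);
	av->root = av;
	atomic_store_explicit(&vma->anon_vma, av, memory_order_release);
}

/*
 * Reader side: a single acquire load. The dereference of 'a' is ordered
 * after the load by the acquire itself, so no address-dependency argument
 * is needed. Returns NULL if nothing has been published yet.
 */
static struct anon_vma_sketch *read_root(struct vma_sketch *vma)
{
	struct anon_vma_sketch *a =
		atomic_load_explicit(&vma->anon_vma, memory_order_acquire);

	if (a == NULL)
		return NULL;
	return a->root;
}
```

The acquire is stronger than the dependency ordering strictly requires, which
is the "unnecessarily broad barrier" above; the sketch trades that small cost
for a correctness argument that does not depend on the architecture.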
