Re: [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd()

From: Nadav Amit
Date: Tue Oct 26 2021 - 16:07:44 EST

Next message: Matthew Wilcox: "Re: [RFC 0/8] Hardening page _refcount"
Previous message: Daniel Bristot de Oliveira: "Re: [PATCH V5 08/20] rtla: Helper functions for rtla"
In reply to: Dave Hansen: "Re: [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd()"
Next in thread: Dave Hansen: "Re: [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> On Oct 26, 2021, at 12:40 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 10/26/21 12:06 PM, Nadav Amit wrote:
>>
>> To make it very clear - consider the following scenario, in which
>> a volatile pointer p is mapped using a certain PTE, which is RW
>> (i.e., *p is writable):
>>
>> CPU0                              CPU1
>> ----                              ----
>> x = *p
>> [ PTE cached in TLB;
>>   PTE is not dirty ]
>>                                   clear_pte(PTE)
>> *p = x
>> [ needs to set dirty ]
>>
>> Note that there is no TLB flush in this scenario. The question
>> is whether the write access to *p would succeed, setting the
>> dirty bit on the clear, non-present entry.
>>
>> I was under the impression that the hardware AD-assist would
>> recheck the PTE atomically as it sets the dirty bit. But, as I
>> said, I am not sure anymore whether this is defined architecturally
>> (or at least would work in practice on all CPUs modulo the
>> Knights Landing thingy).
>
> Practically, at "x=*p", the thing that gets cached in the TLB will have
> Dirty=0. At the "*p=x", the CPU will decide it needs to do a write,
> find the Dirty=0 entry and will entirely discard it. In other words, it
> *acts* roughly like this:
>
>     x = *p;
>     INVLPG(p);
>     *p = x;
>
> Where the INVLPG() and the "*p=x" are atomic. So, there's no
> _practical_ problem with your scenario. This specific behavior isn't
> architectural as far as I know, though.
>
> Although it's pretty much just academic, as far as the architecture goes,
> are you getting hung up on the difference between the descriptions of "Accessed":
>
> Whenever the processor uses a paging-structure entry as part of
> linear-address translation, it sets the accessed flag in that
> entry
>
> and "Dirty":
>
> Whenever there is a write to a linear address, the processor
> sets the dirty flag (if it is not already set) in the paging-
> structure entry...
>
> Accessed says "as part of linear-address translation", which means that
> the address must have a translation. But, the "Dirty" section doesn't
> say that. It talks about "a write to a linear address" but not whether
> there is a linear address *translation* involved.
>
> If that's it, we could probably add a bit like:
>
> In addition to setting the accessed flag, whenever there is a
> write...
>
> before the dirty rules in the SDM.
>
> Or am I being dense and continuing to miss your point? :)

I think this time you got my question right.

I was thrown off by the SDM note on R/W permissions vs. the dirty flag that I
mentioned before:

"If software on one logical processor writes to a page while software on
another logical processor concurrently clears the R/W flag in the
paging-structure entry that maps the page, execution on some processors may
result in the entry’s dirty flag being set (due to the write on the first
logical processor) and the entry’s R/W flag being clear (due to the update
to the entry on the second logical processor)."
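
One way to picture the race this note describes: suppose (purely hypothetically; the note does not say how any processor is built) that the write-permission check uses a snapshot of the PTE and that the dirty flag is later ORed into the in-memory entry without re-checking it. The sketch below replays that interleaving with C11 atomics. The PTE_* masks use the architectural x86 bit positions, but the macro names, write_allowed() and replay_blind_or_race() are made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE flag bits (architectural bit positions; names are illustrative). */
#define PTE_P  (1ULL << 0) /* Present */
#define PTE_RW (1ULL << 1) /* Read/write */
#define PTE_A  (1ULL << 5) /* Accessed */
#define PTE_D  (1ULL << 6) /* Dirty */

/* Hypothetical permission check, performed on a snapshot of the PTE. */
static bool write_allowed(uint64_t snapshot)
{
    return (snapshot & PTE_P) && (snapshot & PTE_RW);
}

/*
 * Replays one interleaving in a HYPOTHETICAL model where the dirty bit is
 * ORed into the in-memory PTE after, and separately from, the permission
 * check (no real CPU is claimed to work this way):
 *   CPU0: snapshot the PTE and check write permission (passes)
 *   CPU1: clear the bits in clear_mask (R/W only, or the whole entry)
 *   CPU0: set D with an unconditional atomic OR, without re-checking
 * Returns the final in-memory PTE value.
 */
static uint64_t replay_blind_or_race(uint64_t clear_mask)
{
    _Atomic uint64_t pte = PTE_P | PTE_RW | PTE_A;

    uint64_t snapshot = atomic_load(&pte);   /* CPU0: translation */
    bool ok = write_allowed(snapshot);       /* CPU0: check passes */
    atomic_fetch_and(&pte, ~clear_mask);     /* CPU1: clear bits */
    if (ok)
        atomic_fetch_or(&pte, PTE_D);        /* CPU0: blind dirty set */
    return atomic_load(&pte);
}
```

In this model, replay_blind_or_race(PTE_RW) returns P|A|D (0x61): the dirty flag ends up set while R/W is clear, matching the outcome in the note. Calling replay_blind_or_race(~0ULL), which zeroes the whole entry the way clear_pte() does, returns 0x40, i.e. D set on a non-present entry; a blind-OR design would therefore expose P-clearing to the same race as R/W-clearing.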

This time, I did not pay enough attention to the small differences you pointed
out between the Accessed and Dirty descriptions (although I had noticed them before).

I do not think that the change you suggested to the SDM really clarifies
the situation. Setting the accessed flag is done as part of caching the PTE in
the TLB. The SDM change you propose does not clarify that the permission/PTE-validity
check and the dirty-bit setting are atomic, nor that the cached PTE is invalidated
if the dirty bit needs to be set but is cached as clear (I would not expect you
to want the latter in the SDM, since it is an implementation detail).
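
The two properties discussed above (a permission/PTE-validity check that is atomic with the dirty-bit update, and discarding a cached entry whose dirty flag is clear) can be sketched as a toy software model. This is an assumption-laden illustration of the behavior Dave described as practical rather than architectural, not a description of real hardware: the single-entry struct tlb_entry, set_bits_if(), model_read(), model_write() and replay_scenario() are all hypothetical. The compare-and-swap loop only sets D while P and R/W are still set, so it cannot dirty an entry that clear_pte() has already zeroed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE flag bits (architectural bit positions; names are illustrative). */
#define PTE_P  (1ULL << 0) /* Present */
#define PTE_RW (1ULL << 1) /* Read/write */
#define PTE_A  (1ULL << 5) /* Accessed */
#define PTE_D  (1ULL << 6) /* Dirty */

/* A toy single-entry TLB holding a snapshot of one PTE. */
struct tlb_entry {
    bool valid;
    uint64_t pte;
};

/*
 * Atomically OR `bits` into *pte, but only while every bit in `required`
 * is still set. Returns false (a fault, in this model) otherwise.
 */
static bool set_bits_if(_Atomic uint64_t *pte, uint64_t required,
                        uint64_t bits, uint64_t *newval)
{
    uint64_t old = atomic_load(pte);

    do {
        if ((old & required) != required)
            return false;
        /* On CAS failure, `old` is reloaded and the check is redone. */
    } while (!atomic_compare_exchange_weak(pte, &old, old | bits));
    *newval = old | bits;
    return true;
}

/* Read access: on a miss, walk the PTE, set A and cache the entry. */
static bool model_read(struct tlb_entry *tlb, _Atomic uint64_t *pte)
{
    uint64_t v;

    if (tlb->valid)
        return true;
    if (!set_bits_if(pte, PTE_P, PTE_A, &v))
        return false;
    tlb->pte = v;
    tlb->valid = true;
    return true;
}

/*
 * Write access: a cached entry is used as-is only if it is already writable
 * and dirty. Otherwise (e.g. the D=0 entry cached by a read) it is discarded,
 * as if by INVLPG, and D is set only if P and R/W are still set.
 */
static bool model_write(struct tlb_entry *tlb, _Atomic uint64_t *pte)
{
    uint64_t v;

    if (tlb->valid && (tlb->pte & PTE_RW) && (tlb->pte & PTE_D))
        return true;
    tlb->valid = false;
    if (!set_bits_if(pte, PTE_P | PTE_RW, PTE_A | PTE_D, &v))
        return false;
    tlb->pte = v;
    tlb->valid = true;
    return true;
}

/* CPU0: x = *p; CPU1 (optionally): clear_pte(PTE), no flush; CPU0: *p = x. */
static bool replay_scenario(bool clear_pte, uint64_t *final_pte)
{
    _Atomic uint64_t pte = PTE_P | PTE_RW;
    struct tlb_entry tlb = { false, 0 };
    bool ok;

    model_read(&tlb, &pte);
    if (clear_pte)
        atomic_store(&pte, 0);
    ok = model_write(&tlb, &pte);
    *final_pte = atomic_load(&pte);
    return ok;
}
```

Replaying the CPU0/CPU1 scenario from earlier in the thread, replay_scenario(true, &v) returns false (the write faults in this model) and leaves v == 0; with no concurrent clear, the write succeeds and the PTE ends up with P, R/W, A and D set.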

I just wonder why clearing R/W and clearing P would affect a concurrent
dirty-bit update differently. I am not a hardware guy, but I would imagine
they would behave the same...
