On Fri, Mar 26, 2021 at 10:08:09AM +0100, Thomas Hellström (Intel) wrote:
>
> On 3/25/21 7:24 PM, Jason Gunthorpe wrote:
> > On Thu, Mar 25, 2021 at 07:13:33PM +0100, Thomas Hellström (Intel) wrote:
> > > On 3/25/21 6:55 PM, Jason Gunthorpe wrote:
> > > > On Thu, Mar 25, 2021 at 06:51:26PM +0100, Thomas Hellström (Intel) wrote:
> > > > > On 3/24/21 9:25 PM, Dave Hansen wrote:
> > > > > > On 3/24/21 1:22 PM, Thomas Hellström (Intel) wrote:
> > > > > > > > We also have not been careful at *all* about how _PAGE_BIT_SOFTW* are
> > > > > > > > used. It's quite possible we can encode another use even in the
> > > > > > > > existing bits.
> > > > > > > >
> > > > > > > > Personally, I'd just try:
> > > > > > > >
> > > > > > > > #define _PAGE_BIT_SOFTW5 57 /* available for programmer */
> > > > > > > >
> > > > > > > OK, I'll follow your advise here. FWIW I grepped for SW1 and it seems
> > > > > > > used in a selftest, but only for PTEs AFAICT.
> > > > > > >
> > > > > > > Oh, and we don't care about 32-bit much anymore?
> > > > > > On x86, we have 64-bit PTEs when running 32-bit kernels if PAE is
> > > > > > enabled. IOW, we can handle the majority of 32-bit CPUs out there.
> > > > > >
> > > > > > But, yeah, we don't care about 32-bit. :)
> > > > > Hmm,
> > > > >
> > > > > Actually it makes some sense to use SW1, to make it end up in the same dword
> > > > > as the PSE bit, as from what I can tell, reading of a 64-bit pmd_t on 32-bit
> > > > > PAE is not atomic, so in theory a huge pmd could be modified while reading
> > > > > the pmd_t making the dwords inconsistent.... How does that work with fast
> > > > > gup anyway?
> > > > It loops to get an atomic 64 bit value if the arch can't provide an
> > > > atomic 64 bit load
> > > Hmm, ok, I see a READ_ONCE() in gup_pmd_range(), and then the resulting pmd
> > > is dereferenced either in try_grab_compound_head() or __gup_device_huge(),
> > > before the pmd is compared to the value the pointer is currently pointing
> > > to. Couldn't those dereferences be on invalid pointers?
> > Uhhhhh.. That does look questionable, yes. Unless there is some tricky
> > reason why a 64 bit pmd entry on a 32 bit arch either can't exist or
> > has a stable upper 32 bits..
> >
> > The pte does it with ptep_get_lockless(), we probably need the same
> > for the other levels too instead of open coding a READ_ONCE?
> >
> > Jason
>
> TBH, ptep_get_lockless() also looks a bit fishy. it says
> "it will not switch to a completely different present page without a TLB
> flush in between".
>
> What if the following happens:
>
> processor 1: Reads lower dword of PTE.
> processor 2: Zaps PTE. Gets stuck waiting to do TLB flush
> processor 1: Reads upper dword of PTE, which is now zero.
> processor 3: Hits a TLB miss, reads an unpopulated PTE and faults in a new
> PTE value which happens to be the same as the original one before the zap.
> processor 1: Reads the newly faulted in lower dword, compares to the old
> one, gives an OK and returns a bogus PTE.

So you are saying that while the zap will wait for the TLB flush to
globally finish once it gets started any other processor can still
write to the pte?

I can't think of any serialization that would cause fault to wait for
the zap/TLB flush, especially if the zap comes from the address_space
and doesn't hold the mmap lock.
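
To make the interleaving quoted above concrete, here is a minimal
single-threaded sketch that replays it in program order against a
low/high/low re-read check, which is how I understand the lockless
reader to work when the arch has no atomic 64 bit load. Everything in it
is an assumption for illustration only: struct pte_model,
replay_lockless_read() and the ORIG_LOW/ORIG_HIGH values are made up, and
it is a userspace model without barriers, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit PTE kept as two 32-bit halves (32-bit PAE). */
struct pte_model {
	uint32_t low;	/* present bit and low address bits */
	uint32_t high;	/* high address bits */
};

#define ORIG_LOW  0x00001067u	/* made-up value for illustration */
#define ORIG_HIGH 0x00000001u	/* made-up value for illustration */

/*
 * Replays the interleaving from the thread in program order.
 * Returns true if the low/high/low check accepts the read and stores
 * the value the reader would have returned in *out.
 */
bool replay_lockless_read(uint64_t *out)
{
	struct pte_model pte = { .low = ORIG_LOW, .high = ORIG_HIGH };
	uint32_t low, high;

	low = pte.low;		/* CPU 1: read lower dword */

	pte.low = 0;		/* CPU 2: zap the PTE ... */
	pte.high = 0;		/* ... then stall before the TLB flush */

	high = pte.high;	/* CPU 1: read upper dword, now zero */

	pte.high = ORIG_HIGH;	/* CPU 3: fault the same value back in */
	pte.low = ORIG_LOW;

	if (low != pte.low)	/* CPU 1: re-read and compare lower dword */
		return false;	/* a real reader would loop and retry */

	*out = ((uint64_t)high << 32) | low;
	return true;
}
```

In this replay the compare passes, yet the assembled value pairs the
original lower dword with a zero upper dword, so it matches neither the
entry before the zap nor the one faulted in afterwards.
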

Seems worth bringing up in a bigger thread, maybe someone else knows?

Jason