Re: [RFC 00/20] TLB batching consolidation and enhancements

From: Nicholas Piggin
Date: Tue Feb 02 2021 - 02:15:21 EST


Excerpts from Peter Zijlstra's message of February 1, 2021 10:44 pm:
> On Sun, Jan 31, 2021 at 07:57:01AM +0000, Nadav Amit wrote:
>> > On Jan 30, 2021, at 7:30 PM, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>
>> > I'll go through the patches a bit more closely when they all come
>> > through. Sparc and powerpc of course need the arch lazy mode to get
>> > per-page/pte information for operations that are not freeing pages,
>> > which is what mmu gather is designed for.
>>
>> IIUC you mean any PTE change requires a TLB flush. Even setting up a new PTE
>> where no previous PTE was set, right?

In cases of increasing permissiveness of access, yes it may want to
update the "TLB" (read hash table) to avoid taking hash table faults.

But whatever the reason for the flush, there may have to be more
data carried than just the virtual address range and/or physical
pages.

If you clear out the PTE then you have no guarantee of actually being
able to go back and address the the in-memory or in-hardware translation
structures to update them, depending on what exact scheme is used
(powerpc probably could if all page sizes were the same, but THP or
64k/4k sub pages would throw a spanner in those works).

> These are the HASH architectures. Their hardware doesn't walk the
> page-tables, but it consults a hash-table to resolve page translations.

Yeah, it's very cool in a masochistic way.

I actually don't know if it's worth doing a big rework of it, as much
as I'd like to. Rather than just keep it in place and eventually
dismantling some of the go-fast hooks from core code if we can one day
deprecate it in favour of the much easier radix mode.

The whole thing is like a big steam train, years ago Paul and Ben and
Anton and co got the boiler stoked up and set all the valves just right
so it runs unbelievably well for what it's actually doing but look at it
the wrong way and the whole thing could blow up. (at least that's what
it feels like to me probably because I don't know the code that well).

Sparc could probably do the same, not sure about Xen. I don't suppose
vmware is intending to add any kind of paravirt mode related to this stuff?

Thanks,
Nick