Re: [RFC PATCH v3 00/10] Add support for shared PTEs across processes

From: Dave Hansen
Date: Mon Oct 07 2024 - 11:59:58 EST


On 10/7/24 01:44, David Hildenbrand wrote:
> On 02.10.24 19:35, Dave Hansen wrote:
>> We were just chatting about this on David Rientjes's MM alignment call.
>
> Unfortunately I was not able to attend this time; my body decided it's a
> good idea to stay in bed for a couple of days.
>
>> I thought I'd try to give a little brain dump.
>>
>> Let's start by thinking about KVM and secondary MMUs.  KVM has a primary
>> mm: the QEMU (or whatever) process mm.  The virtualization (EPT/NPT)
>> tables get entries that effectively mirror the primary mm page tables
>> and constitute a secondary MMU.  If the primary page tables change,
>> mmu_notifiers ensure that the changes get reflected into the
>> virtualization tables and also that the virtualization paging structure
>> caches are flushed.
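A toy user-space model of the KVM arrangement above may help. It is a sketch under loud assumptions. Every name in it (struct prim_mm, struct sec_mmu, set_primary_pte(), sec_translate()) is hypothetical. It is not the kernel's mmu_notifier API, and it ignores locking and the start/end bracketing that real invalidations need. It only shows the shape of the mechanism. Every primary PTE update first notifies the secondary MMU, which zaps its mirrored entry and flushes its cache. The secondary then refills lazily from the primary table on its next "fault".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model only. All names are hypothetical; this is not the kernel's
 * mmu_notifier API, and it is single-threaded with no locking.
 */
#define NR_PAGES 4
#define NO_PFN   (-1L)

/* Secondary MMU (think EPT/NPT): its own table plus its own cache. */
struct sec_mmu {
	long spte[NR_PAGES];   /* mirrored entries, filled lazily */
	bool cached[NR_PAGES]; /* stands in for TLB/paging-structure caches */
	long cache[NR_PAGES];
};

/* Primary mm: the "real" page table and one registered listener. */
struct prim_mm {
	long pte[NR_PAGES];
	struct sec_mmu *notifier; /* NULL means nobody is listening */
};

static void toy_init(struct prim_mm *mm, struct sec_mmu *s)
{
	for (size_t i = 0; i < NR_PAGES; i++) {
		mm->pte[i] = NO_PFN;
		s->spte[i] = NO_PFN;
		s->cached[i] = false;
		s->cache[i] = NO_PFN;
	}
	mm->notifier = s;
}

/* Notifier callback: zap the mirrored entry and flush the secondary cache. */
static void sec_invalidate(struct sec_mmu *s, size_t idx)
{
	s->spte[idx] = NO_PFN;
	s->cached[idx] = false;
}

/* All primary PTE updates go through here, so the listener hears first. */
static void set_primary_pte(struct prim_mm *mm, size_t idx, long pfn)
{
	assert(idx < NR_PAGES);
	if (mm->notifier)
		sec_invalidate(mm->notifier, idx);
	mm->pte[idx] = pfn;
}

/* Secondary access: cache hit, else "fault" and mirror the primary PTE. */
static long sec_translate(const struct prim_mm *mm, struct sec_mmu *s,
			  size_t idx)
{
	assert(idx < NR_PAGES);
	if (s->cached[idx])
		return s->cache[idx];
	if (s->spte[idx] == NO_PFN)
		s->spte[idx] = mm->pte[idx]; /* copy from the primary table */
	s->cache[idx] = s->spte[idx];
	s->cached[idx] = true;
	return s->cache[idx];
}
```

Try clearing mm->notifier before an update and then translating again. The secondary returns the old PFN, which is exactly the staleness the notifier exists to prevent.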
>>
>> msharefs is doing something very similar.  But, in the msharefs case,
>> the secondary MMUs are actually normal CPU MMUs.  The page tables are
>> normal old page tables and the caches are the normal old TLB.  That's
>> what makes it so confusing: we have lots of infrastructure for dealing
>> with that "stuff" (CPU page tables and TLB), but msharefs has
>> short-circuited the infrastructure and it doesn't work any more.
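The msharefs problem can be shown with a similar toy. Again, every name is hypothetical and this is not the msharefs code. Two mm's point at one shared page table. Each mm tracks which CPUs it has run on, loosely modeled on mm_cpumask(). The PTE update path flushes only the CPUs of the mm doing the update. As a result, the CPU running the other mm keeps a stale TLB entry, which is the "short-circuited infrastructure" problem in miniature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model with hypothetical names; not the msharefs patches or kernel code. */
#define NR_CPUS  2
#define NR_PAGES 4

/* One page table; with msharefs-style sharing, several mms point at it. */
struct pgtable {
	long pte[NR_PAGES];
};

struct proc_mm {
	struct pgtable *pgt;
	bool cpumask[NR_CPUS]; /* CPUs that may hold this mm's translations */
};

struct tlb_entry {
	bool valid;
	long pfn;
};

/* Per-CPU TLBs, zero-initialized (all entries invalid). */
static struct tlb_entry tlb[NR_CPUS][NR_PAGES];

/* Access from 'cpu' while running 'mm': TLB hit, or walk the table and fill. */
static long cpu_access(struct proc_mm *mm, int cpu, size_t idx)
{
	assert(cpu >= 0 && cpu < NR_CPUS && idx < NR_PAGES);
	mm->cpumask[cpu] = true; /* remember that mm ran here */
	if (!tlb[cpu][idx].valid) {
		tlb[cpu][idx].pfn = mm->pgt->pte[idx];
		tlb[cpu][idx].valid = true;
	}
	return tlb[cpu][idx].pfn;
}

/* Update a PTE, then flush only the CPUs in the *updating* mm's cpumask. */
static void set_pte_flush_mm(struct proc_mm *mm, size_t idx, long pfn)
{
	assert(idx < NR_PAGES);
	mm->pgt->pte[idx] = pfn;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mm->cpumask[cpu])
			tlb[cpu][idx].valid = false;
}
```

In the toy, the fix is to flush every CPU in the cpumask of every mm that points at the shared table. The per-mm flush path has no way to find those other mms.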
>
> It's quite different IMHO, to a degree that I believe they are different
> beasts:
>
> Secondary MMUs:
> * "Belongs" to the same MM context as the primary MMU (process page tables)

I think you're speaking to the ratio here. For each secondary MMU, I
think you're saying that there's one and only one mm_struct. Is that right?

> * Maintains separate tables/PTEs, in completely separate page table
>   hierarchy

This is the case for KVM and the VMX/SVM MMUs, but it's not generally
true about hardware. IOMMUs can walk x86 page tables and populate the
IOTLB from the _same_ page table hierarchy as the CPU.
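For the IOMMU case, a third toy helps (hypothetical names again, not any real IOMMU driver interface). Here there is no mirrored table at all. The CPU TLB and the IOTLB both fill from the one page table, so an update needs no copying. It only needs an invalidation of every cache that might hold the old translation. Flushing just the CPU TLB leaves the IOTLB stale until it is invalidated too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, hypothetical names: one page table walked by two translators. */
#define NR_PAGES 4

/* A translation cache: could be the CPU TLB or an IOTLB. */
struct xlate_cache {
	bool valid[NR_PAGES];
	long pfn[NR_PAGES];
};

/* Translate via 'c', walking the single shared table on a miss. */
static long cache_walk(const long *pgtable, struct xlate_cache *c, size_t idx)
{
	assert(idx < NR_PAGES);
	if (!c->valid[idx]) {
		c->pfn[idx] = pgtable[idx];
		c->valid[idx] = true;
	}
	return c->pfn[idx];
}

/* Drop one cached translation; the next walk re-reads the table. */
static void cache_invalidate(struct xlate_cache *c, size_t idx)
{
	assert(idx < NR_PAGES);
	c->valid[idx] = false;
}
```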