Re: [RFC PATCH v3 00/10] Add support for shared PTEs across processes

From: Anthony Yznaga
Date: Wed Oct 02 2024 - 20:25:38 EST

Next message: Boqun Feng: "Re: [RFC PATCH 3/4] hp: Implement Hazard Pointers"
Previous message: Yoshihiro Furudera: "[PATCH] perf list: update option desc in man page"
In reply to: Dave Hansen: "Re: [RFC PATCH v3 00/10] Add support for shared PTEs across processes"
Next in thread: David Hildenbrand: "Re: [RFC PATCH v3 00/10] Add support for shared PTEs across processes"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 10/2/24 4:11 PM, Dave Hansen wrote:

About TLB flushing...

The quick and dirty thing to do is just flush_tlb_all() after you remove
the PTE from the host mm. That will surely work everywhere and it's as
dirt simple as you get. Honestly, it might even be cheaper than the
alternative.

I think this a good place to start from. My concern is that unnecessary flushes will potentially impact unrelated loads. Performance testing as things progress can help determine if a more involved approach is needed.

Also, I don't think PCIDs actually complicate the problem at all. We
basically do remote mm TLB flushes using two mechanisms:

1. If the mm is loaded, use INVLPG and friends to zap the TLB
2. Bump mm->context.tlb_gen so that the next time it _gets_
loaded, the TLB is flushed.

flush_tlb_func() really only cares about #1 since if the mm isn't
loaded, it'll get flushed anyway at the next context switch.

The alternatives I can think of:

Make flush_tlb_mm_range(host_mm) work somehow. You'd need to somehow
keep mm_cpumask(host_mm) up to date and also make do something to
flush_tlb_func() to tell it that 'loaded_mm' isn't relevant and it
should flush regardless.

The other way is to use the msharefs's inode ->i_mmap to find all the
VMAs mapping the file, and find all *their* mm's:

for each vma in inode->i_mmap
mm = vma->vm_mm
flush_tlb_mm_range(<vma range here>)

But that might be even worse than flush_tlb_all() because it might end
up sending more than one IPI per CPU.

You can fix _that_ by keeping a single cpumask that you build up:

mask = 0
for each vma in inode->i_mmap
mm = vma->vm_mm
mask |= mm_cpumask(mm)

flush_tlb_multi(mask, info);

Unfortunately, 'info->mm' needs to be more than one mm, so you probably
still need a new flush_tlb_func() flush type to tell it to ignore
'info->mm' and flush anyway.

What about something like arch_tlbbatch_flush() which sets info->mm to NULL and uses a batch cpumask? That seems perfect though there would be a bit more work needed to ensure things work on other architectures. flush_tlb_all() it is then, for now. :-)

After all that, I kinda like flush_tlb_all(). ;)

Next message: Boqun Feng: "Re: [RFC PATCH 3/4] hp: Implement Hazard Pointers"
Previous message: Yoshihiro Furudera: "[PATCH] perf list: update option desc in man page"
In reply to: Dave Hansen: "Re: [RFC PATCH v3 00/10] Add support for shared PTEs across processes"
Next in thread: David Hildenbrand: "Re: [RFC PATCH v3 00/10] Add support for shared PTEs across processes"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]