Re: [PATCH 3/3] mm/mmu_gather: send tlb_remove_table_smp_sync IPI only to CPUs in kernel mode

From: David Hildenbrand
Date: Thu Apr 06 2023 - 11:52:52 EST


On 06.04.23 17:02, Peter Zijlstra wrote:
> On Thu, Apr 06, 2023 at 04:04:23PM +0200, Peter Zijlstra wrote:
>> On Thu, Apr 06, 2023 at 03:29:28PM +0200, Peter Zijlstra wrote:
>>> On Thu, Apr 06, 2023 at 09:38:50AM -0300, Marcelo Tosatti wrote:
>>>
>>>>> To actually hit this path you're doing something really dodgy.
>>>>
>>>> Apparently khugepaged is using the same infrastructure:
>>>>
>>>> $ grep tlb_remove_table khugepaged.c
>>>> tlb_remove_table_sync_one();
>>>> tlb_remove_table_sync_one();
>>>>
>>>> So just enabling khugepaged will hit that path.
>>>
>>> Urgh, WTF..
>>>
>>> Let me go read that stuff :/
>>
>> At the very least the one on collapse_and_free_pmd() could easily become
>> a call_rcu() based free.
>>
>> I'm not sure I'm following what collapse_huge_page() does just yet.
>
> DavidH, what do you think about reviving Jann's patches here:
>
>   https://bugs.chromium.org/p/project-zero/issues/detail?id=2365#c1
>
> Those are far more invasive, but afaict they seem to do the right thing.


I recall seeing those while they were discussed on security@xxxxxxxxxx. What we currently have was (IMHO for good reasons) deemed the better way to fix the issue, especially when caring about backports and getting it right.

The alternative that was discussed in that context IIRC was to simply allocate a fresh page table, place the fresh page table into the list instead, and simply free the old page table (then using common machinery).

TBH, I'd wish (and recently raised) that we could just stop wasting memory on page tables for THPs that are maybe never going to get PTE-mapped ... and eventually just allocate on demand (with some caching?) and handle the places where we're OOM and cannot PTE-map a THP in some decent way.

... instead of trying to figure out how to deal with these page tables we cannot free but have to special-case simply because of GUP-fast.

--
Thanks,

David / dhildenb