Re: [PATCH v2 0/9] Free user PTE page table pages

From: Qi Zheng
Date: Wed Sep 15 2021 - 10:52:51 EST




On 9/1/21 8:32 PM, David Hildenbrand wrote:

Hi David,




Some high-level feedback after studying the code:

1. Try introducing the new dummy primitives ("API") first, and then convert each subsystem individually; especially, maybe convert the whole pagefault handling in a single patch, because it's far from trivial. This will make this series much easier to digest.

I am going to split this patch series as follows:

1. Introduce the new dummy APIs with empty implementations, and
explain their semantics.
2. Merge #6, #7 and #8, and call these dummy APIs at every necessary
location. Special cases such as pagefault and gup will be split
into their own patches, so that the concurrency in these cases can
be explained in more detail. For example, the gup fast path on
x86_64 does not need to hold any pte_refcount, because the PTE page
can't be freed once local interrupts have been disabled in that
path.
3. Introduce CONFIG_FREE_USER_PTE and fill in the real implementation
of these dummy APIs.
4. Add a description document.
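
To make steps 1 and 3 concrete, here is a rough userspace sketch (not
kernel code) of how the same two primitives could be empty dummies by
default and gain real logic behind CONFIG_FREE_USER_PTE. Only the names
pte_try_get(), pte_put() and CONFIG_FREE_USER_PTE come from this thread;
struct pte_page_sketch, touch_pte_page() and the simplified signatures
are assumptions for illustration, and a real refcount would need to be
atomic.

```c
/* Userspace sketch only, not kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a user PTE page table page. */
struct pte_page_sketch {
	int refcount;	/* models pte_refcount; a real one would be atomic */
	bool freed;	/* set when the last reference is dropped */
};

#ifdef CONFIG_FREE_USER_PTE
/* Step 3: real logic behind the config option. */
static inline bool pte_try_get(struct pte_page_sketch *p)
{
	if (p->refcount == 0)
		return false;	/* page is already on its way to being freed */
	p->refcount++;
	return true;
}

static inline void pte_put(struct pte_page_sketch *p)
{
	if (--p->refcount == 0)
		p->freed = true;	/* stands in for freeing the PTE page */
}
#else
/* Step 1: empty dummies, so converted callers behave exactly as before. */
static inline bool pte_try_get(struct pte_page_sketch *p)
{
	(void)p;
	return true;
}

static inline void pte_put(struct pte_page_sketch *p)
{
	(void)p;
}
#endif

/* A converted caller (hypothetical) looks the same under either config. */
static inline bool touch_pte_page(struct pte_page_sketch *p)
{
	if (!pte_try_get(p))
		return false;
	/* ... access the PTEs here ... */
	pte_put(p);
	return true;
}
```

With the option off, the calls compile to nothing, so each subsystem
can be converted and reviewed before any behaviour changes.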

I am also trying to add a function that combines pte_offset_map() and
pte_try_get(). Should it be named pte_try_map(), as recommended by
Jason, or should I keep the name pte_offset_map() unchanged?
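
For discussion, the combined helper might look roughly like the
userspace sketch below. This is only a model under assumptions: the
*_sketch types are hypothetical, pte_offset_map() here takes a plain
index instead of a virtual address, and no real mapping or locking is
done. The point is just that the caller gets either a mapped PTE with a
reference held, or NULL.

```c
/* Userspace sketch only, not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTRS_PER_PTE_SKETCH 8	/* hypothetical, tiny on purpose */

typedef unsigned long pte_sketch_t;

/* Hypothetical PTE page: a refcount plus its entries. */
struct pte_page_sketch {
	int refcount;	/* models pte_refcount */
	pte_sketch_t entries[PTRS_PER_PTE_SKETCH];
};

/* Hypothetical PMD entry: points to a PTE page, or NULL if none. */
struct pmd_sketch {
	struct pte_page_sketch *pte_page;
};

/* Take a reference unless the PTE page is absent or being freed. */
static bool pte_try_get(struct pmd_sketch *pmd)
{
	struct pte_page_sketch *page = pmd->pte_page;

	if (!page || page->refcount == 0)
		return false;
	page->refcount++;
	return true;
}

/* Simplified: index into the PTE page instead of using an address. */
static pte_sketch_t *pte_offset_map(struct pmd_sketch *pmd, unsigned long index)
{
	return &pmd->pte_page->entries[index % PTRS_PER_PTE_SKETCH];
}

/* Combined helper: NULL means "no reference taken, nothing mapped". */
static pte_sketch_t *pte_try_map(struct pmd_sketch *pmd, unsigned long index)
{
	if (!pte_try_get(pmd))
		return NULL;
	return pte_offset_map(pmd, index);
}
```

A non-NULL return would still have to be paired with an unmap and a
pte_put() by the caller.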


Then, have a patch that adds actual logic to the dummy primitives via a config option.

2. Minimize the API.

a) pte_alloc_get{,_map,_map_lock}() is really ugly. Maybe restrict it to pte_alloc_get().

I also think pte_alloc_get{,_map,_map_lock}() is ugly, but I can't
figure out a more suitable name. Maybe we can keep
pte_alloc{,_map,_map_lock}() without any modification? But I am
worried that callers will forget to call the paired pte_put().


b) pmd_trans_unstable_or_pte_try_get() and friends are really ugly.

Handle it independently for now, even if it implies duplicate runtime checks.

if (pmd_trans_unstable() || !pte_try_get()) ...

We can always optimize later, once we can come up with something cleaner.
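
As a rough illustration of the open-coded check above, the following
userspace sketch (not kernel code) keeps the two tests separate and
pairs a successful pte_try_get() with pte_put(). struct pmd_sketch,
walk_one_pmd() and the simplified signatures are assumptions made for
this example only.

```c
/* Userspace sketch only, not kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PMD state for the sketch. */
struct pmd_sketch {
	bool unstable;	/* models pmd_trans_unstable() being true */
	int refcount;	/* models pte_refcount; 0 means PTE page going away */
};

static bool pmd_trans_unstable(const struct pmd_sketch *pmd)
{
	return pmd->unstable;
}

static bool pte_try_get(struct pmd_sketch *pmd)
{
	if (pmd->refcount == 0)
		return false;
	pmd->refcount++;
	return true;
}

static void pte_put(struct pmd_sketch *pmd)
{
	pmd->refcount--;
}

/* Returns 0 on success, -1 if the caller should bail out or retry. */
static int walk_one_pmd(struct pmd_sketch *pmd)
{
	/* Two independent checks; duplicate runtime work is accepted. */
	if (pmd_trans_unstable(pmd) || !pte_try_get(pmd))
		return -1;

	/* ... operate on the PTEs while the reference is held ... */

	pte_put(pmd);	/* paired with the successful pte_try_get() */
	return 0;
}
```

Because || short-circuits, no reference is taken when the PMD is
unstable, so the early return has nothing to undo.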

3. Merge #6 and #7, after factoring out all changes to other subsystems to use the API.

4. Merge #8 into #6. There is a lot of unnecessary code churn back and forth, and IMHO the whole approach might not make sense without RCU due to the additional locking overhead.

Or at least, try to not modify the API you introduced in patch #6 or #7 in #8 again. Converting all call sites back and forth just makes review quite hard.


I am preparing some cleanups that will make get_locked_pte() and similar a little nicer to handle. I'll send them out this or next week.

Thanks,
Qi