Some high-level feedback after studying the code:
1. Try introducing the new dummy primitives ("API") first, and then convert each subsystem individually; especially, maybe convert the whole pagefault handling in a single patch, because it's far from trivial. This will make this series much easier to digest.
Then, have a patch that adds actual logic to the dummy primitives via a config option.
2. Minimize the API.
a) pte_alloc_get{,_map,_map_lock}() is really ugly. Maybe restrict it to pte_alloc_get()
b) pmd_trans_unstable_or_pte_try_get() and friends are really ugly.
Handle it independently for now, even if it implies duplicate runtime checks.
if (pmd_trans_unstable() || !pte_try_get()) ...
We can always optimize later, once we can come up with something cleaner.
3. Merge #6 and #7, after factoring out all changes to other subsystems to use the API.
4. Merge #8 into #6. There is a lot of unnecessary code churn back and forth, and IMHO the whole approach might not make sense without RCU due to the additional locking overhead.
Or at least, try to not modify the API you introduced in patch #6 or #7 in #8 again. Converting all call sites back and forth just makes review quite hard.
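To illustrate points 1 and 2b), here is a rough userspace mock, not kernel code. The pmd_t layout, the function bodies, pte_put(), walk_one_pmd() and the CONFIG_FREE_USER_PTE name are all assumptions made up for this sketch; only the pmd_trans_unstable() and pte_try_get() names come from the discussion above. The idea is that the dummy primitives compile to no-ops until the config option adds real logic, and that call sites do the two checks independently:

```c
/* Userspace mock, not kernel code: all types and bodies are stand-ins. */
#include <stdbool.h>

/* Hypothetical stand-in for a PMD entry and the PTE table behind it. */
typedef struct {
	bool trans_unstable;	/* mock: entry is none/huge/bad */
	int pte_refcount;	/* mock: users of the PTE page table */
} pmd_t;

/* Mock of the existing check; the real one inspects the PMD entry. */
static inline bool pmd_trans_unstable(const pmd_t *pmd)
{
	return pmd->trans_unstable;
}

#ifdef CONFIG_FREE_USER_PTE	/* hypothetical config option name */
/* Actual logic: only take a reference if the table is still in use. */
static inline bool pte_try_get(pmd_t *pmd)
{
	if (pmd->pte_refcount == 0)
		return false;
	pmd->pte_refcount++;
	return true;
}

/* Hypothetical counterpart that drops the reference again. */
static inline void pte_put(pmd_t *pmd)
{
	pmd->pte_refcount--;
}
#else
/* Dummy primitives: call sites can be converted before any logic exists. */
static inline bool pte_try_get(pmd_t *pmd)
{
	(void)pmd;
	return true;
}

static inline void pte_put(pmd_t *pmd)
{
	(void)pmd;
}
#endif

/* Hypothetical call site: two independent checks, no combined helper. */
static int walk_one_pmd(pmd_t *pmd)
{
	if (pmd_trans_unstable(pmd) || !pte_try_get(pmd))
		return -1;	/* caller would retry or skip */

	/* ... map and walk the PTEs here ... */

	pte_put(pmd);
	return 0;
}
```

With the config option off, the call site pays only for the pmd_trans_unstable() check, so subsystems can be converted patch by patch without changing behavior.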
I am preparing some cleanups that will make get_locked_pte() and similar a little nicer to handle. I'll send them out this or next week.