Also, I don't really understand how this scheme works with
get_user_pages_fast.
With the RCU change in #8 it should work just fine, because RCU
synchronize has to wait either until all other CPUs have left the RCU read
section, or re-enabled interrupts.
So at this point in the series fast gup is broken, which means the
series presentation really needs to be reworked. The better
presentation is to add the API changes, with a
no-functional-difference implementation, push the new API in well
split patches to all the consumption sites, then change the API to
have the new semantics.
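As a rough illustration of that ordering, here is a userspace sketch in C. Every name in it (demo_table, demo_entry_get, demo_entry_put, DEMO_NEW_SEMANTICS) is hypothetical and made up for this example, not taken from the series. The default build is the no-functional-difference wrapper that call sites get converted to first; defining DEMO_NEW_SEMANTICS stands in for the final patch that changes what the API does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one page table level; not a kernel type. */
struct demo_table {
	int refcount;
	int entries[4];
};

#ifndef DEMO_NEW_SEMANTICS
/*
 * Step 1: introduce the API as a plain wrapper around the old direct
 * access. Converting a call site to it cannot change behaviour.
 */
static inline int *demo_entry_get(struct demo_table *t, size_t idx)
{
	return &t->entries[idx];
}

static inline void demo_entry_put(struct demo_table *t)
{
	(void)t;	/* nothing to release yet */
}
#else
/*
 * Step 3: once every call site uses the pair, one patch swaps in the
 * new semantics (here: pin the level while the entry is in use).
 */
static inline int *demo_entry_get(struct demo_table *t, size_t idx)
{
	t->refcount++;
	return &t->entries[idx];
}

static inline void demo_entry_put(struct demo_table *t)
{
	assert(t->refcount > 0);
	t->refcount--;
}
#endif

/* Step 2: a converted call site. It is identical under both builds. */
static int demo_read_entry(struct demo_table *t, size_t idx)
{
	int *e = demo_entry_get(t, idx);
	int val = *e;

	demo_entry_put(t);
	return val;
}
```

The point of the split is that each call-site conversion can be reviewed and bisected as a no-op, and only the last patch carries the behavioural risk.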
RCU and refcount to free the page levels seems like a reasonable
approach, but I have to say I haven't thought it through fully - are
all the contexts that have the pte deref safe to do call_rcu?
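To make the question concrete, below is a single-threaded userspace model in C of the refcount-then-deferred-free pattern as I understand it. It is only a sketch under assumptions: demo_level, mock_call_rcu and mock_grace_period_end are invented for this example, the mock takes no callback argument unlike the kernel's call_rcu(), and nothing here models real concurrency or the contexts the question is about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one page table level. */
struct demo_level {
	int refcount;
	int freed;	/* set instead of calling free() so it can be observed */
	struct demo_level *next_deferred;
};

/* Levels whose last reference is gone but whose free is still pending. */
static struct demo_level *deferred_head;

/* Mock of the deferral step: only queues the level, frees nothing. */
static void mock_call_rcu(struct demo_level *lvl)
{
	lvl->next_deferred = deferred_head;
	deferred_head = lvl;
}

/*
 * Mock of a grace period ending: every queued level is released now.
 * Returns how many levels were released.
 */
static int mock_grace_period_end(void)
{
	int n = 0;

	while (deferred_head) {
		struct demo_level *lvl = deferred_head;

		deferred_head = lvl->next_deferred;
		lvl->next_deferred = NULL;
		lvl->freed = 1;
		n++;
	}
	return n;
}

static void demo_level_get(struct demo_level *lvl)
{
	lvl->refcount++;
}

/* Dropping the last reference defers the free rather than doing it. */
static void demo_level_put(struct demo_level *lvl)
{
	assert(lvl->refcount > 0);
	if (--lvl->refcount == 0)
		mock_call_rcu(lvl);
}
```

In this model the level stays intact between the final put and the end of the grace period, which is the window a lockless walker would depend on. Whether every real caller of the final put is in a context where the deferral is allowed is exactly what the model cannot show.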
Jason