Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible

From: Michal Hocko
Date: Tue Sep 29 2020 - 08:08:10 EST


On Mon 28-09-20 16:31:01, paulmck@xxxxxxxxxx wrote:
[...]
> This commit therefore uses preemptible() to determine whether allocation
> is possible at all for double-argument kvfree_rcu().

This deserves a comment. Because GFP_ATOMIC is possible for many
!preemptible() contexts. It is the raw_spin_lock, NMIs and likely few
others that are a problem. You are taking a conservative approach which
is fine but it would be good to articulate that explicitly.

> If !preemptible(),
> then allocation is not possible, and kvfree_rcu() falls back to using
> the less cache-friendly rcu_head approach. Even when preemptible(),
> the caller might be involved in reclaim, so the GFP_ flags used by
> double-argument kvfree_rcu() must avoid invoking reclaim processing.

Could you be more specific? Is this about being called directly in the
reclaim context and you want to prevent a recursion? If that is the
case, do you really need to special case this in any way? Any memory
reclaim will set PF_MEMALLOC so allocations called from that context
will not perform reclaim. So if you are called from the reclaim directly
then you might want to do GFP_KERNEL | __GFP_NOMEMALLOC | __GFP_NOWARN.
That should handle both from-the-recalim and outside of reclaim contexts
just fine (assuming you don't allocated from !preemptible()) context.

> Note that single-argument kvfree_rcu() must be invoked in sleepable
> contexts, and that its fallback is the relatively high latency
> synchronize_rcu(). Single-argument kvfree_rcu() therefore uses
> GFP_KERNEL|__GFP_RETRY_MAYFAIL to allow limited sleeping within the
> memory allocator.
[...]
> static inline bool
> -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> + unsigned long *flags, void *ptr, bool can_sleep)
> {
> struct kvfree_rcu_bulk_data *bnode;
> + bool can_alloc_page = preemptible();
> + gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) | __GFP_NOWARN;

This is quite confusing IMHO. At least without a further explanation.
can_sleep is not as much about sleeping as it is about the reclaim
recursion AFAIU your changelog, right?

> int idx;
>
> - if (unlikely(!krcp->initialized))
> + *krcp = krc_this_cpu_lock(flags);
> + if (unlikely(!(*krcp)->initialized))
> return false;
>
> - lockdep_assert_held(&krcp->lock);
> idx = !!is_vmalloc_addr(ptr);
>
> /* Check if a new block is required. */
> - if (!krcp->bkvhead[idx] ||
> - krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
> - bnode = get_cached_bnode(krcp);
> - if (!bnode) {
> - /*
> - * To keep this path working on raw non-preemptible
> - * sections, prevent the optional entry into the
> - * allocator as it uses sleeping locks. In fact, even
> - * if the caller of kfree_rcu() is preemptible, this
> - * path still is not, as krcp->lock is a raw spinlock.
> - * With additional page pre-allocation in the works,
> - * hitting this return is going to be much less likely.
> - */
> - if (IS_ENABLED(CONFIG_PREEMPT_RT))
> - return false;
> -
> - /*
> - * NOTE: For one argument of kvfree_rcu() we can
> - * drop the lock and get the page in sleepable
> - * context. That would allow to maintain an array
> - * for the CONFIG_PREEMPT_RT as well if no cached
> - * pages are available.
> - */
> - bnode = (struct kvfree_rcu_bulk_data *)
> - __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> + if (!(*krcp)->bkvhead[idx] ||
> + (*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
> + bnode = get_cached_bnode(*krcp);
> + if (!bnode && can_alloc_page) {
> + krc_this_cpu_unlock(*krcp, *flags);
> + bnode = kmalloc(PAGE_SIZE, gfp);

What is the point of calling kmalloc for a PAGE_SIZE object? Wouldn't
using the page allocator directly be better?
--
Michal Hocko
SUSE Labs