Re: [PATCH] percpu_ida: Use _irqsave() instead of local_irq_save() + spin_lock

From: Jens Axboe
Date: Mon May 07 2018 - 17:34:48 EST


On 5/7/18 7:47 AM, Matthew Wilcox wrote:
> On Sat, May 05, 2018 at 08:52:02AM -0700, Matthew Wilcox wrote:
>> init and destroy seem to map to sbitmap_queue_init_node and
>> sbitmap_queue_free. percpu_ida_free maps to sbitmap_queue_clear.
>
> Hmm.
>
> void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr,
>                          unsigned int cpu)
> {
>         sbitmap_clear_bit_unlock(&sbq->sb, nr);
>         sbq_wake_up(sbq);
>         if (likely(!sbq->round_robin && nr < sbq->sb.depth))
>                 *per_cpu_ptr(sbq->alloc_hint, cpu) = nr;
> }
> EXPORT_SYMBOL_GPL(sbitmap_queue_clear);
>
> If we free a tag on a CPU other than the one it's allocated on, that seems
> like it's going to guarantee a cacheline pingpong. Is the alloc_hint
> really that valuable? I'd be tempted to maintain the alloc_hint (if it's
> at all valuable) as being just a hint for which word to look at first,
> and only update it on allocation, rather than updating it on free.

The point is to update it on free, so we know where to start the next
search. Ideally that tag is still free when we come back to it. Keep in
mind that some of this was driven by blk-mq developments, specifically
how to retain fast allocations without adding per-cpu tag caches (which
we can't afford, see previous rant on tag space utilization). In blk-mq,
the free generally happens on the CPU you allocated on, or close to it
in terms of caching. Even for shared tag cases, I haven't seen any bad
behavior from cacheline bouncing. That doesn't mean it can't happen, but
I'd be wary of changing any of it without substantial performance testing.
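
To make the tradeoff concrete, the allocation side consumes that hint as
the starting point of the bitmap search. Roughly (a simplified paraphrase
of lib/sbitmap.c from this era, not the verbatim code):

	int __sbitmap_queue_get(struct sbitmap_queue *sbq)
	{
		/* Per-cpu hint: where the last free on this CPU put its tag. */
		unsigned int hint = this_cpu_read(*sbq->alloc_hint);
		int nr = sbitmap_get(&sbq->sb, hint, sbq->round_robin);

		/* Only advance the hint if we actually consumed it. */
		if (nr >= 0 && (nr == hint || sbq->round_robin))
			this_cpu_write(*sbq->alloc_hint,
				       nr + 1 < sbq->sb.depth ? nr + 1 : 0);

		return nr;
	}

So a free that stores the tag number in the hint makes the next get() on
that CPU likely to hit a bit that is both free and cache-hot, which is
exactly what the blk-mq fast path wants.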

> Then we can drop the 'cpu' argument to sbitmap_queue_clear(), which
> would help this conversion because the percpu_ida users don't know what
> CPU their tag was allocated on.

What you want is something around sbitmap_queue_{get,clear} that wraps
the alloc hint and wait queue mechanism, so the caller doesn't have to
deal with it. We don't have that, since for blk-mq it works out better
to just implement it directly. We have things like queue runs that are
done for the failure case.
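
If someone wanted that wrapper, a minimal sketch might look like the
below. sbitmap_queue_get_wait() is a made-up name, and the wait_index
would have to live with the user, but it shows the shape of the alloc
hint + wait queue handling the caller would otherwise open code:

	static int sbitmap_queue_get_wait(struct sbitmap_queue *sbq,
					  atomic_t *wait_index,
					  unsigned int *cpu)
	{
		struct sbq_wait_state *ws;
		DEFINE_WAIT(wait);
		int tag;

		/* Fast path: grab a tag via the per-cpu alloc hint. */
		tag = sbitmap_queue_get(sbq, cpu);
		if (tag >= 0)
			return tag;

		ws = sbq_wait_ptr(sbq, wait_index);
		do {
			prepare_to_wait(&ws->wait, &wait, TASK_UNINTERRUPTIBLE);
			/* Recheck after queueing to avoid a lost wakeup. */
			tag = sbitmap_queue_get(sbq, cpu);
			if (tag >= 0)
				break;
			io_schedule();
		} while (1);
		finish_wait(&ws->wait, &wait);

		return tag;
	}

blk-mq open codes roughly this loop in its tag allocation, plus the
queue runs on failure, which is part of why a generic helper never
materialized.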

--
Jens Axboe