Re: [PATCH-v2 1/3] percpu_ida: Make percpu_ida_alloc + callersaccept task state bitmask

From: Kent Overstreet
Date: Thu Jan 23 2014 - 14:32:10 EST

On Thu, Jan 23, 2014 at 05:22:54PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 23, 2014 at 05:28:29AM -0800, Kent Overstreet wrote:
> > pool->lock is also going to be fairly badly contended in the worst case,
> > and that can get real bad real fast... now that I think about it we
> > probably want to avoid the __alloc_global_tag() double call just because
> > of that, pool->lock is going to be quite a bit more contended than the
> > waitlist lock just because fo the amount of work done under it.
> On top of the two previous; I think we can reduce pool->lock contention
> by not holding it while doing steal_tags().
> By dropping pool->lock around steal_tags() we loose serialization over:
> pool->cpus_have_tags is an atomic bitmask, and
> pool->cpu_last_stolem, that's a heuristic anyway, so sod it.
> We further loose the guarantee relied upon by percpu_ida_free(), so have
> it also acquire the tags->lock, which should be a far less contended
> resource.
> Now everything modifying percpu_ida_cpu state holds
> percpu_ida_cpu::lock, everything that modifies the actual percpu_ida
> freelists holds percpu_ida::lock, and percpu_ida_cpu::lock nests inside
> percpu_ida::lock.

I find myself wondering why I didn't originally do this - I'm a bit worried
there was some subtle race I've forgotten about with this approach - but
probably I just got tired of trying out and trying to reason about different
subtle optimizations is all :)

So yeah, assuming we can't find anything wrong with this approach - this should
be great.

> The only annoying thing is that we're still holding IRQs over
> steal_tags(), we should be able to make that a preempt_disable() without
> too much effort, or very much cheat and drop even that and rely on the
> percpu_ida_cpu::lock to serialize everything and just hope that we don't
> migrate too often.

I don't think that's an issue - it's pretty hard to come up with a scenario
where most/all of a large number of cpus are marked in the bit vector as having
tags, and _then_ none of them actually have tags (because as soon as one of them
does, we'll succeed and break out of the loop).

And then that would only be able to happen once, because we clear bits as we go.

And we need interrupts disabled while we're holding any of the spinlocks (as
freeing can certainly happen in atomic context), so the only alternative would
be to save/restore interrupts on every loop iteration, and that'll get _real_
expensive fast. (last I profiled it nested push/pop of the flags register was
_ridiculous_, sigh)
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at