Re: [cpuops cmpxchg V2 3/5] irq_work: Use per cpu atomics instead of regular atomics

From: Peter Zijlstra
Date: Wed Dec 15 2010 - 12:18:59 EST


On Wed, 2010-12-15 at 11:04 -0600, Christoph Lameter wrote:

> Prefixes are faster than explicit address calculations. A prefix allows
> you to integrate the per cpu address calculation into an arithmetic
> operation.

Well, that depends on how often you need the address, I'd think. If you
have a per-cpu struct and need to frob lots of variables in it, it might
be cheaper to compute the struct address once and then use relative
addresses than to prefix every access with %fs.
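
A userspace model of the two access patterns, as a sketch only:
`struct pcpu_stats`, `pcpu_area[]`, `cur_cpu` and the two `bump_*()`
functions are hypothetical, and the array index stands in for the
segment base the kernel actually uses. Style 1 corresponds roughly to
one `this_cpu_inc()` per field, each carrying its own segment prefix;
style 2 corresponds to taking `this_cpu_ptr()` once (with preemption
disabled) and then using plain offsets.

```c
#define NR_CPUS 4	/* hypothetical, for the model only */

/* Hypothetical per-cpu structure with several fields to update. */
struct pcpu_stats {
	unsigned long a, b, c;
};

/* Userspace stand-in for the per-cpu areas the kernel reaches via %fs/%gs. */
static struct pcpu_stats pcpu_area[NR_CPUS];

/* Stand-in for the running CPU; in the kernel it is implied by the segment base. */
static int cur_cpu = 1;

/*
 * Style 1: every access re-derives the per-cpu location, modelling one
 * segment-prefixed instruction per field (this_cpu_inc() style).
 */
void bump_each_prefixed(void)
{
	pcpu_area[cur_cpu].a++;
	pcpu_area[cur_cpu].b++;
	pcpu_area[cur_cpu].c++;
}

/*
 * Style 2: compute the struct address once (this_cpu_ptr() style), then
 * use plain base+offset accesses with no per-instruction prefix.
 */
void bump_via_pointer(void)
{
	struct pcpu_stats *s = &pcpu_area[cur_cpu];

	s->a++;
	s->b++;
	s->c++;
}
```

In userspace a compiler will emit essentially the same code for both
functions; the size difference discussed here only shows up in kernel
builds, where each per-cpu access carries its own %fs/%gs prefix.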

> A prefix is one byte which is less than multiple arithmetic operations to
> calculate an address.

I thought you'd only need a single arithmetic op to calculate the
address. Anyway, at some point those 1-byte prefixes will add up to more
than the ops saved.

In the current code you add 2 bytes, although you save one by losing
the LOCK prefix (but that could also have been achieved by using
cmpxchg_local()). These 2 bytes are probably less than the address
computation for head (and not needing the head pointer again saves on
register pressure), so it's probably a win here.
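
The operation being sized up here is a lock-free push onto a per-cpu
singly linked list, retried until the compare-and-exchange on the head
succeeds. A sketch with C11 atomics, assuming hypothetical
`struct work` and `work_push()` names (this is not the irq_work API):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item; only the link field matters for the model. */
struct work {
	struct work *next;
};

/*
 * Push @w onto the list headed by @head with a compare-and-exchange
 * retry loop. Returns true if the list was empty beforehand, i.e. the
 * caller is the one that would raise the self-IPI.
 */
bool work_push(_Atomic(struct work *) *head, struct work *w)
{
	struct work *old = atomic_load_explicit(head, memory_order_relaxed);

	do {
		/* Link to the current head before publishing; on failure
		 * the CAS reloads 'old' and we relink. */
		w->next = old;
	} while (!atomic_compare_exchange_weak_explicit(head, &old, w,
							 memory_order_release,
							 memory_order_relaxed));
	return old == NULL;
}
```

On x86, cmpxchg() emits LOCK CMPXCHG, cmpxchg_local() drops the LOCK
prefix and is only safe when no other CPU writes the word, and
this_cpu_cmpxchg() additionally folds the per-cpu address into a
segment prefix. C11 has no portable form of the unlocked variant, so
the sketch uses an ordinary atomic compare-exchange.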

Still, none of this is really fast-path code, so I really wonder why
we're optimizing it instead of keeping the code obvious.

> I am not sure that the preempt_disable/enable is needed. They are just
> there because you had a get/put_cpu there.
>
> If the code is run from hardirq context then preempt is already disabled.
> We can just drop those then.

AFAIK the current callers are all from IRQ/NMI context, but I don't want
to require that callers be in such contexts.

The problem is that we need to guarantee we raise the self-IPI on the
same cpu we queued the worklet on.
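
Why preemption matters outside IRQ context can be modelled in
userspace. Everything in this sketch is hypothetical: `running_cpu`
stands in for the CPU the task is on, `model_migrate()` for a migration
at a preemption point, and the `model_preempt_*()` helpers for
preempt_disable()/preempt_enable(). Sampling the CPU once with
preemption disabled keeps the queue and the self-IPI together; the
preemptible variant can queue on one CPU and raise the IPI on another.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical, for the model only */

/* Per-cpu list lengths and "self-IPI raised" flags, modelled as arrays. */
static int pending[NR_CPUS];
static bool ipi_raised[NR_CPUS];

/* Hypothetical stand-ins for preempt_disable()/preempt_enable(). */
static int preempt_count;

static void model_preempt_disable(void)
{
	preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/* Hypothetical: the CPU the current task is running on. */
static int running_cpu;

/* A migration may only happen at a preemption point. */
void model_migrate(int cpu)
{
	assert(preempt_count == 0);
	running_cpu = cpu;
}

/*
 * Safe variant: the CPU is sampled once with preemption disabled, so the
 * work is queued on, and the self-IPI raised on, the same CPU.
 */
void model_queue(void)
{
	model_preempt_disable();
	int cpu = running_cpu;
	bool was_empty = (pending[cpu]++ == 0);
	if (was_empty)
		ipi_raised[cpu] = true;	/* self-IPI to the queuing CPU */
	model_preempt_enable();
}

/*
 * Unsafe variant: a migration between queueing and raising means the
 * "self" IPI lands on a different CPU than the one holding the work.
 */
void model_queue_preemptible(int migrate_to)
{
	int cpu = running_cpu;
	bool was_empty = (pending[cpu]++ == 0);

	model_migrate(migrate_to);	/* preemption point */
	if (was_empty)
		ipi_raised[running_cpu] = true;	/* self-IPI hits the new CPU */
}
```

In the preemptible case the work sits on the original CPU's list with
no interrupt pending there, which is the failure a get_cpu()/put_cpu()
(or preempt_disable()/preempt_enable()) pair around the queue-and-raise
sequence prevents.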