Re: [PATCH] x86 rwsem optimization extreme

From: Linus Torvalds
Date: Wed Feb 17 2010 - 20:54:44 EST




On Wed, 17 Feb 2010, H. Peter Anvin wrote:
>
> FWIW, I don't know of any microarchitecture where adc is slower than
> add, *as long as* the setup time for the CF flag is already used up.

Oh, I think there are lots.

Look at just about any x86 latency/throughput table, and you'll see:

 - adc latencies are typically much higher than a single cycle

   But you are right that this is likely not an issue on any out-of-order
   chip, since the 'stc' will schedule perfectly.

 - but adc _throughput_ numbers (cycles per instruction) are also
   typically much higher, which indicates that even if you do flag
   renaming, the 'adc' quite likely only schedules in a single ALU unit.

For example, on a Pentium, adc/sbb can only go in the U pipe, and I think
the same is true of 'stc'. Now, nobody likely cares about Pentiums any
more, but the point is, 'adc' does often have constraints that a regular
'add' does not, and that's an example where an 'stc+adc' pair would at the
very least have to be scheduled with an instruction in between.

Linus