Re: skb_release_head_state(): Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28

From: Linus Torvalds
Date: Mon Nov 17 2008 - 16:35:38 EST




On Mon, 17 Nov 2008, Ingo Molnar wrote:
>
> this function _really_ hurts from a 16-bit op:
>
> ffffffff8048943e:     6503 66 c7 83 a8 00 00 00    movw   $0x0,0xa8(%rbx)
> ffffffff80489445:        0 00 00
> ffffffff80489447:   174101 5b                      pop    %rbx

I don't think that is it, actually. The 16-bit store just before it had a
zero count, even though anything that executes the second one will always
execute the first one too.

The fact is, x86 profiles are subtle at an instruction level, and you tend
to get profile hits _after_ the instruction that caused the cost because
an interrupt (even an NMI) is always delayed to the next instruction (the
one that didn't complete). And since the core will execute out-of-order,
you don't even know what that one is, since there could easily be
branches, but even in the absence of branches you have many instructions
executing together.
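
A toy sketch of that skid effect may help. It assumes a strictly in-order
machine, one sample per fixed number of cycles, and a skid of exactly one
instruction, none of which is true of a real out-of-order core; the function
and instruction names are made up for the illustration.

```python
def attribute_samples(costs, period, skid=1):
    """Toy model of profile skid.

    costs  -- list of (name, cycles) in program order, names unique
    period -- take one sample every `period` cycles
    skid   -- how many instructions later the sample gets charged
    """
    hits = {name: 0 for name, _ in costs}

    # One timeline entry per cycle: which instruction is in flight.
    timeline = []
    for idx, (_, cycles) in enumerate(costs):
        timeline.extend([idx] * cycles)

    for t in range(0, len(timeline), period):
        # The interrupt is only taken at the next instruction boundary,
        # so the hit lands on a later instruction, not the slow one.
        victim = min(timeline[t] + skid, len(costs) - 1)
        hits[costs[victim][0]] += 1
    return hits


if __name__ == "__main__":
    trace = [("cheap_a", 1), ("miss", 98), ("cheap_b", 1)]
    print(attribute_samples(trace, period=1))
```

Even in this simplified model the 98-cycle instruction collects a single hit
while the one-cycle instruction after it collects 99.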

For example, in many situations the two 16-bit stores will happily execute
together, and what you see may simply be a cache miss on the line that was
stored to. The store buffer needs to resolve the read of the "pop" in
order to complete, so having a big count in between stores and a
subsequent load is not all that unlikely.

So doing per-instruction profiling is not useful unless you start looking
at what preceded the instruction, and because of the out-of-order nature,
you really almost have to look for cache misses or branch mispredicts.
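
As a minimal sketch of "looking at what preceded the instruction", the helper
below takes per-instruction counts in program order and returns the hottest
entry together with the few instructions before it. The tuple layout and the
function name are assumptions for the example; the two rows are the ones
quoted above.

```python
def context_before_hottest(profile, window=3):
    """Return the hottest entry plus up to `window` entries preceding it.

    profile -- list of (address, hits, text) tuples in program order
    """
    hottest = max(range(len(profile)), key=lambda i: profile[i][1])
    start = max(0, hottest - window)
    return profile[start:hottest + 1]


if __name__ == "__main__":
    rows = [
        (0xffffffff8048943e, 6503, "movw $0x0,0xa8(%rbx)"),
        (0xffffffff80489447, 174101, "pop %rbx"),
    ]
    for addr, hits, text in context_before_hottest(rows, window=1):
        print(f"{addr:016x} {hits:8d}  {text}")
```

This only widens the view; it says nothing about which earlier instruction
actually caused the stall.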

One common reason for such a big count on an instruction that looks
perfectly simple is often that there is a branch to that instruction that
was mispredicted. Or that there was an instruction that was costly _long_
before, and that other instructions were in the shadow of that one
completing (ie they had actually completed first, but didn't retire until
the earlier instruction did).

So you really should never just look at the previous instruction or
anything as simplistic as that. The time of in-order execution is long
past.

Linus