Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

From: Benjamin LaHaise
Date: Sun Jan 29 2006 - 14:58:36 EST


On Sun, Jan 29, 2006 at 07:54:09AM +0100, Eric Dumazet wrote:
> Well, I think that might be doable, maybe with some RCU magic?
>
> 1) local_t is not that nice on all archs.

It is for the users that matter, and the hooks are there if someone finds
it to be a performance problem.

> 2) The consolidation phase (summing all the cpus' local offsets to
> consolidate the central counter) might be more difficult to do (we would
> need a kind of 2 counters per cpu, and an index that can be changed by the
> cpu that wants a consolidation (still 'expensive')).

For the vast majority of these sorts of statistics counters, we don't
need 100% accurate counts, and I think it should be possible to choose
between a lightweight implementation and the expensive one. On a chip
like the Core Duo, the cost of bouncing a cache line between the two
cores is minimal, so all the extra code and data is a waste.
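A minimal sketch of what the "lightweight implementation" could look like, written as user-space C11 so it runs outside the kernel. Everything here is hypothetical: `struct approx_counter`, `NR_CPUS`, `COUNTER_BATCH` and the function names are illustrative, not an existing kernel API, although the design is similar in spirit to the kernel's percpu_counter. Each CPU accumulates into its own cache-line-sized slot and only folds into the shared counter when its local delta reaches a batch threshold, so the cheap read can be off by up to NR_CPUS * (COUNTER_BATCH - 1), while the expensive read sums every slot (the "consolidation" step discussed above, done here without any index flipping).

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4          /* hypothetical: fixed CPU count for the sketch */
#define COUNTER_BATCH 32   /* hypothetical: fold threshold per CPU */

/* One slot per CPU, padded to an assumed 64-byte cache line so that
 * updates from different CPUs never bounce the same line. */
struct pcpu_slot {
    _Alignas(64) long delta;   /* changes not yet folded into global */
};

struct approx_counter {
    atomic_long global;              /* shared, approximate total */
    struct pcpu_slot local[NR_CPUS];
};

static void approx_init(struct approx_counter *c)
{
    atomic_init(&c->global, 0);
    for (int i = 0; i < NR_CPUS; i++)
        c->local[i].delta = 0;
}

/* Fast path: only this CPU's slot is written; the locked operation on the
 * shared counter happens once per COUNTER_BATCH units of change.  The
 * caller is assumed to stay on `cpu` for the duration of the call. */
static void approx_add(struct approx_counter *c, int cpu, long delta)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    long v = c->local[cpu].delta + delta;
    if (v >= COUNTER_BATCH || v <= -COUNTER_BATCH) {
        atomic_fetch_add(&c->global, v);
        v = 0;
    }
    c->local[cpu].delta = v;
}

/* Cheap read: may be off by up to NR_CPUS * (COUNTER_BATCH - 1). */
static long approx_read(struct approx_counter *c)
{
    return atomic_load(&c->global);
}

/* Expensive read: sum every CPU's pending delta.  Exact only when no
 * updates run concurrently; this sketch does no cross-CPU
 * synchronization on the slots. */
static long approx_sum(struct approx_counter *c)
{
    long sum = atomic_load(&c->global);
    for (int i = 0; i < NR_CPUS; i++)
        sum += c->local[i].delta;
    return sum;
}
```

For example, ten increments on CPU 1 leave approx_read() at 0 while approx_sum() returns 10; the 32nd increment folds the whole batch into the shared total in a single atomic add.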

> 3) Are the locked ops so expensive if done on a cache line that is mostly
> in exclusive state in the cpu cache?

Yes. On the P4, a lock-prefixed instruction forces the outstanding memory
transactions in the reorder buffer to be flushed so that the memory barrier
semantics of the lock prefix are observed. This can take a long time, as
there can be over a hundred instructions in flight.
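A rough, hypothetical user-space micro-benchmark for checking this on a given machine: it times increments of an uncontended counter once with a plain memory increment and once with a C11 atomic read-modify-write, which x86 compilers emit as a lock-prefixed instruction even with memory_order_relaxed. Only one thread runs, so the cache line never leaves this CPU and stays in exclusive state, which is the case asked about in 3). The function names are illustrative assumptions; no particular numbers are implied, results depend heavily on the CPU, and a loop this tight may not capture the full cost of draining a deeply filled pipeline.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

static volatile long plain_counter;   /* volatile: keep every increment */
static atomic_long locked_counter;    /* RMW ops use the lock prefix on x86 */

/* Average nanoseconds per plain increment, including loop overhead. */
static double time_plain(long iters)
{
    assert(iters > 0);
    double t0 = now_ns();
    for (long i = 0; i < iters; i++)
        plain_counter++;
    return (now_ns() - t0) / iters;
}

/* Average nanoseconds per atomic increment, including loop overhead.
 * Relaxed ordering still needs a locked instruction on x86. */
static double time_locked(long iters)
{
    assert(iters > 0);
    double t0 = now_ns();
    for (long i = 0; i < iters; i++)
        atomic_fetch_add_explicit(&locked_counter, 1, memory_order_relaxed);
    return (now_ns() - t0) / iters;
}
```

Comparing time_locked(N) against time_plain(N) for a large N gives a per-operation estimate of what the lock prefix costs on the local CPU when there is no contention at all.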

-ben
--
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here
and they've asked us to stop the party." Don't Email: <dont@xxxxxxxxx>.