Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28

From: Ingo Molnar
Date: Mon Nov 17 2008 - 11:12:16 EST



* Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:

>> It all looks like pure old-fashioned straight overhead in the
>> networking layer to me. Do we still touch the same global cacheline
>> for every localhost packet we process? Anything like that would
>> show up big time.
>
> Yes we do. I find it strange that we don't see dst_release() in your
> NMI profile.
>
> I posted a patch (commit 5635c10d976716ef47ae441998aeae144c7e7387,
> "net: make sure struct dst_entry refcount is aligned on 64 bytes", in
> the net-next-2.6 tree) to properly align the struct dst_entry
> refcounter and got a 4% speedup on tbench on my machine.

Ouch, +4% from a one-liner networking change? That's a _huge_ speedup
compared to the things we were after in scheduler land. A lot of
scheduler folks worked hard to squeeze the last 1-2% out of the
scheduler fastpath (which was not trivial at all). The _full_
scheduler accounts for only about 7% of the total system overhead here
on a 16-way box...
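
For readers unfamiliar with this kind of fix, here is a rough user-space
sketch of what aligning a hot refcount on a 64-byte boundary can look
like. It is not the patch quoted above: struct demo_entry, its fields,
demo_hold(), demo_release() and CACHELINE_BYTES are all hypothetical
names made up for illustration, and the 64-byte line size is an
assumption about the machine. The idea shown is only that a counter
written on every packet gets its own cache line, so those writes do not
keep invalidating the line holding read-mostly fields.

```c
#include <assert.h>     /* static_assert (C11) */
#include <stdalign.h>   /* alignas */
#include <stdatomic.h>  /* atomic_int, atomic_fetch_add, atomic_fetch_sub */
#include <stddef.h>     /* offsetof */

#define CACHELINE_BYTES 64  /* assumption: 64-byte cache lines */

/* Hypothetical stand-in for a shared object; NOT the kernel's struct dst_entry. */
struct demo_entry {
    /* Read-mostly fields: set up once, then only read on the fast path. */
    void *ops;
    unsigned long flags;
    long expires;

    /* Hot field: written on every hold/release. alignas() pushes it to the
     * start of the next 64-byte boundary, away from the fields above. */
    alignas(CACHELINE_BYTES) atomic_int refcnt;
};

/* Build-time check that the counter really starts on a 64-byte boundary. */
static_assert(offsetof(struct demo_entry, refcnt) % CACHELINE_BYTES == 0,
              "refcnt must be aligned on a cache line");

/* Take a reference. */
static inline void demo_hold(struct demo_entry *e)
{
    atomic_fetch_add(&e->refcnt, 1);
}

/* Drop a reference; returns the number of references remaining. */
static inline int demo_release(struct demo_entry *e)
{
    return atomic_fetch_sub(&e->refcnt, 1) - 1;
}
```

The build-time assertion is the useful part of the sketch: it makes the
layout assumption fail loudly if someone later reorders the fields.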

So why should we be handling this as anything but a plain networking
performance regression/weakness? The localhost scalability bottleneck
was reported a _long_ time ago.

Ingo