On Fri, 24 Aug 2007 17:47:15 +0200
Jan-Bernd Themann <ossthema@xxxxxxxxxx> wrote:
Hi,
On Friday 24 August 2007 17:37, akepner@xxxxxxx wrote:On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:I don't see how this should work. Our latest machines are fast enough that they.......We found the same on ia64-sn systems with tg3 a couple of years ago. Using simple interrupt coalescing ("don't interrupt until you've received N packets or M usecs have elapsed") worked reasonably well in practice. If your h/w supports that (and I'd guess it does, since it's such a simple thing), you might try it.
3) On modern systems the incoming packets are processed very fast. Especially
on SMP systems when we use multiple queues we process only a few packets
per napi poll cycle. So NAPI does not work very well here and the interrupt rate is still high. What we need would be some sort of timer polling mode which will schedule a device after a certain amount of time for high load situations. With high precision timers this could work well. Current
usual timers are too slow. A finer granularity would be needed to keep the
latency down (and queue length moderate).
simply empty the queue during the first poll iteration (in most cases).
Even if you wait until X packets have been received, it does not help for
the next poll cycle. The average number of packets we process per poll queue
is low. So a timer would be preferable that periodically polls the queue, without the need of generating a HW interrupt. This would allow us
to wait until a reasonable amount of packets have been received in the meantime
to keep the poll overhead low. This would also be useful in combination
with LRO.
You need hardware support for deferred interrupts. Most devices have it (e1000, sky2, tg3)
and it interacts well with NAPI. It is not a generic thing you want done by the stack,
you want the hardware to hold off interrupts until X packets or Y usecs have expired.
The parameters for controlling it are already in ethtool, the issue is finding a good
default set of values for a wide range of applications and architectures. Maybe some
heuristic based on processor speed would be a good starting point. The dynamic irq
moderation stuff is not widely used because it is too hard to get right.