Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5

From: jamal (hadi@cyberus.ca)
Date: Thu Oct 04 2001 - 06:34:56 EST


On Thu, 4 Oct 2001, Ingo Molnar wrote:

>
> On Wed, 3 Oct 2001, jamal wrote:
>
> > > which in turn stops that device as well sooner or later. Optionally,
> > > in the future, this can be made more finegrained for chipsets that
> > > support device-independent IRQ mitigation features, like the USB 2.0
> > > EHCI feature mentioned by David Brownell.
>
> > I think each subsystem should be in charge of its own fate. USB applies
> > in whatever subsystem it belongs to. Cooperating subsystems doing what
> > is best for the system.
>
> this is a claim that is nearly perverse and shows a fundamental
> misunderstanding of how Linux handles error situations. Perhaps we should
> never check NULL pointer dereference in the networking code? Should the
> NMI oopser not debug networking related lockups? Should we never print a
> warning message on a double enable_irq() in a bad networking driver?
>
> *of course* if a chipset supports IRQ mitigation then the generic IRQ code
> can be enabled to use it. We can have networking devices over USB as well.
> USB is a bus protocol that provides access to devices, not just a
> 'subsystem'. And *of course*, the IRQ code is completely right to do
> various sanity checks - as it does today.
>
> Linux has various safety nets in various places - always had. It's always
> the history of problems in a certain area, the seriousness and impact of
> the problem, and the intrusiveness of the safety approach that decides
> whether some safety net is added or not, whether it's put under
> CONFIG_KERNEL_DEBUG or not. While everybody is free to disagree about the
> importance of this particular safety net, just saying 'do not mess with
> *our* interrupts' sounds rather childish. Especially considering that
> tools are available to trigger lockups via broadband access. Especially
> considering that just a few mails earlier you claimed that such lockups do
> not even exist. To quote that paragraph of yours:
>

Your scheme is definitely a safety net, no doubt. But it is incomplete.
Whatever subsystem/softirq/process is in charge of the USB devices is the next
level of delegation. And of course the driver knows best what is good for
the goose. We delegate at each level of the hierarchy, and the ultimate
authority is your code, when it is done right.
But i think we are deviating. We started this with network drivers, which
is where the real proven issue is.

> # Date: Wed, 3 Oct 2001 08:49:51 -0400 (EDT)
> # From: jamal <hadi@cyberus.ca>
>
> [...]
> # You don't need the patch for 2.4 to work against any lockups. And
> # in fact i am surprised that you observe _any_ lockups on a PIII which
> # are not observed on my PII. Linux as is, without any tuneups, can
> # handle up to about 40000 packets/sec input before you start observing
> # user space starvation. This is about 30Mbps at 64 byte packets; it's
> # about 60Mbps at 128 byte packets and comfortable at 100Mbps with a
> # packet size of 256 bytes. We really don't have a problem at 100Mbps.
>
> so you should never see any lockups.
>

I meant on your P3, since i see none on my P2 at 100Mbps with 256 byte
packets. But you probably meant user-space starvation at maybe twice
that rate, to which i agree, and i apologize for misunderstanding.
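
To put rough numbers on the rates quoted above, here is a back-of-the-envelope
sketch. It assumes the quoted packet sizes are full Ethernet frames and adds
20 bytes per frame of on-wire overhead (8 bytes of preamble plus 12 bytes of
inter-frame gap); the function name and the constant are purely illustrative.
Under those assumptions the results come out somewhat below the round figures
in the quote.

```python
# Rough wire-rate arithmetic for a fixed packet rate.
# Assumption: each packet is an Ethernet frame of frame_bytes (CRC included),
# plus 8 bytes of preamble and 12 bytes of inter-frame gap on the wire.
WIRE_OVERHEAD_BYTES = 8 + 12


def wire_mbps(packets_per_sec, frame_bytes, overhead=WIRE_OVERHEAD_BYTES):
    """Return the on-wire bit rate in Mbps (10**6 bits per second)."""
    bits_per_packet = (frame_bytes + overhead) * 8
    return packets_per_sec * bits_per_packet / 1e6


if __name__ == "__main__":
    # 40000 packets/sec is the rate mentioned in the quoted paragraph.
    for size in (64, 128, 256):
        rate = wire_mbps(40000, size)
        print(f"{size:4d}-byte frames at 40000 pps: {rate:.1f} Mbps")
```

Passing overhead=0 gives the payload-only rate if you prefer to count just the
frame bytes.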

> > Your patch with Linus' idea of "flag mask" would be more acceptable as
> > a last resort. All subsystems should be cooperative and we resort to
> > this to send misbehaving kids to their room.
>
> i have nothing against it in 2.5, of course. Until then => my patch adds
> an irq.c daddy that sends the bad kids to their room.

Until then, change the eepro to use at least hardware flow control.
If it has mitigation, use the return codes from netif_rx(). Let's see if
that doesn't help you; yes, it's a pain, but it avoids a lot of unknowns
which your patch introduces.
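
As a toy illustration of what "use the return codes" could look like, the
sketch below models a driver that adjusts its interrupt mitigation delay from
the congestion feedback it gets back. This is not driver code: the level names
mirror the netif_rx() return codes of 2.4-era kernels as an assumption, and the
numeric values, the delay limits and the step sizes are all invented for the
example.

```python
# Toy model of a driver reacting to congestion feedback from the stack.
# Assumption: the level names mirror netif_rx() return codes in 2.4-era
# kernels; the numeric values and the delay policy below are invented.
NET_RX_SUCCESS, NET_RX_CN_LOW, NET_RX_CN_MOD, NET_RX_CN_HIGH, NET_RX_DROP = range(5)

MIN_DELAY_US = 0      # hypothetical lower bound on the mitigation delay
MAX_DELAY_US = 1024   # hypothetical upper bound on the mitigation delay


def next_mitigation_delay(current_us, rx_status):
    """Pick the next interrupt mitigation delay (microseconds) from feedback."""
    if rx_status == NET_RX_SUCCESS:
        # No congestion reported: relax mitigation to keep latency low.
        return max(MIN_DELAY_US, current_us // 2)
    if rx_status == NET_RX_CN_LOW:
        # Mild congestion: hold the current setting.
        return current_us
    if rx_status == NET_RX_CN_MOD:
        step = 64
    else:
        # NET_RX_CN_HIGH or NET_RX_DROP: back off hard.
        step = 256
    return min(MAX_DELAY_US, current_us + step)


if __name__ == "__main__":
    delay = 0
    feedback = [NET_RX_CN_MOD, NET_RX_CN_HIGH, NET_RX_DROP, NET_RX_SUCCESS]
    for status in feedback:
        delay = next_mitigation_delay(delay, status)
        print(f"status={status} -> mitigation delay {delay} us")
```

The point of the sketch is only that the device slows itself down per source,
so nothing else on the same IRQ line has to be touched.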

> > > Your NAPI patch, or any driver/subsystem that does flowcontrol accurately
> > > should never be affected by it in any way. No overhead, no performance
> > > hit.
> >
> > so far your approach is that of a shotgun [...]
>
> i'm not sure what this has to do with your NAPI patch. You should never
> see the code trigger. It's an unused sledgehammer (or shotgun) put into
> the garage, as far as NAPI is concerned. And besides, there are lots of
> people on your continent that believe in spare shotguns ;)
>
> i'd rather compare this approach to an airbag, or perhaps shackles.
> Interrupt auto-limiting, despite your absurd and misleading analogy, does
> not 'destroy' or 'kill' anything. It merely limits an IRQ source for up to
> 10 msecs (if HZ is 1000 then it's only 1 msec), if that IRQ source has
> been detected to be critically misbehaving.

Well, i meant two things:
1) you shut down shared interrupts; take a look at this posting by Marcus
Sundberg <marcus@cendio.se>

---------------

  0:    7602983   XT-PIC  timer
  1:      10575   XT-PIC  keyboard
  2:          0   XT-PIC  cascade
  8:          1   XT-PIC  rtc
 11:    1626004   XT-PIC  Toshiba America Info Systems ToPIC95 PCI \
                          to Cardbus Bridge with ZV Support, \
                          Toshiba America Info Systems ToPIC95 PCI \
                          to Cardbus Bridge with ZV Support (#2), \
                          usb-uhci, eth0, BreezeCom Card, Intel 440MX, irda0
 12:       1342   XT-PIC  PS/2 Mouse
 14:      23605   XT-PIC  ide0

-----------------------------

Now you go and shut down IRQ 11 and punish all devices there. If you can
avoid that, it is acceptable as a temporary replacement to be upgraded to
a better scheme.
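
To make the collateral damage concrete, here is a small sketch that lists
which devices sit on each shared line, using the figures from the listing
above. It assumes each entry has the form "<irq>: <count> <controller> <dev>,
<dev>, ..." with wrapped lines already joined; real /proc/interrupts layouts
differ between kernels and architectures, and the function names are made up
for the example.

```python
# Which devices are hit if a whole IRQ line is masked?
# Assumption: each entry looks like "<irq>: <count> <controller> <dev>, ..."
# with any wrapped continuation lines already joined into one string.

SAMPLE = [
    "0: 7602983 XT-PIC timer",
    "11: 1626004 XT-PIC "
    "Toshiba America Info Systems ToPIC95 PCI to Cardbus Bridge with ZV Support, "
    "Toshiba America Info Systems ToPIC95 PCI to Cardbus Bridge with ZV Support (#2), "
    "usb-uhci, eth0, BreezeCom Card, Intel 440MX, irda0",
    "12: 1342 XT-PIC PS/2 Mouse",
    "14: 23605 XT-PIC ide0",
]


def parse_irq_line(line):
    """Split one entry into (irq, count, controller, [device names])."""
    irq_part, rest = line.split(":", 1)
    count, controller, devices = rest.split(None, 2)
    names = [name.strip() for name in devices.split(",")]
    return int(irq_part), int(count), controller, names


def shared_irqs(lines):
    """Return {irq: [devices]} for every line with more than one device."""
    shared = {}
    for line in lines:
        irq, _count, _controller, names = parse_irq_line(line)
        if len(names) > 1:
            shared[irq] = names
    return shared


if __name__ == "__main__":
    for irq, names in shared_irqs(SAMPLE).items():
        print(f"IRQ {irq} is shared by {len(names)} devices:")
        for name in names:
            print(f"  {name}")
```

Masking the line to throttle eth0 stalls every other name printed for that
IRQ as well.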

2) By not being granular enough and shutting down sources of noise, you
are actually not being effective in increasing system utilization. We've
beaten this to death.

cheers,
jamal



