Re: driver irq handler latency

Jukka Tapani Santala (e75644@UWasa.Fi)
Sun, 20 Sep 1998 02:26:35 +0300 (EET DST)


On Sat, 19 Sep 1998, David Mansfield wrote:
> First off, I'm going to pardon myself in advance for speaking without
> knowing anything. Secondly, I'm trying to track down the cause of
> (apparently) missed serial IRQ's when using the aha152x and IDE drivers

First thing to do: reboot with "profile=1" on the kernel boot line. Then
get the 'readprofile' tool, available from most well-equipped FTP sites.
It only handles function-level resolution and has a one-point alignment
problem, but for your case that doesn't particularly matter yet. You can
try to search for the patch I wrote for it earlier, though, or wait until
I get back on a computer to send it to you ;) Compile that, and use it...
do_gettimeofday() is ugly and doesn't take everything into account...
(OTOH, it may work better for places where interrupts are blocked).

> simultaneously (see previous post: "aha152x in 2.1.121 causes dropped ppp
> packets"). The apparently missed serial IRQ's manifest in PPP framing
> errors out the yin-yang.

I posted about the aha152x driver's slowness a year or so ago, and posted
a patch to "solve" it, though I didn't have any specs or the like, so I'm
pretty sure the fix was wrong in the larger sense of things - it did free
a lot of CPU on my machine and never caused trouble, though. Unfortunately
I have since given up using that combination for Linux, and have no idea
if the patch is still available anywhere. I would imagine it'd be easy to
reproduce, possibly the right way this time ;) I later learned I had
misplaced some SCSI terminators, but that didn't seem to be the problem.

The "confirmation" for the final SCSI command byte in the send loop was
_majorly_ delayed, and the driver stays there waiting for it. I simply
gave up waiting for it; probably there's some other signal that should be
looked for, or it could be checked in a subsequent interrupt or something.
Or else the handshake isn't a confirmation at all, but a "ready to
receive" signal. That seems somewhat likely because the last one is so
delayed: if it were a confirmation of receipt, it'd make sense to send it
out first and then do whatever the request was - in this case the
interface seems to carry out the request and only then set the signal.

That's the basis I had for removing the last check; in fact, there isn't
even a retransmit, just a wait. Perhaps I was just lucky, but that never
led to any problems or disk corruption. I understand Adaptec isn't very
forthcoming with specs, and I myself don't have access to any SCSI specs,
so that's the best I could do.
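
To make the change described above concrete, here is a rough Python
simulation of a send loop that polls for a handshake after every byte,
with an option to skip the wait after the last one. This is only a sketch:
send_command, FakeDevice, write_byte and handshake_ready are hypothetical
names, not the real aha152x driver functions or registers, and the delay
figures are invented for illustration.

```python
def send_command(cmd_bytes, write_byte, handshake_ready,
                 max_polls=1000, wait_on_last=True):
    """Send each byte, then poll until the device raises its handshake.

    Returns the number of polls spent busy-waiting. All names here are
    hypothetical stand-ins, not the real driver's API.
    """
    polls = 0
    last = len(cmd_bytes) - 1
    for i, byte in enumerate(cmd_bytes):
        write_byte(byte)
        if i == last and not wait_on_last:
            break  # the change described above: do not wait after the final byte
        for _ in range(max_polls):
            polls += 1
            if handshake_ready():
                break
        else:
            # Bounded wait: give up rather than spin forever.
            raise TimeoutError("no handshake after byte %d" % i)
    return polls


class FakeDevice:
    """Toy device: delays[i] polls return False before byte i is acknowledged."""

    def __init__(self, delays):
        self.delays = list(delays)
        self.remaining = 0
        self.received = []

    def write_byte(self, byte):
        self.received.append(byte)
        self.remaining = self.delays[len(self.received) - 1]

    def handshake_ready(self):
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True
```

Feeding it a device whose last handshake is much slower than the others
shows where the polling time goes; the max_polls bound is an extra
assumption of mine, added so the sketch cannot spin forever.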

> I'm trying to determine if the time spent handling the aha152x interrupt
> is causing the serial interrupt to be missed. If this is actually
> impossible (i.e. the one has no effect on the other) stop reading now.

Well, this _is_ the problem. Apparently, while in an interrupt handler,
other interrupts should still be able to fire and work as usual (for
example, the aha152x slowdown is visible in the profile generated by the
timer interrupt). However, you mention IDE being involved, and it's a good
guess that for some reason IDE & SCSI operating at the same time ends up
locking out interrupts. It's a little harder to say exactly why/how, but
get the above 'readprofile' tool and look for anomalies in the profile.
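
As an aid for spotting such anomalies, here is a small Python sketch that
ranks functions by their share of profile ticks. It assumes readprofile
prints one line per function as "ticks name load" and ends with a "total"
line; check that against the output of your own build before relying on
it. parse_readprofile and top_functions are names made up for this
example.

```python
def parse_readprofile(text):
    """Return {function_name: ticks} from saved readprofile output.

    Assumes each line looks like '<ticks> <name> <load>'; the trailing
    'total' summary line and anything unparsable are skipped.
    """
    profile = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        ticks, name = int(fields[0]), fields[1]
        if name == "total":
            continue
        profile[name] = profile.get(name, 0) + ticks
    return profile


def top_functions(profile, n=5):
    """Return the n busiest functions as (name, ticks, share_of_all_ticks)."""
    total = sum(profile.values())
    if total == 0:
        return []
    ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, ticks, ticks / total) for name, ticks in ranked[:n]]
```

Save the readprofile output to a file, read it in, and pass the text to
parse_readprofile(); a driver function holding an unusually large share
of the ticks is the first thing to look at.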

For tips: echo 1 > /proc/profile to reset the counters, then run your
first tape/tar example and log the readprofile results. Then reset the
counters again and repeat with the second example. Then check for the most
CPU-intensive functions; if you get the opcode-resolution profile, a peak
implies a very expensive instruction (often with pipeline/cache misses
etc.), or execution coming out of a no-interrupts stage. With steadily
high figures along the whole function or just longer parts of it (possibly
with peaks), it's one that's being called often.
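
The peak-versus-steady distinction above can be turned into a crude check
over one function's per-address tick counts. This is a heuristic sketch
only: classify_hotspot and the peak_ratio threshold are my own assumptions,
not anything readprofile provides, and a steady profile with a very tall
spike on top will still be labelled a peak.

```python
from statistics import median


def classify_hotspot(counts, peak_ratio=5.0):
    """Label per-address tick counts as 'peak', 'plateau' or 'quiet'.

    'peak'    : one address dwarfs the typical count (an expensive
                instruction, or code right after interrupts come back on)
    'plateau' : counts are fairly even (the function is simply called often)
    'quiet'   : no ticks recorded at all
    The peak_ratio threshold is an arbitrary assumption.
    """
    if not counts or max(counts) == 0:
        return "quiet"
    typical = max(median(counts), 1)  # avoid comparing against zero
    if max(counts) > peak_ratio * typical:
        return "peak"
    return "plateau"
```

Run it over the buckets of each suspicious function from the two logged
runs and see whether the label changes between them.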

-Donwulff
