Re: Profanity in the Linux Kernel?!?!?

Riley Williams (rhw@MemAlpha.CX)
Wed, 16 Jun 1999 18:05:05 +0100 (GMT)


Hi Kai.

>> a misleading or ambiguous error message is worse than no
>> error message at all in my experience.

> Let me go on record as violently disagreeing, here.

> A bad error message is still something you can track down. Even
> if it's bad, you know that execution reaches the point where it
> gets printed, and you can backtrack from there.

If you can backtrack to the actual error, you're extremely lucky in my
experience.

> No error message and you have not even that.

The result is that you normally have just as much idea where the
error actually is in both of the above cases - i.e., none whatsoever.

> In fact, in my experience, about a third of all error reports
> are misleading anyway, even given the most careful messages, and
> there's nothing you can do about that.

In my experience, error reports can be broken down into the following
four categories:

1. Syntax errors, or errors of coding that prevent the program
from compiling correctly. These are all detected straight away,
and are the easiest errors to fix.

2. Linking errors, or errors that permit the program to compile,
but prevent it from linking into a valid executable. These are
also detected straight away, and are the next easiest errors
to fix.

3. Trapped logical errors, or errors for which the program checks
at the point where the error may occur. Because they are trapped,
these errors are relatively easy to fix as one knows exactly
where to look for the error based on the error message. As a
result, most programs over 6 months old have few of these left.

4. Untrapped logical errors, or errors for which no check is made
by the program at the point where the error may occur. Because
they are NOT trapped, these errors are extremely hard to locate.
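
To make the difference between categories (3) and (4) concrete, here
is a minimal userspace sketch in C. The function names and the
"percentage used" calculation are hypothetical, invented purely for
illustration; nothing here is taken from the kernel.

```c
#include <stdio.h>

/* Category (4), untrapped: nothing checks that used <= total, so a bad
 * caller gets a plausible-looking number back and the fault only shows
 * up later, somewhere far from here. */
static int percent_used_untrapped(int used, int total)
{
    return used * 100 / total;
}

/* Category (3), trapped: the check sits at the point where the error
 * can occur, so the message names the actual fault and its location. */
static int percent_used_trapped(int used, int total, int *out)
{
    if (total <= 0 || used < 0 || used > total) {
        fprintf(stderr, "percent_used_trapped: bad input used=%d total=%d\n",
                used, total);
        return -1;  /* caller must not use *out */
    }
    *out = used * 100 / total;
    return 0;
}
```

With the same bad input (used=150, total=100), the untrapped version
quietly returns 150, while the trapped version reports the problem on
the spot and returns -1.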

Errors in the first two categories can basically be ignored as far as
this discussion is concerned. In my experience, the split between the
last two categories tends to be 25% for (3) and 75% for (4), and the
easiest way to determine which category a given error is in is by how
clear the error message is.

Difficult-to-understand error messages are inevitably also old ones,
usually ones that made sense when they were first written but whose
original meaning no longer applies. Their appearance generally
indicates an error in category (4), for which the only thing one can
safely say is that the actual error will be nowhere near the error
message.

On the other hand, an error message that's easy to understand usually
indicates an error in category (3) and thus one that's easily tracked
down and fixed.

Personally, I'd love to see something along the lines of a
kernel-orientated assert() macro inserted throughout the kernel, with
the property that if the condition listed therein were false, it
reported the fact to some systrace facility (perhaps using syslog) so
one could use that as an additional aid when tracking down problems.
If it were implemented properly, then the ONLY reports it produced
would relate to assumptions that kernel writers both made and
documented but that were not valid.
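
As a rough illustration of the idea, here is a userspace sketch in C.
It is not kernel code: KTRACE_ASSERT, ktrace_report and
ktrace_failures are hypothetical names made up for this example, and
it simply uses the POSIX syslog() call as a stand-in for whatever
tracing facility the kernel would actually provide.

```c
#include <syslog.h>

/* Hypothetical counter so callers can see how many checks have failed. */
static unsigned long ktrace_failures;

/* Hypothetical reporting hook: logs the failed condition and where it
 * was written, then returns so that execution carries on. */
static void ktrace_report(const char *expr, const char *file, int line)
{
    ktrace_failures++;
    syslog(LOG_WARNING, "assumption failed: %s (%s:%d)", expr, file, line);
}

/* Unlike the standard assert(), this reports and continues rather than
 * aborting; it only records that a documented assumption did not hold. */
#define KTRACE_ASSERT(cond)                               \
    do {                                                  \
        if (!(cond))                                      \
            ktrace_report(#cond, __FILE__, __LINE__);     \
    } while (0)

/* Example of a documented assumption: callers never pass len <= 0. */
static int buffer_last_index(int len)
{
    KTRACE_ASSERT(len > 0);
    return len - 1;
}
```

Calling buffer_last_index(4) logs nothing, whereas buffer_last_index(0)
logs the text "len > 0" together with the file and line of the check,
which is exactly the assumption that was violated.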

Best wishes from Riley.

+----------------------------------------------------------------------+
| There is something frustrating about the quality and speed of Linux |
| development, ie., the quality is too high and the speed is too high, |
| in other words, I can implement this XXXX feature, but I bet someone |
| else has already done so and is just about to release their patch. |
+----------------------------------------------------------------------+
* ftp://ftp.MemAlpha.cx/pub/rhw/Linux
* http://www.MemAlpha.cx/kernel.versions.html
