Re: [ANNOUNCE] Withdrawl of Open Source NDS Project/NTFS/M2FS for Linu

From: Kai Henningsen (kaih@khms.westfalen.de)
Date: Sat Sep 09 2000 - 08:18:00 EST


mingo@elte.hu (Ingo Molnar) wrote on 05.09.00 in <Pine.LNX.4.21.0009051245190.2575-100000@elte.hu>:

> On Tue, 5 Sep 2000, Andi Kleen wrote:
>
> > I don't really believe that. It is as easy to add a silly NULL pointer
> > check based on a oops as it is after a debugging session (and it is
> > even likely you chose the simple fix because it is harder and more
> > work to proceed) Usually with the debugger it is at least easier to
> > collect the data needed for a real fix. [...]
>
> this is the conceptual difference David tried to show, and which view i
> support as well. There are two basic types of 'debugging behaviors':
>
> A) 'Collect enough data to understand the local context and fix the bug.'
>
> B) 'See the point of the immediate crash, think about that code, think
> about code that called this code, think about the intent and
> implementation of that code and find the bug based on
> understanding the code.'

In other words, with (A) cou can see the differende between your
understanding and reality, and with (B) you can't.

> A) results in people being able to debug easy and moderate kernel bugs.
> The same people are lost if faced with bugs that are not apparent in the
> 'state of the system around the bug' represented by the debugger.
>
> B) results in people with the ability to identify bugs based on the source
> code - and the ability to fix the really mindblowing tough bugs. We had
> about 10 such bugs during the cycle of the 2.3 kernel.

I think this is complete and utter nonsense.

Just about *all* the tough bugs I've ever fixed took about 100 times more
time to fix if I couldn't look at the actual system state.

*Easy* bugs don't *need* a debugger. One look at the error message, one
slap at the head, correct the mistake.

The problem with most tough bugs is that what actually happens is *not*
what you think is happening. That's what makes them tough. That's also
what makes trying to find them by reading the source such an excercise in
frustration.

One bug I particularly remember (this was in a Pascal program) was a range
check at a point where, looking at the source, a range check "could not
happen". I spent a long time looking at the source with exactly zero
effect. And nobody knew how to reproduce the bug.

Then I managed to get my hands on a core dump. Located the place in
question. Looked at the buffer. Clearly saw it being overwritten by some
completely unrelated text.

And from the contents of that text, it was pretty obvious where to look
for the original bug, and it wasn't all that hard to find. And to
reproduce, once I knew what went wrong.

Without the core dump (or alternatively a debugger) I wouldn't even have
known to look for text, since the part which bombed wasn't handling any
text.

A classical memory corruption bug, and like most late-effect bugs hell to
find without some sort of support for poking around in the actual program
state.

> I do claim that the real mainsteam-kernel development 'bottleneck' is the
> ability to fix the tough bugs. It's always these tough bugs that keep up
> the addition of new features in devel kernels. We should thus do
> everything to teach people the proper debugging methodology.

And the proper methodology for tough bugs is named "debugger". Or core
dumps, or any other way to get at the actual system state.

It's most definitely *not* printk. That's "fooling around".

MfG Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Sep 15 2000 - 21:00:12 EST