RE: Availability of kdb

From: Marty Fouts (marty@dotcast.com)
Date: Mon Sep 18 2000 - 00:42:22 EST

Next message: David S. Miller: "Re: Availability of kdb"
Previous message: H. Peter Anvin: "Re: Linux-2.4.0-test9-pre2"
Reply: David S. Miller: "Re: Availability of kdb"
Reply: Tigran Aivazian: "RE: Availability of kdb"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

I agree about needing to know all of the tools in the tool chest, including
the hand ones. Nothing in what I've said about needing to include the
debugger has been an argument against *also* having a full chest of other
tools.

On the other hand, Linus is wrong, and your attempt to defend him is a
non-sequitor.

I've probably debugged more operating systems under more varied environments
than nearly anyone here, having brought up a new OS, compiler, and CPU
concurrently and having debugged everything from card-batch monitors to
fairly large distributed systems. On any number of occasions, as others
here have noted, a debugger has been essential, and utter knowledge of the
code is of no use, because the code has relied on hardware that is in
reality behaving differently than it is documented to behave. No amount of
code reading is going to find those cases.

I've also found a lot of problems where the symptoms appear in one subsystem
but are the result of a bug in another. No amount of reading the code from
the first subsystem is going to find the bug in the second, but those are
the bugs that are going to give you the most headaches without good
debugging tools.

You continue to conflate two related but different parts of the debugging
process. Identifying what the system *is* doing is different than
identifying what it *should* be doing. The social-engineering argument
against using debuggers is an argument against making the mistake of trying
to both jobs with one tool.

In my "perfect" universe, here's how I debug kernels, when I'm not worried
about the hardware being the problem, and the problem is in code written by
someone else:

* Have access to a browsable cross-referenced source tree for the exact
kernel being debugged (in a _really_ perfect world, I'd have specifications
for what the code should be doing, but hey, we're in OS hacking here, so
we'll ignore software engineering things like requirements, specifications,
pre-conditions, "programming with contracts", and stick to
hairy-chested-he-man-debugging)
* Do my best to reduce the test case to the smallest set of actions that
will reproduce the failure - this may involve using debuggers, logic
analysers, and various "jigs" to help isolate system behavior.
* Loop:
o Read the source code to see if I can understand the failure behavior, in
the process, questions will arise, like "shouldn't this variable be mumble
at this point
o Run the test case under the debugger *while* reading the source code (best
if I can use a remote debugger that interacts with my source code browser)
and use the debugger to validate my assumptions by answering those questions
* the loop is repeated until I have an "aha" moment, which is the point at
which I *think* I see what the code is doing that it shouldn't have done.
* Stop what I'm doing and have a diet coke. Preferably while reading the
module that I think is misbehaving. (in a _really_ perfect world, while
reading it with the guy who wrote it.)
* Write up and desk check the code that should change the behavior (with the
guy who wrote the original as a code reviewer, if possible)
* Run the regression suite - if the patch works, take it to code review. If
it passes code review, try to put the test case in the regression suite, if
it can be done.

The debugger is useful, along with visualization tools, trip wires and a
dozen other techniques in solving a very important social engineering
problem that I haven't seen mentioned in this thread: The bug got there
after my team's best effort to write correct code in the first place, an
effort that involves specs, code reviews, coding standards and a number of
other tools. That means we have a conceptual failure to understand our own
code. As any proof reader will tell you: mistakes like that that get by are
nearly impossible to catch.

Basically, I use a debugger when I realize that I've developed a perception
block and I want to validate my perception against reality. Computing is,
after all, an empirical science.

Marty

-----Original Message-----
From: Larry McVoy [mailto:lm@bitmover.com]
Sent: Sunday, September 17, 2000 8:41 PM
To: Marty Fouts
Cc: 'Linus Torvalds'; Oliver Xymoron; Tigran Aivazian; Daniel Phillips;
Kernel Mailing List
Subject: Re: Availability of kdb

On Sun, Sep 17, 2000 at 02:33:40PM -0700, Marty Fouts wrote:
> Um, for what ever it is worth, if you want to compare "power user"
carpentry
> to "hand tools only" you can actually do it fairly easily on PBS in the
US.
> There used to be a program done by a guy who did everything by hand. I
> loved to watch it, especially the parts where he cut himself badly because
> there are somethings it is dumb to do with hand tools, but he was stuck
with
> his dumb rule. There's another show, still on, called "The New Yankee
> Workshop". I love to watch it, just to count the number of power tools
Norm
> Abrams manages to use in a single project. (I think the most I saw in one
> one hour episode was 40-something.)
>
> Craftsmanship does *not* come from artificial rules about what tools you
are
> allowed to use. There were hack carpenters when there weren't any power
> tools, and the cabinet makers I know who do the best work (the sort that
> gets them several thousand dollars a piece for small pieces of furniture)
> use every power tool they find appropriate to their work; just as they
> construct and use jigs and rely on all the other "tricks of the trade".
>
> Craftsmanship is in the way you approach what you do, not in the tools you
> use to do it. And, frankly, if you wish to artificially limit your use of
> tools, all you are doing is hobbling yourself.

Ahh, now we're having fun. It just so happens that I can speak to all
of these topics; not only have I seen the shows mentioned, but I have
a shop out the back which has both a pile of power tools and a pile of
antique tools (oldest one that I know of was made around 1837, a great
big spokeshave). I use all of them regularly, no collector-foo here,
thank you. I tend to retreat to working with hand tools when all this
geek stuff gets to be a bit much.

Cabinet making craftsmanship absolutely comes from a firm knowledge of
hand tools. I'll bet you anything you want that the guy who sells that
$3K furniture knows exactly what a Norris is and has used one. Probably
still does (OK, mebbe it's a Lie-Nielsen these days).

I've also used a number of kernel debuggers - kadb back at Sun, a monitor
back at ETA (amazingly similar in spirit to RT/Linux), SGI's monstrosity,
and probably others.

That said, who gives a hoot what I have or what I have used? The question
is: does Linus have a point or not? And the answer is, you bet he does.

Linus is saying that if you need a debugger then you don't know the code.
And if you don't know the code, then you shouldn't be hacking the code.
A debugger does little besides cover up a lack of knowledge.

It's not an easy to take point of view because by definition, most
people don't really know the code so most people want a debugger.
Linus would just as soon that you learn the code well enough that a
debugger becomes pointless.

I'm sort of in the middle. I know BitKeeper very well, and it's actually
a larger wad of code than the kernel if you toss out the device drivers.
About the only thing I ever want a debugger for is a stacktrace back. If
you give me that, I usually don't need anything else; and in general, you
shouldn't either. You should *know* why you got to a particular place,
if you don't know that then how can you fix the bug?

So I'm gonna side with Linus on this one, if you make it hard now, it will
be easier later. It also increases the quality of the people submitting
patches, which is a good thing.

--
---
Larry McVoy              lm at bitmover.com
http://www.bitmover.com/lm
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

Next message: David S. Miller: "Re: Availability of kdb"
Previous message: H. Peter Anvin: "Re: Linux-2.4.0-test9-pre2"
Reply: David S. Miller: "Re: Availability of kdb"
Reply: Tigran Aivazian: "RE: Availability of kdb"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Sat Sep 23 2000 - 21:00:16 EST