Re: Availability of kdb

From: Jeff V. Merkey (jmerkey@timpanogas.com)
Date: Sun Sep 10 2000 - 19:14:03 EST


Alexander Viro wrote:

> On Sun, 10 Sep 2000, Jeff V. Merkey wrote:
>
> > already there, so folks can use it on Linux for now, and I'll stick to printk()
> > and code reviews for my debugging on Linux.
>
> Jeff, does it mean that you do not use code reviews on other projects?
>
> It's not that hard to answer - just 1 bit of information. The question
> being:
> Do you use code reviews in work on MANOS?

Yes, everyone does. But there's a class of problems, like hardware and SMP bugs,
where a debugger can help you locate a bug more quickly, or profile and observe
performance issues, without needing to write tons of code to instrument this type of
monitoring. I wrote MANOS solely with code reviews, but on projects earlier in my
career I used a lot of tools, like an American Arium Logic Analyser with an Inverse
Assembler and a decent kernel debugger. The tools gave me the ability to rapidly
master and solve problems that were related to early SMP hardware, and gave me an
understanding of how that hardware actually behaves.

To cite a Linux-specific example, let's take the issue with the memory write for a
spin_unlock(). Linus seemed to have trouble grasping why a simple 'mov <addr>,
0' would work as opposed to a 'lock dec <addr>'. Anyone who has ever spent late
nights with an American Arium Analyzer profiling memory bus transactions on a PPro
knows that MESI will correctly propagate via the processor caches a write to a
locked location with a correct load and store order without any problems of locking
concurrency. Linus apparently did not understand this, or he would have
immediately realized that double locking was always generating a second
non-cacheable memory reference for every lock being taken and released.
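
As a rough illustration of the point above (a sketch only, not the actual Linux
spinlock code), here is a lock in portable C11 atomics where taking the lock uses an
atomic read-modify-write but releasing it is a plain store. The sketch_* names are
made up for this example, and 0 is assumed to mean "free" to match the 'mov <addr>,
0' form quoted above. On x86 a release store like this is normally compiled to an
ordinary mov with no lock prefix.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int locked;
} sketch_spinlock_t;

/* Try once to take the lock; returns 1 on success, 0 if it was already held. */
static int sketch_spin_trylock(sketch_spinlock_t *l)
{
    /* Acquiring needs an atomic read-modify-write (a locked xchg on x86). */
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

/* Spin until the lock is taken. */
static void sketch_spin_lock(sketch_spinlock_t *l)
{
    while (!sketch_spin_trylock(l))
        ;   /* busy-wait; a real lock would pause or back off here */
}

/* Release with a plain store: only the holder writes here, and the release
 * ordering keeps earlier loads and stores ahead of the unlocking write. */
static void sketch_spin_unlock(sketch_spinlock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The unlock path never has to read the old value, which is why no locked bus cycle
is needed for it.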

There are also hidden latencies in interrupts on Intel. I know this because I have
watched bus transactions with an analyser and seen an interrupt generate reads of
the IDT, GDT, LDT, PDE, and other tables. NetWare had a coding error I fixed with
this tool that no one had ever even noticed or caught with a code review. The person
writing and updating page table entries in NetWare 4.1 was clearing the accessed bit
in the PTE and did not know that the processor would perform a hidden R/M/W operation
and assert a bus lock to set this bit every time someone cleared it -- it made
performance drop 4% from NetWare 3.X and no one knew why. This performance problem
would never have been found without this tool. 2 years of code reviews did not
find it -- an American Arium Analyser with a kernel debugger to stop and start and
instrument the code without writing custom stubs all over the place did.
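
To make the accessed-bit issue concrete, here is a small sketch (not NetWare code;
the function name and the approach are assumptions made for this example) of
updating the flag bits of a 32-bit non-PAE x86 page table entry while carrying over
the Accessed and Dirty bits, so the processor is not forced to set them again with
a locked read-modify-write on the next access to the page.

```c
#include <stdint.h>

#define PTE_PRESENT   0x001u        /* bit 0 */
#define PTE_ACCESSED  0x020u        /* bit 5, set by the processor on access */
#define PTE_DIRTY     0x040u        /* bit 6, set by the processor on write */
#define PTE_FRAME     0xFFFFF000u   /* physical frame address, bits 12-31 */

/* Hypothetical helper: replace the software-controlled flag bits of a PTE
 * but keep the frame address and the hardware-maintained A/D bits. */
static uint32_t sketch_pte_set_flags(uint32_t old_pte, uint32_t new_flags)
{
    uint32_t frame = old_pte & PTE_FRAME;
    uint32_t hw    = old_pte & (PTE_ACCESSED | PTE_DIRTY);
    uint32_t sw    = new_flags & ~PTE_FRAME & ~(PTE_ACCESSED | PTE_DIRTY);

    return frame | sw | hw;
}
```

Code that builds the new entry from scratch and drops bit 5 is the pattern
described above: every such update costs one extra locked bus cycle later.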

Folks who just rely on code reviews never see this level of interaction, and
consequently do not have the understanding of hardware behavior underneath an OS to
optimize it well. That's my case for good tools in an OS. The performance of
Linux vs. NetWare and NT in LAN environments proves this point well.

:-)

Jeff.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Sep 15 2000 - 21:00:14 EST