Re: [git pull] kgdb-light -v10
From: Linus Torvalds
Date: Tue Feb 12 2008 - 13:13:32 EST
On Tue, 12 Feb 2008, Andi Kleen wrote:
>
> > - the kgdb commands should always act on the *current* CPU only
> > - add one command that says "switch over to CPU #n" which just releases
> > the current CPU and sends an IPI to that CPU #n (no timeouts, no
> > synchronous waiting, no nothing - it's like a "continue", but with a
> > "try to get the other CPU to stop")
>
> The problem I see here is that the kernel tends to get badly confused
> if one CPU just stops responding. At some point someone does a global
> IPI and that then hangs. You would need to hotunplug the CPU which
> is theoretically possible, but quite intrusive.
You're thinking about this totally *wrong*.
You definitely do not want to hot-unplug or isolate anything at all.
That's explicitly against the whole point of kgdb not changing what it is
trying to measure.
Just let the other CPU's hang naturally if they need to wait for IPI's
etc. What's the downside? That's what you were trying to do in the first
place by having the kgdb callback!
So you can't have it both ways. Either serializing other cpu's with kgdb
is good (the whole "kgdb_nmicallback" thing or whatever it was called), in
which case it's also perfectly ok to just let them stop when waiting for
IPI's.
My point was *not* that kgdb should take control of one CPU, and the other
CPU's should continue to work as if nothing happened. That is insane and
impossible (since you may be stopping a CPU while it holds central
spinlocks etc). No, my point was that I think kgdb should be as light and
non-intrusive as possible, and that any "higher level behaviour" (like the
decision of whether to try to synchronize other CPU's or not) should be
left to the debugger.
But only if that makes kgdb patches less intrusive!
In other words, I'm not at all trying to push any particular solution
here, except for the "keep it simple, and anything even remotely debatable
or intrusive to the system should be excised". And I wanted to point out
that maybe all these timeout etc decisions can be pushed to the debugger.
So I think we can either:
- have no timeouts or other fancy crap _at_all_, with very simple locking
(ie what v10 mostly seems to do)
- or you do the fancy dance entirely in the remote debugger.
I don't care. The only thing I care about is that kgdb support never
_ever_ shows up in any interesting code, and that it remains totally
invisible to essentially all of the kernel except the place that would
otherwise print out an oops.
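To make the first option concrete, here is a rough sketch of what "no timeouts, very simple locking" plus a "switch over to CPU #n" command could look like. It is a user-space illustration only, not code from the v10 patches: every name in it (dbg_owner, dbg_try_enter, dbg_switch_cpu, dbg_send_ipi) is hypothetical, and the "IPI" is just a variable that records the target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* All names here are hypothetical; none are taken from the kgdb patches. */
#define DBG_NO_OWNER (-1)

/* The CPU currently talking to the debugger, or DBG_NO_OWNER. */
static atomic_int dbg_owner = DBG_NO_OWNER;

/* Stand-in for a real IPI: remembers the last target, nothing more. */
static int dbg_last_ipi_target = DBG_NO_OWNER;

/* Fire and forget: no waiting for the target to respond, no timeout. */
static void dbg_send_ipi(int cpu)
{
    dbg_last_ipi_target = cpu;
}

/*
 * Try to become the debugged CPU. A single compare-and-swap:
 * either this CPU gets the debugger or it does not. No retries here.
 */
static bool dbg_try_enter(int cpu)
{
    int expected = DBG_NO_OWNER;
    return atomic_compare_exchange_strong(&dbg_owner, &expected, cpu);
}

/*
 * "Switch over to CPU #n": release the current CPU and poke the target.
 * This behaves like a "continue" with a request attached; whether the
 * target ever stops is not this function's problem.
 */
static void dbg_switch_cpu(int self, int target)
{
    assert(atomic_load(&dbg_owner) == self);
    atomic_store(&dbg_owner, DBG_NO_OWNER);
    dbg_send_ipi(target);
}
```

In this sketch the only shared state is one word, and nothing ever waits on another CPU. Any policy about retrying or giving up would live in the remote debugger.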
And I absolutely don't want it to be fancy, I want it to be so simple that
even _I_ can look at it and say "I think this is crap, but it's _trivial_
crap".
IOW: as long as people keep arguing about it, I sure as hell won't ever
merge it. It needs to be so _obvious_ and so _minimal_ that I can feel
that I finally don't need to care.
Linus