Re: [PATCH 1/7] ia64, kdump: Mask MCA/INIT on freezing cpus

From: Hidetoshi Seto
Date: Wed Jun 24 2009 - 22:16:30 EST


Robin Holt wrote:
> The concern is that any time we prevent SAL from receiving control during
> an MCA/INIT, we reduce the maintainability of the machine. Having them
> masked at any time results in the NMI/INIT not recording the PROM record
> which we use to diagnose where the hang is.

Please think about servers that have no such PROM record feature.

The original problem these patches address is that an INIT can block
retrieval of a crashdump via kdump. On servers that do not support a
PROM record like the one SGI servers have, the crashdump is the only
record we can use to diagnose where the hang is.
(I suspect that even where the PROM record is supported, the crashdump
is the better and more important resource for troubleshooting.)

My patches mask MCA/INIT on all CPUs once kdump is invoked (via panic
or INIT), register a do-nothing handler, and then promptly unmask
MCA/INIT on the one CPU that is going to jump into the 2nd kernel
(= kdump kernel).
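A small user-space model may make that ordering clearer. It is only a
sketch under stated assumptions: a boolean per CPU stands in for the
real MCA/INIT mask, and NR_CPUS_MODEL, struct cpu_model,
nop_init_handler() and kdump_freeze_model() are hypothetical names,
not code from the patch series.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4  /* hypothetical CPU count for this model */

/* Hypothetical stand-in for an OS_INIT handler entry point. */
typedef void (*init_handler_t)(void);

struct cpu_model {
	bool mca_init_masked;  /* models the per-CPU MCA/INIT mask */
};

static struct cpu_model cpus[NR_CPUS_MODEL];
static init_handler_t os_init_handler;  /* currently registered OS_INIT */

/* Do-nothing handler: a pending INIT is consumed harmlessly. */
static void nop_init_handler(void) { }

/*
 * Hypothetical model of the kdump entry path: mask MCA/INIT on every
 * CPU, register the do-nothing handler, then unmask only on the CPU
 * that will jump into the 2nd (kdump) kernel.
 */
static void kdump_freeze_model(int kdump_cpu)
{
	assert(kdump_cpu >= 0 && kdump_cpu < NR_CPUS_MODEL);

	for (int i = 0; i < NR_CPUS_MODEL; i++)
		cpus[i].mca_init_masked = true;

	os_init_handler = nop_init_handler;  /* registered before unmasking */
	cpus[kdump_cpu].mca_init_masked = false;
}
```

In the model the handler is registered before the unmask, so an INIT
that fires right after unmasking presumably lands on the do-nothing
handler rather than on a 1st-kernel handler.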

If there is a pending INIT, it will be received on that CPU as soon as
it is unmasked. The PROM will then make a record of it, pass control
to OS_INIT, which does nothing, and return to the interrupted context
so that kdump processing continues.
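The delivery of a pending INIT can be modelled the same way. Again this
is a hypothetical sketch, not kernel or PROM code: init_pending,
prom_records, os_init_calls, kdump_resumed and the *_model() function
are invented for illustration, and a counter stands in for the PROM's
record.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state used only by this model. */
static bool init_pending;   /* an INIT arrived while MCA/INIT was masked */
static int  prom_records;   /* records the PROM has made */
static int  os_init_calls;  /* times the do-nothing OS_INIT was entered */
static bool kdump_resumed;  /* control is back in the kdump path */

/* Do-nothing OS_INIT: just counts the entry and returns. */
static void nop_os_init(void) { os_init_calls++; }

static void kdump_cpu_unmask_model(void)
{
	/* Unmasking lets a latched INIT be delivered on this CPU at once. */
	if (init_pending) {
		init_pending = false;
		prom_records++;   /* PROM records the INIT event */
		nop_os_init();    /* control passes to the do-nothing OS_INIT */
	}
	/* Back in the interrupted context: kdump processing goes on. */
	kdump_resumed = true;
}
```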

Which point in time are you concerned about?


> In other patches, you implemented a do-nothing handler. Could that
> be used?

... How? Perhaps I have missed your point.

It would be useful, but it is only in place from the beginning of the
2nd kernel (to be exact, from the end of the 1st kernel) until new INIT
handlers for the 2nd kernel are registered.
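To illustrate the window in which the do-nothing handler is in place,
here is a rough model that maps hypothetical phases of the crash-to-kdump
transition to whichever INIT handler would be active. The enum names and
active_handler() are assumptions made for this sketch only.

```c
#include <assert.h>

/* Hypothetical phases of the 1st-kernel to kdump-kernel transition. */
enum kexec_phase {
	PHASE_1ST_KERNEL_RUNNING,   /* normal operation, before kdump */
	PHASE_1ST_KERNEL_END,       /* kdump invoked, about to jump */
	PHASE_2ND_KERNEL_EARLY,     /* 2nd kernel booting */
	PHASE_2ND_KERNEL_HANDLERS,  /* 2nd kernel registered its INIT handlers */
};

/* Hypothetical tags for which INIT handler is registered. */
enum init_handler_kind {
	HANDLER_1ST_KERNEL,
	HANDLER_NOP,
	HANDLER_2ND_KERNEL,
};

static enum init_handler_kind active_handler(enum kexec_phase p)
{
	switch (p) {
	case PHASE_1ST_KERNEL_RUNNING:
		return HANDLER_1ST_KERNEL;
	case PHASE_1ST_KERNEL_END:
	case PHASE_2ND_KERNEL_EARLY:
		/* The only window in which the do-nothing handler is in place. */
		return HANDLER_NOP;
	case PHASE_2ND_KERNEL_HANDLERS:
	default:
		return HANDLER_2ND_KERNEL;
	}
}
```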


> Alternatively, when the machine is first booted, the handler is defined
> by SAL as a SAL routine. Could you record that during kernel boot and
> then just set the handler back to the SAL provided one prior to starting
> the kexec kernel boot? At that point, the machine is more like the
> first boot. Now that I think about this, this alternative seems fairly
> attractive.

I think it would definitely be wrong for SAL to provide its initial
handler as an OS_INIT that the OS can remove or replace.

Since an INIT event is processed as PAL_INIT -> SAL_INIT -> OS_INIT (if
available), SAL should keep the entry point of its initial handler and
call it from SAL_INIT when no OS_INIT is registered. The same applies to
OS_MCA.
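A rough user-space sketch of the fallback described above. The function
names and the init_owner enum are hypothetical, and real PAL/SAL
firmware is not structured like this; the point is only that SAL keeps
its own entry point and falls back to it whenever the OS slot is empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag identifying which handler ended up servicing INIT. */
enum init_owner { OWNER_SAL_INITIAL, OWNER_OS_INIT };

typedef enum init_owner (*init_entry_t)(void);

/* SAL's own initial handler: its entry point is retained by SAL. */
static enum init_owner sal_initial_handler(void) { return OWNER_SAL_INITIAL; }

/* Hypothetical OS_INIT that an OS might register. */
static enum init_owner os_init_model(void) { return OWNER_OS_INIT; }

/* OS_INIT slot: NULL means no OS_INIT is currently registered. */
static init_entry_t os_init_registered;

/* SAL_INIT: hand off to OS_INIT if registered, else use SAL's own. */
static enum init_owner sal_init_model(void)
{
	if (os_init_registered != NULL)
		return os_init_registered();
	return sal_initial_handler();
}

/* PAL_INIT always passes control on to SAL_INIT in this model. */
static enum init_owner pal_init_model(void)
{
	return sal_init_model();
}
```

Because the fallback lives inside SAL_INIT rather than in the OS_INIT
slot, removing or replacing OS_INIT cannot lose SAL's initial handler.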


Thanks,
H.Seto
