Re: [PATCH 4/5] don't panic if /sbin/init exits or killed

From: Oleg Nesterov
Date: Sun Mar 16 2008 - 19:04:48 EST


On 03/16, Roland McGrath wrote:
>

(re-ordered)

> Have you tested how recoverable it really is? I wonder what happens
> with init having exited when things get reparented to it. Don't the
> zombies just pile up?

Yes sure, we leak the re-parented zombies, and nobody can take care of
/etc/inittab. As expected.

But otherwise the system runs fine.

> BUG() does not seem right to me. This does not diagnose any kernel bug.
> The kernel source location and backtrace are not useful. In fact, they
> are likely to mislead the user into reporting the bug to the wrong place
> (because it will look like a kernel bug).

But panic() isn't better? It doesn't provide any useful info.

> I gather your motivation is to get something "recoverable" rather than
> always rebooting. This might be useful for developers like you and me.
> I suspect that conservative administrators of production systems prefer
> the current behavior. If the boot init dies, that is reasonably likely
> to be a "catastrophic" failure of the system as a whole as far as the
> proprietor of a production system is concerned. That is, the system may
> no longer behave as expected in ways essential for its normal operation.
> If it sticks around in that condition, appearing to be available but not
> doing everything it should, that is usually worse than a quick and
> orderly crash (which the installation's procedures and monitoring
> infrastructure are often prepared to handle).

Well, I think the generic "if we have a chance to survive, we should try
to survive" rule is good.

If the boot init dies, at least the admin has a chance to figure out what
has happened, and -o remount,ro /.

Every BUG/BUG_ON in fact means the system is not useable, but still it does
not panic(), but tries to proceed.

In short, I can't see why panic() is better. Except we have panic_timeout,
but we can take it into account if init exits.

> panic is a bit extreme for the situation, where we have no reason yet to
> think kernel data structures are inconsistent. A sync+reboot or sync+crash
> without bust_spinlocks et al might be better.
>
> For letting init die and calling it recoverable for hacking purposes, a
> sysctl to disable the panic/crash makes sense. But I don't think we
> should change the default setting.

OK, I won't argue (not that I agree ;).

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/