Re: [PATCH] coredump: Retry writes where appropriate

From: Paul Smith
Date: Wed Jun 03 2009 - 10:06:20 EST


On Mon, 2009-06-01 at 13:38 -0700, Roland McGrath wrote:
> 1. More core-dump signals. e.g., it was already crashing and you hit ^\
> or maybe just hit ^\ twice with a finger delay.
> 2. Non-fatal signals (i.e. ones with handlers, stop signals).
> 3. Plain sig_fatal() non-core signals (e.g. SIGINT when not handled)
> 4. SIGKILL (an actual one from userland or oomkill, not group-exit)
>
> #1 IMHO should not do anything at all.
> You are asking for a core dump, it's already doing it.
>
> #2 should not do anything at all.
> It's not really possible to suspend during the core dump, so unhandled,
> unblocked stop signals can't do anything either.
>
> #4 IMHO should always stop everything immediately.
> That's what SIGKILL is for. When userland generates a SIGKILL
> explicitly, that says the top priority is to be gone and cease
> consuming any resources ASAP.
>
> #3 is the open question. I don't feel strongly either way.

Thanks Roland. This is a great summary and lends clarity to the
discussion.

Actually I'm quite happy with the above for #'s 1, 2, and 4.

I've already stated my preference that #3 should behave like #2, but
certainly people can disagree on this and I understand that some would
like it to behave as #4. Best case is this can be configured or, at
least if it's documented clearly userspace applications can code
defensively by masking those signals (this has minor annoyances but...)

Unfortunately the discussion you and Oleg are having shows me how little
I know about this area of the kernel and what a bad idea it would be for
me to try to get this right on my own :-). However, I'm happy to test
patches, comment on solutions, etc.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/