Re: [PATCH v2] MIPS: Reduce _NSIG from 128 to 127 to avoid BUG_ON
From: James Hogan
Date: Fri Jun 28 2013 - 18:03:41 EST
On 28 June 2013 20:28, Denys Vlasenko <vda.linux@xxxxxxxxxxxxxx> wrote:
> On Monday 17 June 2013 12:36, James Hogan wrote:
>> On 14/06/13 17:03, James Hogan wrote:
>> > MIPS has 128 signals, the highest of which has the number 128 (they
>> > start from 1). The following command causes get_signal_to_deliver() to
>> > pass this signal number straight through to do_group_exit() as the exit
>> > code:
>> >
>> > strace sleep 10 & sleep 1 && kill -128 `pidof sleep`
>> >
>> > However do_group_exit() checks for the core dump bit (0x80) in the exit
>> > code which matches in this particular case and the kernel panics:
>> >
>> > BUG_ON(exit_code & 0x80); /* core dumps don't get here */
>> >
>> > Lets avoid this by changing the ABI by reducing the number of signals to
>> > 127 (so that the maximum signal number is 127). Glibc incorrectly sets
>> > [__]SIGRTMAX to 127 already. uClibc sets it to 128 so it's conceivable
>> > that programs built against uClibc which intentionally uses RT signals
>> > from the top (SIGRTMAX-n, n>=0) would need an updated uClibc (and a
>> > rebuild if it's crazy enough to use __SIGRTMAX).
>>
>> Hmm, although this works around the BUG_ON, this doesn't actually seem
>> to be sufficient to behave correctly.
>>
>> So it appears the exit status is constructed like this:
>> bits purpose
>> 0x007f signal number (0-127)
>> 0x0080 core dump
>> 0xff00 exit status
>>
>> but the macros in waitstatus.h and wait.h in libc
>> (see also "man 2 wait"):
>> WIFEXITED: status & 0x7f == 0
>> WIFSIGNALED: status & 0x7f in [1..126] (i.e. not 0 or 127)
>> WIFSTOPPED: status & 0xff == 127
>>
>> So termination due to SIG127 looks like it's been stopped instead of
>> terminated via a signal, unless a core dump occurs in which case none of
>> the above match.
>>
>> (And termination due to SIG128 hits BUG_ON, otherwise would appear to
>> have exited normally with core dump).
>>
>>
>> Reducing number of signals to 126 to avoid this will change the glibc
>> ABI too, in which case we may as well reduce to 64 to match other
>> arches, which is more likely to break something (I'm not really
>> comfortable making that change).
>>
>> Reducing to 127 (this patch) still leaves incorrect exit status codes
>> for SIG127 ...
>>
>> Any further thoughts/opinions?
>
> Strictly speaking, exit status of 0x007f isn't ambiguous.
>
> Currently userspace uses the following rules
> (assuming that status is 16-bit (IOW, dropping PTRACE_EVENT bits)):
>
> WIFEXITED(status) = (status & 0x7f) == 0
> WIFSIGNALED(status) = (status & 0x7f) != 0 && (status & 0x7f) < 0x7f
> WIFSTOPPED(status) = (status & 0xff) == 0x7f
> WIFCONTINUED(status) = (status == 0xffff)
>
> WEXITSTATUS(status) = status >> 8
> WSTOPSIG(status) = status >> 8
> WCOREDUMP(status) = status & 0x80
> WTERMSIG(status) = status & 0x7f
>
> When process dies from signal 127, status is 0x007f and it is not a valid
> "stopped by signal" indicator, since WSTOPSIG == 0 is an impossibility.
>
> Status 0x007f get misinterpreted by the rules above, namely,
> WIFSTOPPED is true, WIFSIGNALED is false.
>
> But an alternative definition exists which works correctly with
> all previous status codes, treats 0x007f as "killed by signal 127"
> and isn't more convoluted.
> In fact, while WIFSTOPPED needs one additional check,
> WIFSIGNALED gets simpler (loses one AND'ing operation):
>
> WIFSTOPPED(status) = (status & 0xff) == 0x7f && (status >> 8) != 0
> WIFSIGNALED(status) = status != 0 && status <= 0xff
>
> All other rules need no change.
>
> I think it's feasible to ask {g,uc}libc to change their defines
> (on MIPS as a minimum), and live with 127 signals.
Thanks for the explanation. This makes a lot of sense and if I
understand correctly it already describes the current behaviour of the
kernel up to SIG127 (I hadn't twigged WIFSTOPPED should imply
WSTOPSIG!=0 for some reason). I like it.
Cheers
James
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/