Re: [PATCH v2] MIPS: Reduce _NSIG from 128 to 127 to avoid BUG_ON

From: Rich Felker
Date: Wed Sep 04 2013 - 00:41:37 EST


On Fri, Jun 28, 2013 at 11:03:33PM +0100, James Hogan wrote:
> On 28 June 2013 20:28, Denys Vlasenko <vda.linux@xxxxxxxxxxxxxx> wrote:
> > On Monday 17 June 2013 12:36, James Hogan wrote:
> >> On 14/06/13 17:03, James Hogan wrote:
> >> > MIPS has 128 signals, the highest of which has the number 128 (they
> >> > start from 1). The following command causes get_signal_to_deliver() to
> >> > pass this signal number straight through to do_group_exit() as the exit
> >> > code:
> >> >
> >> > strace sleep 10 & sleep 1 && kill -128 `pidof sleep`
> >> >
> >> > However do_group_exit() checks for the core dump bit (0x80) in the exit
> >> > code which matches in this particular case and the kernel panics:
> >> >
> >> > BUG_ON(exit_code & 0x80); /* core dumps don't get here */
> >> >
> >> > Lets avoid this by changing the ABI by reducing the number of signals to
> >> > 127 (so that the maximum signal number is 127). Glibc incorrectly sets
> >> > [__]SIGRTMAX to 127 already. uClibc sets it to 128 so it's conceivable
> >> > that programs built against uClibc which intentionally uses RT signals
> >> > from the top (SIGRTMAX-n, n>=0) would need an updated uClibc (and a
> >> > rebuild if it's crazy enough to use __SIGRTMAX).
> >>
> >> Hmm, although this works around the BUG_ON, this doesn't actually seem
> >> to be sufficient to behave correctly.
> >>
> >> So it appears the exit status is constructed like this:
> >> bits purpose
> >> 0x007f signal number (0-127)
> >> 0x0080 core dump
> >> 0xff00 exit status
> >>
> >> but the macros in waitstatus.h and wait.h in libc
> >> (see also "man 2 wait"):
> >> WIFEXITED: status & 0x7f == 0
> >> WIFSIGNALED: status & 0x7f in [1..126] (i.e. not 0 or 127)
> >> WIFSTOPPED: status & 0xff == 127
> >>
> >> So termination due to SIG127 looks like it's been stopped instead of
> >> terminated via a signal, unless a core dump occurs in which case none of
> >> the above match.
> >>
> >> (And termination due to SIG128 hits BUG_ON, otherwise would appear to
> >> have exited normally with core dump).
> >>
> >>
> >> Reducing number of signals to 126 to avoid this will change the glibc
> >> ABI too, in which case we may as well reduce to 64 to match other
> >> arches, which is more likely to break something (I'm not really
> >> comfortable making that change).
> >>
> >> Reducing to 127 (this patch) still leaves incorrect exit status codes
> >> for SIG127 ...
> >>
> >> Any further thoughts/opinions?
> >
> > Strictly speaking, exit status of 0x007f isn't ambiguous.
> >
> > Currently userspace uses the following rules
> > (assuming that status is 16-bit (IOW, dropping PTRACE_EVENT bits)):
> >
> > WIFEXITED(status) = (status & 0x7f) == 0
> > WIFSIGNALED(status) = (status & 0x7f) != 0 && (status & 0x7f) < 0x7f
> > WIFSTOPPED(status) = (status & 0xff) == 0x7f
> > WIFCONTINUED(status) = (status == 0xffff)
> >
> > WEXITSTATUS(status) = status >> 8
> > WSTOPSIG(status) = status >> 8
> > WCOREDUMP(status) = status & 0x80
> > WTERMSIG(status) = status & 0x7f
> >
> > When process dies from signal 127, status is 0x007f and it is not a valid
> > "stopped by signal" indicator, since WSTOPSIG == 0 is an impossibility.
> >
> > Status 0x007f get misinterpreted by the rules above, namely,
> > WIFSTOPPED is true, WIFSIGNALED is false.
> >
> > But an alternative definition exists which works correctly with
> > all previous status codes, treats 0x007f as "killed by signal 127"
> > and isn't more convoluted.
> > In fact, while WIFSTOPPED needs one additional check,
> > WIFSIGNALED gets simpler (loses one AND'ing operation):
> >
> > WIFSTOPPED(status) = (status & 0xff) == 0x7f && (status >> 8) != 0
> > WIFSIGNALED(status) = status != 0 && status <= 0xff
> >
> > All other rules need no change.
> >
> > I think it's feasible to ask {g,uc}libc to change their defines
> > (on MIPS as a minimum), and live with 127 signals.
>
> Thanks for the explanation. This makes a lot of sense and if I
> understand correctly it already describes the current behaviour of the
> kernel up to SIG127 (I hadn't twigged WIFSTOPPED should imply
> WSTOPSIG!=0 for some reason). I like it.

One other note on this issue: SIG128 also aliases CLONE_VM, and it
would be very bad if a program requesting SIG128 as its exit signal
when calling clone instead ended up with the effects of CLONE_VM...

Also, I have some improved macros for WIFSTOPPED and WIFSIGNALED which
avoid multiple evaluation of their arguments:

#define WIFSTOPPED(s) ((short)((((s)&0xffff)*0x10001)>>8) > 0x7f00)
#define WIFSIGNALED(s) (((s)&0xffff)-1 < 0xffu)

These are what we are using in musl libc now.

Rich
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/