Re: Bash not reacting to Ctrl-C

From: Oleg Nesterov
Date: Mon Feb 07 2011 - 08:17:05 EST


On 02/05, Oleg Nesterov wrote:
>
> On 01/28, Ingo Molnar wrote:
> >
> > * Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> >
> > > On 01/28, Ingo Molnar wrote:
> > > >
> > > > The bug is that occasionally Ctrl-C does not get processed, and that the Ctrl-C is
> > > > 'lost'. It can be reproduced here by running ./test-signal several times, and
> > > > Ctrl-C-ing it:
> > > >
> > > > $ ./test-signal
> > > > ^C
> > > > $ ./test-signal
> > > > ^C^C
> > > > $ ./test-signal
> > > > ^C
> > > >
> > > > See that '^C^C' line? That is where i had to do Ctrl-C twice.
> > >
> > > Reproduced.
> > >
> > > At first glance, /bin/sh should be blamed... Hmm, probably yes,
> > > I even reproduced this under strace, and this is what I see
> > >
> > > wait4(-1, 0x7fff388431c4, 0, NULL) = ? ERESTARTSYS (To be restarted)
> > > --- SIGINT (Interrupt) @ 0 (0) ---
> > > rt_sigreturn(0) = -1 EINTR (Interrupted system call)
> > > wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9706
> > >
> > > So, ^C is not lost, but ./test-signal doesn't want to exit.
> >
> > Might be some Bash assumption or race that works under other OSs but that Linux
> > somehow handles differently. IIRC Bash is being developed on Mac OS X.
> >
> > But it's happening all the time (with yum for example - but also with makejobs, as
> > Thomas has reported it) - this is simply the first time i managed to reproduce it
> > with something really simple.
>
> OK, I seem to understand what happens. Of course I am not sure, I never
> looked into these sources before...
>
> Suppose that jctl ^C races with the normal child exit. In this case
> waitchld() sets child->status = status (zero in this case) and calls
> set_job_status_and_cleanup().
>
> set_job_status_and_cleanup() notices wait_sigint_received and sends
> SIGINT to itself (termsig_handler (SIGINT)), but somehow it assumes
> that the last foreground job should be terminated by SIGINT too:
>
> else if (wait_sigint_received && (WTERMSIG (child->status) == SIGINT) &&
>
> Then the next wait_for() clears wait_sigint_received and bash
> loses ^C

IOW.

Now that it is clear what happens, the test-case becomes even more
trivial:

bash-4.1$ ./bash -c 'while true; do /bin/true; done'
^C^C

It needs 4-5 attempts on my machine.

The patch below fixes the problem, but most probably it is not
correct. That said, I don't understand the point of the
"WTERMSIG (child->status) == SIGINT" check: we have already checked
that this job is dead. But I won't pretend I really understand this code.

Oleg.

--- bash-4.1/jobs.c~ctrlc_exit_race 2011-02-07 13:52:48.000000000 +0100
+++ bash-4.1/jobs.c 2011-02-07 13:55:30.000000000 +0100
@@ -3299,7 +3299,7 @@ set_job_status_and_cleanup (job)
signals are sent to process groups) or via kill(2) to the foreground
process by another process (or itself). If the shell did receive the
SIGINT, it needs to perform normal SIGINT processing. */
- else if (wait_sigint_received && (WTERMSIG (child->status) == SIGINT) &&
+ else if (wait_sigint_received /*&& (WTERMSIG (child->status) == SIGINT)*/ &&
IS_FOREGROUND (job) && IS_JOBCONTROL (job) == 0)
{
int old_frozen;
