Re: [PATCH v2] exec: don't force_sigsegv processes with a pending fatal signal

From: Ivan Delalande
Date: Mon Feb 11 2019 - 18:25:33 EST


On Sun, Feb 10, 2019 at 11:05:52AM -0600, Eric W. Biederman wrote:
> Ivan Delalande <colona@xxxxxxxxxx> writes:
> > A difference I've noticed with your tree (unrelated to my issue here but
> > that you may want to look at) is when I run my reproducer under
> > strace -f, I'm now getting quite a lot of "Exit of unknown pid 12345
> > ignored" warnings from strace, which I've never seen with mainline.
> > My reproducer simply fork-exec tail processes in a loop, and tries to
> > sigkill them in the parent with a variable delay.
>
> What was your base tree?

It was just off v5.0-rc5, and I didn't see these warnings on the last
few RCs either. Now I'm seeing them on vanilla v5.0-rc6 as well.

> My best guess is that your SIGKILL is getting there before strace
> realizes the process has been forked. If we can understand the race
> it is probably worth fixing.
>
> Any chance you can post your reproducer.

Sure, see the attachment. I think this is the simplest version where
these warnings show up. It just fork/execs `tail -a` so that the child
fails and exits with status 1 as soon as possible, and progressively
increases the delay between the fork and the SIGKILL to try to hit our
original issue. It stops and restarts only after 10 completions of the
child, as the timing varies a fair bit.

Running this program under `strace -f -o /dev/null` prints the warnings
almost instantly on my system.
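
For anyone skimming the attachment, a single iteration of its inner loop
boils down to roughly the sketch below. It is only an illustration:
race_once() and the enum names are made up here and do not appear in the
attachment, and the child calls _exit(1) directly as a stand-in for the
failing `tail -a` exec.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names, not part of the attached reproducer. */
enum race_result {
	RACE_ERROR,		/* fork() or waitpid() failed */
	RACE_KILLED,		/* SIGKILL landed before the child finished */
	RACE_CHILD_EXITED,	/* child exited with status 1 before the SIGKILL */
	RACE_OTHER		/* anything else, worth printing */
};

/* Fork a child, wait delay_ns nanoseconds, SIGKILL it, classify the result. */
static enum race_result race_once(long delay_ns)
{
	struct timespec ts = {
		.tv_sec = delay_ns / 1000000000L,
		.tv_nsec = delay_ns % 1000000000L,
	};
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return RACE_ERROR;
	if (pid == 0)
		_exit(1);	/* assumption: stands in for the failing exec */

	nanosleep(&ts, NULL);
	kill(pid, SIGKILL);	/* no effect if the child is already a zombie */
	if (waitpid(pid, &status, 0) != pid)
		return RACE_ERROR;
	if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
		return RACE_KILLED;
	if (WIFEXITED(status) && WEXITSTATUS(status) == 1)
		return RACE_CHILD_EXITED;
	return RACE_OTHER;
}
```

With a delay long enough for the child to finish, race_once() should
return RACE_CHILD_EXITED. With very short delays it may return either
RACE_KILLED or RACE_CHILD_EXITED depending on timing, which is the window
the attached program sweeps through.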

> It is possible it is my most recent fixes, or it is possible something
> changed from the tree you were testing and the tree you are working
> on.

Thanks,

--
Ivan Delalande
Arista Networks
#define _GNU_SOURCE
#include <time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>

int main(void)
{
	pid_t pid;
	int status;
	size_t i, count;
	unsigned long max = 300000, first;
	struct timespec ts = { .tv_nsec = 1 };
	char* const argv[] = {"/bin/tail", "-a", NULL};

	for (i = 0; i < 42000; ++i) {
		for (count = first = 0, ts.tv_nsec = 1;
		     ts.tv_nsec < max && count < 10;
		     ts.tv_nsec += 1) {
			if ((pid = fork())) {
				if (pid < 0)
					continue;
				nanosleep(&ts, NULL);
				kill(pid, SIGKILL);
				if (waitpid(pid, &status, 0) != pid)
					continue;
				if (WIFSIGNALED(status) &&
				    WTERMSIG(status) == 9) {
					continue;
				} else if (WIFEXITED(status) &&
					   WEXITSTATUS(status) == 1) {
					count++;
					if (!first)
						first = ts.tv_nsec;
				} else
					printf("%lu: %x\n", ts.tv_nsec, status);
			} else {
				close(STDOUT_FILENO);
				close(STDERR_FILENO);
				execve("/bin/tail", argv, NULL);
				_exit(2);
			}
		}
		if (max < ts.tv_nsec)
			max = ts.tv_nsec;
		if (count < 10)
			max += 5000;
		printf("break at %lu (max: %lu) count %lu (first at %lu)\n",
		       ts.tv_nsec, max, count, first);
	}

	return 0;
}