Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

From: Deepa Dinamani
Date: Fri May 24 2019 - 13:04:51 EST


On Fri, May 24, 2019 at 9:33 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> On 05/24, Deepa Dinamani wrote:
> >
> > On Fri, May 24, 2019 at 7:11 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> > >
> > > On 05/23, Deepa Dinamani wrote:
> > > >
> > > > Ok, since there has been quite a bit of argument here, I will
> > > > backtrack a little bit and maybe it will help us understand what's
> > > > happening here.
> > > > There are many scenarios being discussed on this thread:
> > > > a. State of code before 854a6ed56839a
> > >
> > > I think everything was correct,
> >
> > There were 2 things that were wrong:
> >
> > 1. If an unblocked signal was received, after the ep_poll(), then the
> > return status did not indicate that.
>
> Yes,
>
> > This is expected behavior
> > according to man page. If this is indeed what is expected then the man
> > page should note that signal will be delivered in this case and return
> > code will still be 0.
> >
> > "EINTR
> > The call was interrupted by a signal handler before either any of the
> > requested events occurred or the timeout expired; see signal(7)."
>
> and what do you think the man page could say?

Maybe clarify that a signal handler can be invoked even if the syscall
return indicates a success.

Maybe a crude userspace application could do something like this:

sig_handler()
{
set global abort = 1
}

poll_the_fds()
{
ret = epoll_pwait()
if (ret)
return ret
if (abort) # but this abort should be ignored
if ret was 0.
return try_again

}

> This is obviously possible for any syscall, and we can't avoid this. A signal
> can come right after syscall insn completes. The signal handler will be called
> but this won't change $rax, user-space can see return code == 0 or anything else.
>
> And this doesn't differ from the case when the signal comes before syscall returns.

But, these syscalls are depending on there signals. I would assume for
the purpose of these syscalls that the execution is done when we
updated the saved_sigmask. We can pick a different point per syscall
like ep_poll() also, but then we need to probably make it clear for
each such syscall.

> > 2. The restoring of the sigmask is done right in the syscall part and
> > not while exiting the syscall and if you get a blocked signal here,
> > you will deliver this to userspace.
>
> So I assume that this time you are talking about epoll_pwait() and not epoll_wait()...

Yes.

> And I simply can't understand you. But yes, if the original mask doesn't include
> the pending signal it will be delivered while the syscall can return success/timout
> or -EFAULT or anything.
>
> This is correct, see above.

Look at the code before 854a6ed56839a:

/*
* If we changed the signal mask, we need to restore the original one.
* In case we've got a signal while waiting, we do not restore the
* signal mask yet, and we allow do_signal() to deliver the signal on
* the way back to userspace, before the signal mask is restored.
*/
if (sigmask) {
####### This err has not been changed since ep_poll()
####### So if there is a signal before this point, but
err = 0, then we goto else.
if (err == -EINTR) {
memcpy(&current->saved_sigmask, &sigsaved,
sizeof(sigsaved));
set_restore_sigmask();
} else
############ This is a problem if there is signal
pending that is sigmask should block.
########### This is the whole reason we have
current->saved_sigmask?
set_current_blocked(&sigsaved);
}

> > > > b. State after 854a6ed56839a
> > >
> > > obviously buggy,
> >
> > Ok, then can you point out what specifically was wrong with
> > 854a6ed56839a?
>
> Cough. If nothing else the lost -EINTR?

This was my theory. My basis behind the theory was [1](the issue with
return value not being updated) above. And, you are saying this is ok.

854a6ed56839a also has timing differences compared to the original
code. So unless we are sure what was uncovered because of
854a6ed56839a, we might just be masking a pre-existing problem by
reverting it. So I think we should code review 854a6ed56839a and
figure out what is wrong programatically before just reverting it.

> > And, not how it could be more simple?

Oh, I was not asking here. I was saying let's please discuss what's
wrong before simplifying the code.

-Deepa