Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()
From: Deepa Dinamani
Date: Wed May 22 2019 - 12:36:46 EST
On Wed, May 22, 2019 at 9:14 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> On 05/22, Deepa Dinamani wrote:
> >
> > -Deepa
> >
> > > On May 22, 2019, at 8:05 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> > >
> > >> On 05/21, Deepa Dinamani wrote:
> > >>
> > >> Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND,
> > >> etc) only when there is no other error. If there is a signal and an error
> > >> like EINVAL, the syscalls return -EINVAL rather than the interrupted
> > >> error codes.
> > >
> > > Ugh. I need to re-check, but at first glance I really dislike this change.
> > >
> > > I think we can fix the problem _and_ simplify the code. Something like below.
> > > The patch is obviously incomplete, it changes only one caller of
> > > set_user_sigmask(), epoll_pwait() to explain what I mean.
> > > restore_user_sigmask() should simply die. Although perhaps another helper
> > > makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending).
> >
> > restore_user_sigmask() was added because of all the variants of these
> > syscalls we added because of y2038 as noted in commit message:
> >
> > signal: Add restore_user_sigmask()
> >
> > Refactor the logic to restore the sigmask before the syscall
> > returns into an api.
> > This is useful for versions of syscalls that pass in the
> > sigmask and expect the current->sigmask to be changed during
> > the execution and restored after the execution of the syscall.
> >
> > With the advent of new y2038 syscalls in the subsequent patches,
> > we add two more new versions of the syscalls (for pselect, ppoll
> > and io_pgetevents) in addition to the existing native and compat
> > versions. Adding such an api reduces the logic that would need to
> > be replicated otherwise.
>
> Again, I need to re-check, will continue tomorrow. But so far I am not sure
> this helper can actually help.
>
> > > --- a/fs/eventpoll.c
> > > +++ b/fs/eventpoll.c
> > > @@ -2318,19 +2318,19 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
> > > size_t, sigsetsize)
> > > {
> > > int error;
> > > - sigset_t ksigmask, sigsaved;
> > >
> > > /*
> > > * If the caller wants a certain signal mask to be set during the wait,
> > > * we apply it here.
> > > */
> > > - error = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize);
> > > + error = set_user_sigmask(sigmask, sigsetsize);
> > > if (error)
> > > return error;
> > >
> > > error = do_epoll_wait(epfd, events, maxevents, timeout);
> > >
> > > - restore_user_sigmask(sigmask, &sigsaved);
> > > + if (error != -EINTR)
> >
> > As you address all the other syscalls this condition becomes more and
> > more complicated.
>
> May be.
>
> > > --- a/include/linux/sched/signal.h
> > > +++ b/include/linux/sched/signal.h
> > > @@ -416,7 +416,6 @@ void task_join_group_stop(struct task_struct *task);
> > > static inline void set_restore_sigmask(void)
> > > {
> > > set_thread_flag(TIF_RESTORE_SIGMASK);
> > > - WARN_ON(!test_thread_flag(TIF_SIGPENDING));
> >
> > So you always want do_signal() to be called?
>
> Why do you think so? No. This is just to avoid the warning, because with the
> patch I sent set_restore_sigmask() is called "in advance".
>
> > You will have to check each architecture's implementation of
> > do_signal() to check if that has any side effects.
>
> I don't think so.

Why not?

> > Although this is not what the patch is solving.
>
> Sure. But you know, after I tried to read the changelog, I am not sure
> I understand what exactly you are trying to fix. Could you please explain
> this part
>
> The behavior
> before 854a6ed56839a was that the signals were dropped after the error
> code was decided. This resulted in lost signals but the userspace did not
> notice it
>
> ? I fail to understand it, sorry. It looks as if the code was already buggy before
> that commit and it could miss a signal or something like this, but I do not see how.

Did you read the explanation pointed to in the commit text?
https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/
Let me know what part you don't understand and I can explain more.
It would be better to understand the issue before we start discussing the fix.
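
To make the error-code rule from my note quoted at the top concrete (interrupted
errors are returned only when there is no other error), here is a minimal
userspace sketch. It is not the kernel code: pick_return_code() and
SKETCH_ERESTARTNOHAND are hypothetical names used only for illustration, and
leaving a positive result untouched is my assumption, since the note only talks
about the error case.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel-internal ERESTARTNOHAND, which userspace
 * <errno.h> does not define. Hypothetical name, illustration only. */
#define SKETCH_ERESTARTNOHAND 514

/*
 * Hypothetical helper, not a kernel function.
 *
 * ret             - result of the wait itself (0, a count, or -errno)
 * signal_pending  - whether a signal was detected when the mask was restored
 * interrupted_err - the interrupted code this syscall would use
 *                   (for example -EINTR or -SKETCH_ERESTARTNOHAND)
 */
int pick_return_code(int ret, bool signal_pending, int interrupted_err)
{
	/* A real error such as -EINVAL wins over the interrupted code. */
	if (ret < 0)
		return ret;

	/* No other result and a signal was seen: report the interruption. */
	if (ret == 0 && signal_pending)
		return interrupted_err;

	/* Assumption: a positive result (events ready) is returned as-is. */
	return ret;
}
```

With this rule, a call that hits both a signal and -EINVAL returns -EINVAL,
while a call that only hits a signal returns the interrupted code.
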
-Deepa