Re: [PATCH v2] signal: add procfd_signal() syscall

From: Arnd Bergmann
Date: Fri Nov 30 2018 - 06:41:53 EST

On Fri, Nov 30, 2018 at 7:56 AM Christian Brauner <christian@xxxxxxxxxx> wrote:
> On Thu, Nov 29, 2018 at 11:13:57PM -0600, Eric W. Biederman wrote:
> > Arnd Bergmann <arnd@xxxxxxxx> writes:
> > > On Thu, Nov 29, 2018 at 9:14 PM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> > >
> > > It looks like we already have a 'struct signalfd_siginfo' that is defined in a
> > > sane architecture-independent way, so I'd suggest we use that.
> >
> > Unfortunately it isn't maintained very well. What you can
> > express with signalfd_siginfo is a subset what you can express with
> > siginfo. Many of the linux extensions to siginfo for exception
> > information add pointers and have integers right after those pointers.
> > Not all of those linux specific extentions for exceptions are handled
> > by signalfd_siginfo (it needs new fields).

Would those fit in the 30 remaining padding bytes?

> > As originally defined siginfo had the sigval_t union at the end so it
> > was perfectly fine on both 32bit and 64bit as it only had a single
> > pointer in the structure and the other fields were 32bits in size.

The problem with sigval_t of course is that it is incompatible between
32-bit and 64-bit. We can add the same information, but at least on
the syscall level that would have to be a __u64.

> > Although I do feel the pain as x86_64 has to deal with 3 versions
> > of siginfo. A 64bit one. A 32bit one for ia32. A 32bit one for x32
> > with a 64bit si_clock_t.

At least you and Al have managed to get it down to a single source-level
definition across all architectures, but there is also the (lesser) problem
that the structure has a slightly different binary layout on each of the
classic architectures.

If we go with Andy's suggestion of having only a single binary layout
on x86 for the new call, I'd argue that we should also make it have
the exact same layout on all other architectures.

> > > We may then also want to make sure that any system call that takes a
> > > siginfo has a replacement that takes a signalfd_siginfo, and that this
> > > replacement can be used to implement the old version purely in
> > > user space.
> >
> > If you are not implementing CRIU and reinserting exceptions to yourself.
> > At most user space wants the ability to implement sigqueue:
> >
> > AKA:
> > sigqueue(pid_t pid, int sig, const union sigval value);
> >
> > Well sigqueue with it's own si_codes so the following would cover all
> > the use cases I know of:
> > int sendnewsig(pid_t pid, int sig, int si_code, const union sigval value);
> >
> > The si_code could even be set to SI_USER to request that the normal
> > trusted SI_USER values be filled in. si_code values of < 0 if not
> > recognized could reasonably safely be treated as the _rt member of
> > the siginfo union. Or perhaps better we error out in such a case.
> >
> > If we want to be flexible and not have N kinds of system calls that
> > is the direction I would go. That is simple, and you don't need any of
> > the rest.

I'm not sure I understand what you are suggesting here. Would the
low-level implementation of this be based on procfd, or do you
mean that would be done for pid_t at the kernel level, plus another
syscall for procfd?

> > Alternatively we abandon the mistake of sigqueueinfo and not allow
> > a signal number in the arguments that differs from the si_signo in the
> > siginfo and allow passing the entire thing unchanged from sender to
> > receiver. That is maximum flexibility.

This would be regardless of which flavor of siginfo (today's arch
specific one, signalfd_siginfo, or a new one) we pass, right?

> > signalfd_siginfo just sucks in practice. It is larger that siginfo 104
> > bytes versus 48. We must deliver it to userspace as a siginfo so it
> > must be translated. Because of the translation signalfd_siginfo adds
> > no flexiblity in practice, because it can not just be passed through.
> > Finallay signalfd_siginfo does not have encodings for all of the
> > siginfo union members, so it fails to be fully general.
> >
> > Personally if I was to define signalfd_siginfo today I would make it:
> > struct siginfo_subset {
> > __u32 sis_signo;
> > __s32 sis_errno;
> > __s32 sis_code;
> > __u32 sis_pad;
> > __u32 sis_pid;
> > __u32 sis_uid;
> > __u64 sis_data (A pointer or integer data field);
> > };
> >
> > That is just 32bytes, and is all that is needed for everything
> > except for synchronous exceptions. Oh and that happens to be a proper
> > subset of a any sane siginfo layout, on both 32bit and 64bit.

And that would work for signalfd and waitid_time64, but not for

> > This is one of those rare times where POSIX is sane and what linux
> > has implemented is not.
> Thanks for the in-depth explanation. So your point is that we are better
> off if we stick with siginfo_t instead of struct signalfd_siginfo in
> procfd_signal()?

I think it means we still need more discussion. Using signalfd_siginfo
without further changes doesn't seem sufficient because of the missing
sigval and the excessive length adds some cost.

siginfo_t as it is now still has a number of other downsides, and Andy in
particular didn't like the idea of having three new variants on x86
(depending on how you count). His alternative suggestion of having
a single syscall entry point that takes a 'signfo_t __user *' but interprets
it as compat_siginfo depending on in_compat_syscall()/in_x32_syscall()
should work correctly, but feels wrong to me, or at least inconsistent
with how we do this elsewhere.