Re: [PATCH v6 6/9] seccomp: add "seccomp" syscall

From: Kees Cook
Date: Fri Jun 13 2014 - 18:01:29 EST


On Fri, Jun 13, 2014 at 2:42 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Fri, Jun 13, 2014 at 2:37 PM, Alexei Starovoitov <ast@xxxxxxxxxxxx> wrote:
>> On Fri, Jun 13, 2014 at 2:25 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>> On Fri, Jun 13, 2014 at 2:22 PM, Alexei Starovoitov <ast@xxxxxxxxxxxx> wrote:
>>>> On Tue, Jun 10, 2014 at 8:25 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>>>> This adds the new "seccomp" syscall with both an "operation" and "flags"
>>>>> parameter for future expansion. The third argument is a pointer value,
>>>>> used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
>>>>> be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).
>>>>>
>>>>> Signed-off-by: Kees Cook <keescook@xxxxxxxxxxxx>
>>>>> Cc: linux-api@xxxxxxxxxxxxxxx
>>>>> ---
>>>>> arch/x86/syscalls/syscall_32.tbl | 1 +
>>>>> arch/x86/syscalls/syscall_64.tbl | 1 +
>>>>> include/linux/syscalls.h | 2 ++
>>>>> include/uapi/asm-generic/unistd.h | 4 ++-
>>>>> include/uapi/linux/seccomp.h | 4 +++
>>>>> kernel/seccomp.c | 63 ++++++++++++++++++++++++++++++++-----
>>>>> kernel/sys_ni.c | 3 ++
>>>>> 7 files changed, 69 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
>>>>> index d6b867921612..7527eac24122 100644
>>>>> --- a/arch/x86/syscalls/syscall_32.tbl
>>>>> +++ b/arch/x86/syscalls/syscall_32.tbl
>>>>> @@ -360,3 +360,4 @@
>>>>> 351 i386 sched_setattr sys_sched_setattr
>>>>> 352 i386 sched_getattr sys_sched_getattr
>>>>> 353 i386 renameat2 sys_renameat2
>>>>> +354 i386 seccomp sys_seccomp
>>>>> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
>>>>> index ec255a1646d2..16272a6c12b7 100644
>>>>> --- a/arch/x86/syscalls/syscall_64.tbl
>>>>> +++ b/arch/x86/syscalls/syscall_64.tbl
>>>>> @@ -323,6 +323,7 @@
>>>>> 314 common sched_setattr sys_sched_setattr
>>>>> 315 common sched_getattr sys_sched_getattr
>>>>> 316 common renameat2 sys_renameat2
>>>>> +317 common seccomp sys_seccomp
>>>>>
>>>>> #
>>>>> # x32-specific system call numbers start at 512 to avoid cache impact
>>>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>>>> index b0881a0ed322..1713977ee26f 100644
>>>>> --- a/include/linux/syscalls.h
>>>>> +++ b/include/linux/syscalls.h
>>>>> @@ -866,4 +866,6 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
>>>>> asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type,
>>>>> unsigned long idx1, unsigned long idx2);
>>>>> asmlinkage long sys_finit_module(int fd, const char __user *uargs, int flags);
>>>>> +asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
>>>>> + const char __user *uargs);
>>>>
>>>> It looks odd to add 'flags' argument to syscall that is not even used.
>>>> It don't think it will be extensible this way.
>>>> 'uargs' is used only in 2nd command as well and it's not 'char __user *'
>>>> but rather 'struct sock_fprog __user *'
>>>> I think it makes more sense to define only first argument as 'int op' and the
>>>> rest as variable length array.
>>>> Something like:
>>>> long sys_seccomp(unsigned int op, struct nlattr *attrs, int len);
>>>> then different commands can interpret 'attrs' differently.
>>>> if op == mode_strict, then attrs == NULL, len == 0
>>>> if op == mode_filter, then attrs->nla_type == seccomp_bpf_filter
>>>> and nla_data(attrs) is 'struct sock_fprog'
>>>
>>> Eww. If the operation doesn't imply the type, then I think we've
>>> totally screwed up.
>>>
>>>> If we decide to add new types of filters or new commands, the syscall prototype
>>>> won't need to change. New commands can be added preserving backward
>>>> compatibility.
>>>> The basic TLV concept has been around forever in netlink world. imo makes
>>>> sense to use it with new syscalls. Passing 'struct xxx' into syscalls
>>>> is the thing
>>>> of the past. TLV style is more extensible. Fields of structures can become
>>>> optional in the future, new fields added, etc.
>>>> 'struct nlattr' brings the same benefits to kernel api as protobuf did
>>>> to user land.
>>>
>>> I see no reason to bring nl_attr into this.
>>>
>>> Admittedly, I've never dealt with nl_attr, but everything
>>> netlink-related I've even been involved in has involved some sort of
>>> API atrocity.
>>
>> netlink has a lot of legacy and there is genetlink which is not pretty
>> either because of extra socket creation, binding, dealing with packet
>> loss issues, but the key concept of variable length encoding is sound.
>> Right now seccomp has two commands and they already don't fit
>> into single syscall neatly. Are you saying there should be two syscalls
>> here? What about another seccomp related command? Another syscall?
>> imo all seccomp related commands needs to be mux/demux-ed under
>> one syscall. What is the way to mux/demux potentially very different
>> commands under one syscall? I cannot think of anything better than
>> TLV style. 'struct nlattr' is what we have today and I think it works fine.
>> I'm not suggesting to bring the whole netlink into the picture, but rather
>> TLV style of encoding different arguments for different commands.
>
> I'm unconvinced. These are simple commands, and I think the interface
> should be simple. Syscalls are cheap.
>
> As an example, the interface could be:
>
> int seccomp_add_filter(const struct sock_fprog *filter, unsigned int flags);
>
> The "tsync" operation would be seccomp_add_filter(NULL,
> SECCOMP_ADD_FILTER_TSYNC) -- it's equivalent to adding an
> always-accept filter and syncing threads.

I think you convinced me that tsync should be part of adding a filter
(since now there are no failure side-effects), so this specific
example I would expect EFAULT from. But ...

>
> But, frankly, this kind of stuff should probably be "do operation X".
> IIUC nl_attr is more like "do something, with these tags and values",
> which results in oddities like whatever should happen of more than one
> tag is set.

I have no objection to eliminating seccomp-strict from the syscall,
and just making this the "add seccomp filter" syscall. My only
hesitation would be that if we need something besides adding a filter
in the future, we'd be back to extending this awkwardly or adding
another syscall. That's why I went with the "operation" argument. I'm
not opposed to passing attrs and len, but seccomp_add_filter does feel
cleaner.

-Kees

--
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/