Re: For review: seccomp_user_notif(2) manual page

From: Michael Kerrisk (man-pages)
Date: Wed Oct 14 2020 - 05:27:34 EST


On 10/1/20 7:12 PM, Christian Brauner wrote:
> On Thu, Oct 01, 2020 at 10:58:50AM -0600, Tycho Andersen wrote:
>> On Thu, Oct 01, 2020 at 05:47:54PM +0200, Jann Horn via Containers wrote:
>>> On Thu, Oct 1, 2020 at 2:54 PM Christian Brauner
>>> <christian.brauner@xxxxxxxxxxxxx> wrote:
>>>> On Wed, Sep 30, 2020 at 05:53:46PM +0200, Jann Horn via Containers wrote:
>>>>> On Wed, Sep 30, 2020 at 1:07 PM Michael Kerrisk (man-pages)
>>>>> <mtk.manpages@xxxxxxxxx> wrote:
>>>>>> NOTES
>>>>>> The file descriptor returned when seccomp(2) is employed with the
>>>>>> SECCOMP_FILTER_FLAG_NEW_LISTENER flag can be monitored using
>>>>>> poll(2), epoll(7), and select(2). When a notification is pend‐
>>>>>> ing, these interfaces indicate that the file descriptor is read‐
>>>>>> able.
>>>>>
>>>>> We should probably also point out somewhere that, as
>>>>> include/uapi/linux/seccomp.h says:
>>>>>
>>>>> * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF
>>>>> * or SECCOMP_RET_TRACE. For SECCOMP_RET_USER_NOTIF filters acting on the
>>>>> * same syscall, the most recently added filter takes precedence. This means
>>>>> * that the new SECCOMP_RET_USER_NOTIF filter can override any
>>>>> * SECCOMP_IOCTL_NOTIF_SEND from earlier filters, essentially allowing all
>>>>> * such filtered syscalls to be executed by sending the response
>>>>> * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_TRACE can equally
>>>>> * be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE.
>>>>>
>>>>> In other words, from a security perspective, you must assume that the
>>>>> target process can bypass any SECCOMP_RET_USER_NOTIF (or
>>>>> SECCOMP_RET_TRACE) filters unless it is completely prohibited from
>>>>> calling seccomp(). This should also be noted over in the main
>>>>> seccomp(2) manpage, especially the SECCOMP_RET_TRACE part.
>>>>
>>>> So I was actually wondering about this when I skimmed this and a while
>>>> ago but forgot about this again... Afaict, you can only ever load a
>>>> single filter with SECCOMP_FILTER_FLAG_NEW_LISTENER set. If there
>>>> already is a filter with the SECCOMP_FILTER_FLAG_NEW_LISTENER property
>>>> in the tasks filter hierarchy then the kernel will refuse to load a new
>>>> one?
>>>>
>>>> static struct file *init_listener(struct seccomp_filter *filter)
>>>> {
>>>> struct file *ret = ERR_PTR(-EBUSY);
>>>> struct seccomp_filter *cur;
>>>>
>>>> for (cur = current->seccomp.filter; cur; cur = cur->prev) {
>>>> if (cur->notif)
>>>> goto out;
>>>> }
>>>>
>>>> shouldn't that be sufficient to guarantee that USER_NOTIF filters can't
>>>> override each other for the same task simply because there can only ever
>>>> be a single one?
>>>
>>> Good point. Exceeeept that that check seems ineffective because this
>>> happens before we take the locks that guard against TSYNC, and also
>>> before we decide to which existing filter we want to chain the new
>>> filter. So if two threads race with TSYNC, I think they'll be able to
>>> chain two filters with listeners together.
>>
>> Yep, seems the check needs to also be in seccomp_can_sync_threads() to
>> be totally effective,
>>
>>> I don't know whether we want to eternalize this "only one listener
>>> across all the filters" restriction in the manpage though, or whether
>>> the man page should just say that the kernel currently doesn't support
>>> it but that security-wise you should assume that it might at some
>>> point.
>>
>> This requirement originally came from Andy, arguing that the semantics
>> of this were/are confusing, which still makes sense to me. Perhaps we
>> should do something like the below?
>
> I think we should either keep up this restriction and then cement it in
> the manpage or add a flag to indicate that the notifier is
> non-overridable.
> I don't care about the default too much, i.e. whether it's overridable
> by default and exclusive if opting in or the other way around doesn't
> matter too much. But from a supervisor's perspective it'd be quite nice
> to be able to be sure that a notifier can't be overriden by another
> notifier.
>
> I think having a flag would provide the greatest flexibility but I agree
> that the semantics of multiple listeners are kinda odd.

So, for now, I have applied the patch at the foot of this mail
to the pages. Does this seem correct?

> Below looks sane to me though again, I'm not sitting in fron of source
> code.
[...]

Thanks,

Michael

PS Jann, if you see this, I'm still working through your (extensive
and very helpful) review comments. I will be sending a response.

======

diff --git a/man2/seccomp.2 b/man2/seccomp.2
index 9ab07f4ab..45a6984df 100644
--- a/man2/seccomp.2
+++ b/man2/seccomp.2
@@ -221,6 +221,11 @@ return a new user-space notification file descriptor.
When the filter returns
.BR SECCOMP_RET_USER_NOTIF
a notification will be sent to this file descriptor.
+.IP
+At most one seccomp filter using the
+.BR SECCOMP_FILTER_FLAG_NEW_LISTENER
+flag can be installed for a thread.
+.IP
See
.BR seccomp_user_notif (2)
for further details.
@@ -789,6 +794,12 @@ capability in its user namespace, or had not set
before using
.BR SECCOMP_SET_MODE_FILTER .
.TP
+.BR EBUSY
+While installing a new filter, the
+.BR SECCOMP_FILTER_FLAG_NEW_LISTENER
+flag was specified,
+but a previous filter had already been installed with that flag.
+.TP
.BR EFAULT
.IR args
was not a valid address.
diff --git a/man2/seccomp_user_notif.2 b/man2/seccomp_user_notif.2
index a6025e4d4..d1a406f46 100644
--- a/man2/seccomp_user_notif.2
+++ b/man2/seccomp_user_notif.2
@@ -92,6 +92,7 @@ Consequently, the return value of the (successful)
.BR seccomp (2)
call is a new "listening"
file descriptor that can be used to receive notifications.
+Only one such "listener" can be established.
.IP \(bu
In cases where it is appropriate, the seccomp filter returns the action value
.BR SECCOMP_RET_USER_NOTIF .

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/