From: Andy Lutomirski
Date: Mon Jun 02 2014 - 16:54:23 EST

On Tue, May 27, 2014 at 12:55 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> On Tue, May 27, 2014 at 12:27 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> On Tue, May 27, 2014 at 12:23 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>> On Tue, May 27, 2014 at 12:10 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>> On Tue, May 27, 2014 at 11:45 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>>>> On Tue, May 27, 2014 at 11:40 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>>>> On Tue, May 27, 2014 at 11:24 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>>>>>> On Mon, May 26, 2014 at 12:27 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>>>>>> On Fri, May 23, 2014 at 10:05 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>>>>>>>> On Thu, May 22, 2014 at 4:11 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>>>>>>>> On Thu, May 22, 2014 at 4:05 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>>>>>>>>>> Applying restrictive seccomp filter programs to large or diverse
>>>>>>>>>>> codebases often requires handling threads which may be started early in
>>>>>>>>>>> the process lifetime (e.g., by code that is linked in). While it is
>>>>>>>>>>> possible to apply permissive programs prior to process start up, it is
>>>>>>>>>>> difficult to further restrict the kernel ABI to those threads after that
>>>>>>>>>>> point.
>>>>>>>>>>> This change adds a new seccomp extension action for synchronizing thread
>>>>>>>>>>> group seccomp filters and a prctl() for accessing that functionality,
>>>>>>>>>>> as well as a flag for SECCOMP_EXT_ACT_FILTER to perform sync at filter
>>>>>>>>>>> installation time.
>>>>>>>>>>> flags, filter) with flags containing SECCOMP_FILTER_TSYNC, or when calling
>>>>>>>>>>> will attempt to synchronize all threads in current's threadgroup to its
>>>>>>>>>>> seccomp filter program. This is possible iff all threads are using a filter
>>>>>>>>>>> that is an ancestor to the filter current is attempting to synchronize to.
>>>>>>>>>>> NULL filters (where the task is running as SECCOMP_MODE_NONE) are also
>>>>>>>>>>> treated as ancestors allowing threads to be transitioned into
>>>>>>>>>>> SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS, ...) has been set on the
>>>>>>>>>>> calling thread, no_new_privs will be set for all synchronized threads too.
>>>>>>>>>>> On success, 0 is returned. On failure, the pid of one of the failing threads
>>>>>>>>>>> will be returned, with as many filters installed as possible.
>>>>>>>>>> Is there a use case for adding a filter and synchronizing filters
>>>>>>>>>> being separate operations? If not, I think this would be easier to
>>>>>>>>>> understand and to use if there was just a single operation.
>>>>>>>>> Yes: if the other thread's lifetime is not well controlled, it's good
>>>>>>>>> to be able to have a distinct interface to retry the thread sync that
>>>>>>>>> doesn't require adding "no-op" filters.
>>>>>>>> Wouldn't this still be solved by:
>>>>>>>> seccomp_add_filter(final_filter, SECCOMP_FILTER_ALL_THREADS);
>>>>>>>> the idea would be that, if seccomp_add_filter fails, then you give up
>>>>>>>> and, if it succeeds, then you're done. It shouldn't fail unless out
>>>>>>>> of memory or you've nested too deeply.
>>>>>>> I wanted to keep the case of being able to to wait for non-ancestor
>>>>>>> threads to finish. For example, 2 threads start and set separate
>>>>>>> filters. 1 does work and exits, 2 starts another thread (3) which adds
>>>>>>> filters, does work, and then waits for 1 to finish by calling TSYNC.
>>>>>>> Once 1 dies, TSYNC succeeds. In the case of not having direct control
>>>>>>> over thread lifetime (say, when using third-party libraries), I'd like
>>>>>>> to retain the flexibility of being able to do TSYNC without needing a
>>>>>>> filter being attached to it.
>>>>>> I must admit this strikes me as odd. What's the point of having a
>>>>>> thread set a filter if it intends to be a short-lived thread?
>>>>> I was illustrating the potential insanity of third-party libraries.
>>>>> There isn't much sense in that behavior, but if it exists, working
>>>>> around it is harder without the separate TSYNC-only call.
>>>>>> In any case, I must have missed the ability for TSYNC to block. Hmm.
>>>>>> That seems complicated, albeit potentially useful.
>>>>> Oh, no, I didn't mean to imply TSYNC should block. I meant that thread
>>>>> 3 could do:
>>>>> while (TSYNC-fails)
>>>>> wait-on-or-kill-unexpected-thread
>>>> Ok.
>>>> I'm still not seeing the need for a separate TSYNC option, though --
>>>> just add-a-filter-across-all-threads would work if it failed
>>>> harmlessly on error. FWIW, TSYNC is probably equivalent to adding an
>>>> always-accept filter across all threads, although no one should really
>>>> do the latter for efficiency reasons.
>>> Given the complexity of the locking, "fail" means "I applied the
>>> change to all threads except for at least this one: *error code*",
>>> which means looping with the "add-a-filter" method means all the other
>>> threads keep getting filters added until there is full success. I
>>> don't want that overhead, so this keeps TSYNC distinctly separate.
>> Ugh, right.
>>> Because of the filter addition, when using add_filter-TSYNC, it's not
>>> sensible to continue after a failure. However, using just-TSYNC allows
>>> sensible re-trying. Since the environments where TSYNC intend to be
>>> used in can be very weird, I really want to retain the retry ability.
>> OK. So what's wrong with the other approach in which the
>> add-to-all-threads option always succeeds? IOW, rather than requiring
>> that all threads have the caller's filter as an ancestor, just add the
>> requested filter to all threads. As an optimization, if the targetted
>> thread has the current thread's filter as its filter, too, then the
>> targetted thread can stay synchronized.
>> That way the add filter call really is atomic.
>> I'm not fundamentally opposed to TSYNC, but I think I'd be happier if
>> the userspace interface could be kept as simple as possible. The fact
>> that there's a filter hierarchy is sort of an implementation detail, I
>> think.
> I'm totally on board with making this as simple as possible. :) The
> corner cases are kind of horrible, though, but I think this is already
> as simple as it can get.
> Externally, without the ancestry check, we can run the risk of have
> unstable behavior out of a filter change. Imagine the case of a race
> where a thread is adding a filter (via prctl), and the other thread
> attempts to TSYNC a filter that blocks prctl.
> In the "always take the new filter" case, sometimes we get two filters
> (original and TSYNCed) on the first thread, and sometimes it blows up
> when it calls prctl (TSYNCed filter blocks the prctl). There's no way
> for the TSYNC caller to detect who won the race.
> With the patch series as-is, losing the race results in a TSYNC
> failure (ancestry doesn't match). This is immediately detectable and
> the caller can the decide how to handle the situation directly.
> Regardless, I don't think the filter hierarchy is an implementation
> detail -- complex filters take advantage of the hierarchies. And
> keeping this hierarchy stable means the filters are simpler to
> validate for correctness, etc.


Another issue: unless I'm not looking at the right version of this, I
don't think that while_each_thread does what you think it does.

I'm generally in favor of making the kernel interfaces easy to use,
even at the cost of a bit of implementation complexity. Would this be
simpler if you hijacked siglock to protect seccomp state? Then I
think you could just make everything atomic.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at