Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering

From: Will Drewry
Date: Thu May 26 2011 - 12:33:27 EST


On Thu, May 26, 2011 at 10:03 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, May 26, 2011 at 7:37 AM, Colin Walters <walters@xxxxxxxxxx> wrote:
>>
>> I'm curious which features you feel are esoteric and cool but unused?
>
> Just about anything linux-specific. Ranging from the totally new
> concepts (epoll/clone/splice/signalfd) to just simple cleanups and
> extensions of reasonably standard stuff (sync_file_range/sendpage).
>
> Sure, there's almost always *somebody* who uses them, but they are
> seldom actually worth it.
>
> The one thing that works well is when you expose it as a standard
> interface. So futexes are linux-specific, but they are exposed as the
> standard pthreads condition variables etc to apps - very few actually
> use them as futexes. But because glibc uses them for the pthreads
> synchronization, I think they ended up being used inside glibc for
> low-level stuff too, so I think futexes ended up being an unqualified
> success - much better than the standard interface.
>
> The "it can be used in standard libraries" ends up being a very
> powerful thing. It doesn't have to be libc - if something like a glib
> or a big graphical interface uses them, they can get very popular. But
> if you have to have actual config options (autoconf or similar) to
> enable the feature on Linux, along with a compatibility case (because
> older kernels don't even support it, so it's not even "linux", it's
> "linux newer than xyz"), then very very few applications end up using
> it.
>
> And security issues in particular are often *very* subtle. For
> example, something like a system call filter sounds like an obviously
> safe thing: it can only limit what you do, right?
>
> Except no, not right at all. Imagine that you're limiting a suid
> application, and the one operation you limit is "setuid()". Imagine
> that the suid application explicitly drops privileges in order to run
> safely as the user. Imagine, further, that it doesn't even check the
> return value, because it *knows* that if it is root, it will succeed,
> and if it isn't root, then it wasn't suid to begin with and doesn't
> need to do anything about it.
>
> Unlikely? Hell no. That's standard practice. And if you allow filter
> setup that survives fork+exec, you just opened a HUGE security hole.
>
> Fixable? Yes, easily. And I haven't looked at the current patches, but
> I would not be AT ALL surprised if they had exactly the above huge
> security hole.

FWIW, none of the patches deal with privilege escalation via setuid
files or file capabilities.

> My point being that (a) I'm very dubious about new non-standard
> features, because historically they seldom get used very widely and
> (b) I'm doubly dubious about security things because it turns out it's
> damn easy to get it wrong in all kinds of small subtle details.

I agree with both points, so I'm being a bit hypocritical, I suspect.

At present, I'm not aware of any platform that supports system call
restriction in a non-platform-specific fashion: Mac OS X has Seatbelt,
FreeBSD has things like Capsicum, Linux has seccomp :) This led me to
propose expanding seccomp, since it was already the Linux-ism for this
functionality and, ideally, could stay minimal to help limit exposure
to subtle bugs. That said, any form of kernel attack surface reduction
would be great; I'm just unaware of any that integrate smoothly with
glibc (even if some do have nicer programming interfaces).

I've used system call filtering in the past to good effect in server
environments, and I believe the Chromium renderer is a robust example
for Linux desktops, even without glibc integration. If a
Linux-specific interface that isn't picked up automatically through
glibc is a no-go, then I'll go back to the drawing board. I'm not sure
how to avoid something Linux-specific in general, even if it's just
adding syscall hooks to LSMs. It might be possible to share interfaces
with some other platform's implementation of a broader security system
that includes kernel exposure minimization (like Capsicum), built
either on whatever existing substrate is available or on a new one, as
Ingo proposes.

thanks!
will