Re: [RFC] x86: restrict pid namespaces to 32 or 64 bit syscalls

From: Vasiliy Kulikov
Date: Sun Aug 14 2011 - 12:13:16 EST

(CC'ed Will Drewry, the author of the new seccomp version, and the
containers list)

On Sun, Aug 14, 2011 at 17:27 +0200, Andi Kleen wrote:
> > i386 vs x86-64 vs x32 is just one of many axes along which syscalls can be restricted (and for that matter, one axis of backward compatibility), and it does not make sense to burden the code with ad hoc filters. Designing a general filter facility which can be used to restrict any container to the subset of system calls it actually needs would make more sense, no?
> I believe this is already in the newer versions of seccomp.

The "newer versions of seccomp" have been NAK'ed by Ingo. AFAIU, Ingo
wants more generic filters that can filter much more than syscalls. But
that contradicts the security-through-simplicity approach we're trying
to achieve with this patch.
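
For comparison, the same restriction can be written with a generic
syscall filter of the kind suggested above. This is a minimal sketch,
assuming a kernel with seccomp filter mode (SECCOMP_MODE_FILTER, which
was merged after this thread); the names native_only, X32_SYSCALL_BIT,
native_only_prog() and install_native_only_filter() are hypothetical
and are not part of the RFC patch. The filter returns ENOSYS for i386
entries (reported as AUDIT_ARCH_I386) and for x32 entries (reported as
AUDIT_ARCH_X86_64 with the x32 bit set in the syscall number), and
allows everything else.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical name; mirrors the kernel's __X32_SYSCALL_BIT. */
#define X32_SYSCALL_BIT 0x40000000u

/* Allow only the native x86-64 ABI; everything else fails with ENOSYS. */
static struct sock_filter native_only[] = {
    /* A = seccomp_data.arch (AUDIT_ARCH_I386 for 32-bit entries) */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* Native arch: skip the next instruction; otherwise fall into the deny */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
    /* A = seccomp_data.nr; x32 calls also report AUDIT_ARCH_X86_64 */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* nr >= X32_SYSCALL_BIT (unsigned compare): deny; otherwise skip to allow */
    BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, X32_SYSCALL_BIT, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Wrap the instruction array in the structure prctl() expects. */
struct sock_fprog native_only_prog(void)
{
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(native_only) / sizeof(native_only[0])),
        .filter = native_only,
    };
    return prog;
}

/* Install the filter for the calling process; children inherit it. */
int install_native_only_filter(void)
{
    struct sock_fprog prog = native_only_prog();

    /* Lets an unprivileged process install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A container manager could call install_native_only_filter() in the
container's init before exec'ing the payload; seccomp filters are
inherited across fork() and preserved across execve(). The trade-off
is the one discussed here: this is a programmable mechanism rather
than a single per-container switch.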

Compatibility syscalls are, unfortunately, much more error-prone than
native syscalls, as they are poorly tested or sometimes not tested at
all. The link I've posted is about a crazy bug: a completely
uninitialized structure was used in a copy_from_user() call. The
function had not been tested _at all_. I doubt any non-compatibility
syscall (ioctl() handler, etc.) could go completely untested.

Also, we already have CONFIG_IA32_EMULATION; this patch only moves the
configuration mechanism from compile time to runtime and doesn't draw
a new line. It grants permission to use the feature to some containers
and denies it to others, which is a rather expected property of
container separation.
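
The relationship between the compile-time option and a runtime,
per-container switch can be modelled in a few lines of userspace C.
This is a hypothetical sketch, not the RFC patch: enum syscall_abi,
struct pidns_abi_policy and syscall_abi_permitted() are invented for
illustration, and CONFIG_IA32_EMULATION is simulated with a #define.
The per-namespace mask is ANDed with the compiled-in set, so it can
only remove an ABI from a container, never enable one the kernel was
built without.

```c
#include <stdbool.h>

/* Simulated Kconfig: pretend the kernel was built with i386 emulation. */
#define CONFIG_IA32_EMULATION 1

/* Hypothetical ABI bits; the patch's real identifiers may differ. */
enum syscall_abi {
    ABI_X86_64 = 1u << 0,
    ABI_I386   = 1u << 1,
    ABI_X32    = 1u << 2,
};

/* Hypothetical per-pid-namespace policy: ABIs this container may use. */
struct pidns_abi_policy {
    unsigned int allowed;
};

/* ABIs available at all, fixed when the kernel is compiled.
 * x32 is deliberately left out of this simulated build. */
static unsigned int compiled_in_abis(void)
{
    unsigned int abis = ABI_X86_64;
#ifdef CONFIG_IA32_EMULATION
    abis |= ABI_I386;
#endif
    return abis;
}

/* Runtime gate: the ABI must be compiled in AND allowed for this
 * namespace. The runtime mask narrows the compile-time set only. */
bool syscall_abi_permitted(const struct pidns_abi_policy *ns,
                           enum syscall_abi abi)
{
    return (compiled_in_abis() & ns->allowed & (unsigned int)abi) != 0;
}
```

For example, a namespace whose mask is only ABI_X86_64 is refused
i386 syscalls even though the kernel was built with
CONFIG_IA32_EMULATION, while a namespace that allows every ABI still
cannot use x32 here because this simulated build lacks it.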


Vasiliy Kulikov - bringing security into open computing environments