[PATCH 0/6] File Sealing & memfd_create()

From: David Herrmann
Date: Wed Mar 19 2014 - 15:08:26 EST


This series introduces the concept of "file sealing". Sealing a file restricts
the set of allowed operations on the file in question. Multiple seals are
defined and each seal will cause a different set of operations to return EPERM
if it is set. The following seals are introduced:

* SEAL_SHRINK: If set, the inode size cannot be reduced
* SEAL_GROW: If set, the inode size cannot be increased
* SEAL_WRITE: If set, the file content cannot be modified

Unlike existing techniques that provide similar protection, sealing allows
file-sharing without any trust-relationship. This is enforced by rejecting seal
modifications if you don't own an exclusive reference to the given file. So if
you own a file-descriptor, you can be sure that no-one besides you can modify
the seals on the given file. This allows mapping shared files from untrusted
parties without the fear of the file getting truncated or modified by an

Several use-cases exist that could make great use of sealing:

1) Graphics Compositors
If a graphics client creates a memory-backed render-buffer and passes a
file-decsriptor to it to the graphics server for display, the server
_has_ to setup SIGBUS handlers whenever mapping the given file. Otherwise,
the client might run ftruncate() or O_TRUNC on the on file in parallel,
thus crashing the server.
With sealing, a compositor can reject any incoming file-descriptor that
does _not_ have SEAL_SHRINK set. This way, any memory-mappings are
guaranteed to stay accessible. Furthermore, we still allow clients to
increase the buffer-size in case they want to resize the render-buffer for
the next frame. We also allow parallel writes so the client can render new
frames into the same buffer (client is responsible of never rendering into
a front-buffer if you want to avoid artifacts).

Real use-case: Wayland wl_shm buffers can be transparently converted

2) Geneal-purpose IPC
IPC mechanisms that do not require a mutual trust-relationship (like dbus)
cannot do zero-copy so far. With sealing, zero-copy can be easily done by
sharing a file-descriptor that has SEAL_SHRINK | SEAL_GROW | SEAL_WRITE
set. This way, the source can store sensible data in the file, seal the
file and then pass it to the destination. The destination verifies these
seals are set and then can parse the message in-line.
Note that these files are usually one-shot files. Without any
trust-relationship, a destination can notify the source that it released a
file again, but a source can never rely on it. So unless the destination
releases the file, a source cannot clear the seals for modification again.
However, this is inherent to situations without any trust-relationship.

Real use-case: kdbus messages already use a similar interface and can be
transparently converted to use these seals

Other similar use-cases exist (eg., audio), but these two I am personally
working on. Interest in this interface has been raised from several other camps
and I've put respective maintainers into CC. If more information on these
use-cases is needed, I think they can give some insights.

The API introduced by this patchset is:

* fcntl() extension:
Two new fcntl() commands are added that allow retrieveing (SHMEM_GET_SEALS)
and setting (SHMEM_SET_SEALS) seals on a file. Only shmfs implements them so
far and there is no intention to implement them on other file-systems.
All shmfs based files support sealing.

Patch 2/6

* memfd_create() syscall:
The new memfd_create() syscall is a public frontend to the shmem_file_new()
interface in the kernel. It avoids the need of a local shmfs mount-point (as
requested by android people) and acts more like MAP_ANON than O_TMPFILE.

Patch 3/6

The other 4 patches are cleanups, self-tests and docs.

The commit-messages explain the API extensions in detail. Man-page proposals
are also provided. Last but not least, the extensive self-tests document the
intended behavior, in case it is still not clear.

Technically, sealing and memfd_create() are independent, but the described
use-cases would greatly benefit from the combination of both. Hence, I merged
them into the same series. Please also note that this series is based on earlier
works (ashmem, memfd, shmgetfd, ..) and unifies these attempts.

Comments welcome!


David Herrmann (4):
fs: fix i_writecount on shmem and friends
shm: add sealing API
shm: add memfd_create() syscall
selftests: add memfd_create() + sealing tests

David Herrmann (2): (man-pages)
fcntl.2: document SHMEM_SET/GET_SEALS commands
memfd_create.2: add memfd_create() man-page

arch/x86/syscalls/syscall_32.tbl | 1 +
arch/x86/syscalls/syscall_64.tbl | 1 +
fs/fcntl.c | 12 +-
fs/file_table.c | 27 +-
include/linux/shmem_fs.h | 17 +
include/linux/syscalls.h | 1 +
include/uapi/linux/fcntl.h | 13 +
include/uapi/linux/memfd.h | 9 +
kernel/sys_ni.c | 1 +
mm/shmem.c | 267 +++++++-
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/memfd/.gitignore | 2 +
tools/testing/selftests/memfd/Makefile | 29 +
tools/testing/selftests/memfd/memfd_test.c | 972 +++++++++++++++++++++++++++++
14 files changed, 1338 insertions(+), 15 deletions(-)
create mode 100644 include/uapi/linux/memfd.h
create mode 100644 tools/testing/selftests/memfd/.gitignore
create mode 100644 tools/testing/selftests/memfd/Makefile
create mode 100644 tools/testing/selftests/memfd/memfd_test.c


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/