Re: [PATCH 3/6] shm: add memfd_create() syscall

From: Konstantin Khlebnikov
Date: Wed Apr 02 2014 - 10:52:29 EST

On Wed, Apr 2, 2014 at 6:18 PM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
> Hi
> On Wed, Apr 2, 2014 at 3:38 PM, Konstantin Khlebnikov <koct9i@xxxxxxxxx> wrote:
>> On Wed, Mar 19, 2014 at 11:06 PM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
>>> memfd_create() is similar to mmap(MAP_ANON), but returns a file-descriptor
>>> that you can pass to mmap(). It explicitly allows sealing and
>>> avoids any connection to user-visible mount-points. Thus, it's not
>>> subject to quotas on mounted file-systems, but can be used like
>>> malloc()'ed memory, but with a file-descriptor to it.
>>> memfd_create() does not create a front-FD, but instead returns the raw
>>> shmem file, so calls like ftruncate() can be used. Also calls like fstat()
>>> will return proper information and mark the file as regular file. Sealing
>>> is explicitly supported on memfds.
>>> Compared to O_TMPFILE, it does not require a tmpfs mount-point and is not
>>> subject to quotas and alike.
>> Instead of adding new syscall we can extend existing openat() a little
>> bit more:
>> openat(AT_FDSHM, "name", O_TMPFILE | O_RDWR, 0666)
> O_TMPFILE requires an existing directory as "name". So you have to use:
> open("/run/", O_TMPFILE | O_RDWR, 0666)
> instead of
> open("/run/new_file", O_TMPFILE | O_RDWR, 0666)
> We _really_ want to set a name for the inode, though. Otherwise,
> debug-info via /proc/pid/fd/ is useless.
> Furthermore, Linus requested to allow sealing only on files that
> _explicitly_ allow sealing. So v2 of this series will have
> MFD_ALLOW_SEALING as memfd_create() flag. I don't think we can do this
> with linkat() (or is that meant to be implicit for the new AT_FDSHM?).
> Last but not least, you now need a separate syscall to set the
> file-size.
> I could live with most of these issues, except for the name-thing. Ideas?

Hmm, why can't the AT_FDSHM + O_TMPFILE pair have different naming behavior?
Actually, the O_TMPFILE flag is optional here. AT_FDSHM alone is enough, but
O_TMPFILE lets us move the branching out of the common fast paths and hide
it inside do_tmpfile().

BTW, you can set an extended attribute via fsetxattr() and distinguish the
files in /proc by its value.

Or you could add an fcntl() for changing the 'name' of tmpfiles. In
combination with AT_FDSHM this would give a complete solution without
changing the O_TMPFILE naming scheme.
But one syscall turns into three. )
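
For comparison, here is a minimal sketch of the single-call flow under
discussion. It assumes the memfd_create(name, flags) wrapper and the
MFD_ALLOW_SEALING flag as exposed by glibc 2.27+ on Linux 3.17+, which may
differ from the version of the patch quoted above. make_named_memfd() and
fd_link_target() are hypothetical helper names used only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a named, sealable memfd and size it.
 * Returns the fd on success, -1 on failure (errno set by the failing call). */
static int make_named_memfd(const char *name, off_t size)
{
    assert(name != NULL);

    /* One call returns the fd and attaches the debug name. */
    int fd = memfd_create(name, MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    /* The size is still set separately, as for any regular file. */
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: read the /proc/self/fd/<fd> symlink into buf.
 * buf is NUL-terminated on success. Returns 0 on success, -1 on failure. */
static int fd_link_target(int fd, char *buf, size_t len)
{
    char path[64];

    if (len == 0)
        return -1;

    snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);

    ssize_t n = readlink(path, buf, len - 1);
    if (n < 0)
        return -1;

    buf[n] = '\0';
    return 0;
}
```

With these helpers, make_named_memfd("example", 4096) followed by
fd_link_target() should report a link target containing "memfd:example",
which is the kind of per-file name the /proc/pid/fd/ concern above is
about. The size still needs its own ftruncate() call.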
