[RFC PATCH] mm: extend memfd with ability to create "secret" memory areas

From: Mike Rapoport
Date: Thu Jan 30 2020 - 11:23:53 EST


Hi,

This is essentially a resend of my attempt to implement "secret" mappings
using a file descriptor [1].

I've done a couple of experiments with secret/exclusive/whatever
memory backed by a file-descriptor using a chardev and memfd_create
syscall. There is indeed no need for a VM_ flag, but there are still places
that would require special care, e.g. vm_normal_page(), madvise(MADV_DOFORK),
so it won't be completely free of core mm modifications.

Below is a POC that implements an extension to memfd_create() that allows
mapping of "secret" memory. The "secrecy" mode should be explicitly set
using ioctl(); for now I've implemented exclusive and uncached mappings.

The POC is primarily intended to illustrate a possible userspace API for
fd-based secret memory. The idea is that the user will create a file
descriptor using a system call. The user then has to use ioctl() to define
the desired mode of operation, and only when the mode is set is it possible
to mmap() the memory. I.e. something like

	fd = memfd_create("secret", MFD_SECRET);
	ioctl(fd, MFD_SECRET_UNCACHED);
	ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
		   fd, 0);
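
For illustration only, the same three steps with error handling could be
wrapped as below. This is a sketch, not code from the POC: map_secret() is a
hypothetical helper, and the flag and ioctl request are passed in as
parameters because their values exist only in the patched headers. On a
kernel without the patch the helper is expected to fail at memfd_create() or
ioctl() and return NULL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: create, configure and map a "secret" memfd.
 *
 * secret_flag:  value of MFD_SECRET from the patched headers (assumption)
 * mode_request: ioctl request, e.g. MFD_SECRET_UNCACHED (assumption)
 *
 * Returns the mapping, or NULL with errno set by the step that failed.
 */
void *map_secret(unsigned int secret_flag, unsigned long mode_request,
		 size_t len)
{
	void *ptr;
	int saved_errno;
	int fd = memfd_create("secret", secret_flag);

	if (fd < 0)
		return NULL;

	/* The mode has to be set before mmap() is allowed. */
	if (ioctl(fd, mode_request) < 0)
		goto err;

	ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ptr == MAP_FAILED)
		goto err;

	/* The mapping holds its own reference to the file. */
	close(fd);
	return ptr;

err:
	/* close() may clobber errno, so preserve the original error. */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return NULL;
}
```

With the patched headers, the snippet above would correspond to
map_secret(MFD_SECRET, MFD_SECRET_UNCACHED, MAP_SIZE).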


The ioctl() allows a lot of flexibility in how the secrecy should be
defined. It could be either a request for a particular protection (e.g.
exclusive, uncached) or something like "secrecy level" from "a bit more
secret than normally" to "do your best even at the expense of performance".
The POC implements the first option and the modes are mutually exclusive
for now, but there is no fundamental reason they cannot be mixed.
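
To make the "mutually exclusive now, possibly mixed later" point concrete,
here is a small sketch of the two validation policies. It assumes the modes
are expressed as bits in a single mask, which is not how the POC passes them;
the SECRET_MODE_* names, their values and both helper functions are
hypothetical and exist only for this illustration.

```c
#include <stdbool.h>

/* Hypothetical mode bits; names and values are illustrative only. */
#define SECRET_MODE_EXCLUSIVE	(1UL << 0)
#define SECRET_MODE_UNCACHED	(1UL << 1)
#define SECRET_MODE_ALL		(SECRET_MODE_EXCLUSIVE | SECRET_MODE_UNCACHED)

/* Policy matching the POC: exactly one known mode may be requested. */
bool secret_mode_valid_single(unsigned long mode)
{
	return mode == SECRET_MODE_EXCLUSIVE || mode == SECRET_MODE_UNCACHED;
}

/* Possible relaxation: any non-empty combination of known modes. */
bool secret_mode_valid_combined(unsigned long mode)
{
	return mode != 0 && (mode & ~SECRET_MODE_ALL) == 0;
}
```

Under the first policy a request for both modes at once is rejected; under
the second it is accepted, while unknown bits are rejected in both cases.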

I've chosen memfd over a chardev as it seems to play more neatly with
anon_inodes and would allow simple (ab)use of the page cache for tracking
pages allocated for the "secret" mappings as well as using
address_space_operations for e.g. page migration callbacks.

The POC implementation uses set_memory/pageattr APIs to manipulate the
direct map and does not address the direct map fragmentation issue.

Of course this is something that must be addressed, as well as the
modifications to core mm required to keep the secret memory secret, but I'd
really like to focus on the userspace ABI first.

[1] https://lore.kernel.org/lkml/1572171452-7958-1-git-send-email-rppt@xxxxxxxxxx/
[2] https://lore.kernel.org/lkml/20191205153400.GA25575@rapoport-lnx/