Re: [PATCH v18 0/9] mm: introduce memfd_secret system call to create "secret" memory areas
From: James Bottomley
Date: Thu May 06 2021 - 13:06:34 EST
On Thu, 2021-05-06 at 18:45 +0200, David Hildenbrand wrote:
> On 06.05.21 17:26, James Bottomley wrote:
> > On Wed, 2021-05-05 at 12:08 -0700, Andrew Morton wrote:
> > > On Wed, 3 Mar 2021 18:22:00 +0200 Mike Rapoport <rppt@xxxxxxxxxx> wrote:
> > >
> > > > This is an implementation of "secret" mappings backed by a file
> > > > descriptor.
> > > >
> > > > The file descriptor backing secret memory mappings is created
> > > > using a dedicated memfd_secret system call. The desired
> > > > protection mode for the memory is configured using flags
> > > > parameter of the system call. The mmap() of the file descriptor
> > > > created with memfd_secret() will create a "secret" memory
> > > > mapping. The pages in that mapping will be marked as not
> > > > present in the direct map and will be present only in the page
> > > > table of the owning mm.
> > > >
> > > > Although normally Linux userspace mappings are protected from
> > > > other users, such secret mappings are useful for environments
> > > > where a hostile tenant is trying to trick the kernel into
> > > > giving them access to other tenants' mappings.
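
As a rough illustration of the interface the cover letter describes, here
is a minimal user space sketch. It is not code from this series. It
assumes the system call as it was later merged in Linux 5.14 (number 447,
a single flags argument), and the helper name secret_roundtrip() is
hypothetical. On some kernels secret memory has to be enabled at boot, so
the helper treats any failure as "not available here".

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: syscall number as merged in Linux 5.14; older libc
 * headers may not define it. */
#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447
#endif

/*
 * Hypothetical helper: copy msg into a secret mapping and compare it
 * back through that same mapping.
 * Returns 1 if the round trip worked, 0 if secret memory could not be
 * used here (errno is left set by the failing call).
 */
static int secret_roundtrip(const char *msg)
{
    size_t len = strlen(msg) + 1;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || len > (size_t)page) {
        errno = EINVAL;
        return 0;
    }

    /* flags = 0: default protection mode */
    int fd = (int)syscall(SYS_memfd_secret, 0U);
    if (fd < 0)
        return 0;

    /* The descriptor starts empty; give it one page. */
    if (ftruncate(fd, page) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return 0;
    }

    /* mmap() of the descriptor creates the "secret" mapping. */
    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        int saved = errno;
        close(fd);
        errno = saved;
        return 0;
    }

    /* The owning process reads and writes the pages normally. */
    memcpy(p, msg, len);
    int ok = (memcmp(p, msg, len) == 0);

    munmap(p, (size_t)page);
    close(fd);
    return ok;
}
```

The owning process sees ordinary memory; the difference is on the kernel
side, where the pages are absent from the direct map.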
> > >
> > > I continue to struggle with this and I don't recall seeing much
> > > enthusiasm from others. Perhaps we're all missing the value
> > > point and some additional selling is needed.
> > >
> > > Am I correct in understanding that the overall direction here is
> > > to protect keys (and perhaps other things) from kernel
> > > bugs? That if the kernel was bug-free then there would be no
> > > need for this feature? If so, that's a bit sad. But realistic I
> > > guess.
> > Secret memory really serves several purposes. The "increase the
> > level of difficulty of secret exfiltration" you describe. And, as
> > you say, if the kernel were bug free this wouldn't be necessary.
> > But also:
> > 1. Memory safety for user space code. Once the secret memory is
> >    allocated, the user can't accidentally pass it into the
> >    kernel to be transmitted somewhere.
> That's an interesting point I didn't realize so far.
> > 2. It also serves as a basis for context protection of virtual
> >    machines, but other groups are working on this aspect, and it is
> >    broadly similar to the secret exfiltration from the kernel
> >    problem.
> I was wondering if this also helps against CPU microcode issues like
> spectre and friends.
It can for VMs, but not really for the user space secret memory use
cases ... the in-kernel mitigations already present are much more
> > > Is this intended to protect keys/etc after the attacker has
> > > gained the ability to run arbitrary kernel-mode code? If so,
> > > that seems optimistic, doesn't it?
> > Not exactly: there are many types of kernel attack, but mostly the
> > attacker either manages to effect a privilege escalation to root or
> > gets the ability to run a ROP gadget. The object of this code is
> > to be completely secure against root trying to extract the secret
> > (somewhat similar to the lockdown idea), thus defeating privilege
> > escalation and to provide "sufficient" protection against ROP
> > gadget.
> What stops "root" from mapping /dev/mem and reading that memory?
/dev/mem uses the direct map for the copy at least for read/write, so
it gets a fault in the same way root trying to use ptrace does. I
think we've protected mmap, but Mike would know that better than I.
> IOW, would we want to enforce "CONFIG_STRICT_DEVMEM" with
Unless there's a corner case I haven't thought of, I don't think it
adds much. However, doing a full lockdown on a public system where
users want to use secret memory is best practice I think (except I
think you want it to be the full secure boot lockdown to close all the
> Also, there is a way to still read that memory when root by
> 1. Having kdump active (which would often be the case, but maybe not
> to dump user pages)
> 2. Triggering a kernel crash (easy via proc as root)
> 3. Waiting for the reboot after kdump() created the dump and then
> reading the content from disk.
Anything that can leave physical memory intact but boot to a kernel
where the missing direct map entry is restored could theoretically
extract the secret. However, it's not exactly going to be a stealthy
> Or, as an attacker, load a custom kexec() kernel and read memory
> from the new environment. Of course, the latter two are advanced
> mechanisms, but they are possible when root. We might be able to
> mitigate, for example, by zeroing out secretmem pages before booting
> into the kexec kernel, if we care :)
I think we could handle it by marking the region, yes, and a zero on
shutdown might be useful ... it would prevent all warm reboot type