Re: [PATCH RFC] mm: add MAP_EXCLUSIVE to create exclusive user mappings

From: Matthew Wilcox
Date: Mon Oct 28 2019 - 14:08:13 EST

On Mon, Oct 28, 2019 at 10:12:44AM -0700, Dave Hansen wrote:
> Some other random thoughts:
> * The page flag is probably not a good idea. It would probably be
> better to set _PAGE_SPECIAL on the PTE and force get_user_pages()
> into the slow path.
> * This really stops being "normal" memory. You can't do futexes on it,
> can't splice it. Probably need a more fleshed-out list of
> incompatible features.
> * As Kirill noted, each 4k page ends up with a potential 1GB "blast
> radius" of demoted pages in the direct map. Not cool. This is
> probably a non-starter as it stands.
> * The global TLB flushes are going to eat you alive. They probably
> border on a DoS on larger systems.
> * Do we really want this user interface to dictate the kernel
> implementation? In other words, do we really want MAP_EXCLUSIVE,
> or do we want MAP_SECRET? One tells the kernel what to *do*, the
> other tells the kernel what the memory *IS*.
> * There's a lot of other stuff going on in this area: XPFO, SEV, MKTME,
> Persistent Memory, where the kernel direct map is a liability in some
> way. We probably need some kind of overall, architected solution
> rather than five or ten things all poking at the direct map.

Another random set of thoughts:

- Should devices be permitted to DMA to/from MAP_SECRET pages?
- How about GUP? Can I ptrace my way into another process's secret pages?
- What if I splice() the page into a pipe?
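
To make the splice() question concrete, here is a minimal userspace sketch.
It uses an ordinary MAP_PRIVATE | MAP_ANONYMOUS mapping as a stand-in, since
MAP_SECRET does not exist yet, and the helper name leak_via_pipe() is made up
for illustration. As far as I understand it, vmsplice() without
SPLICE_F_GIFT makes the pipe hold references to the user pages instead of
copying them, so this is the kind of path a secret mapping would have to
refuse or handle specially.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* vmsplice() */
#include <string.h>
#include <sys/mman.h>
#include <sys/uio.h>    /* struct iovec */
#include <unistd.h>

/*
 * Hypothetical helper: place `len` bytes of `secret` in a fresh anonymous
 * mapping, vmsplice() that mapping into a pipe, and read the bytes back
 * into `out`. Returns the number of bytes read back, or -1 on error.
 * A plain anonymous mapping stands in for the proposed secret mapping.
 */
ssize_t leak_via_pipe(const char *secret, char *out, size_t len)
{
    int pipefd[2];
    ssize_t n = -1;
    long pagesz = sysconf(_SC_PAGESIZE);
    char *map;

    /* Keep the transfer within one page so the pipe never fills up. */
    if (pagesz <= 0 || len > (size_t)pagesz)
        return -1;

    map = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -1;
    memcpy(map, secret, len);

    if (pipe(pipefd) == 0) {
        struct iovec iov = { .iov_base = map, .iov_len = len };

        /* Hand the user page to the pipe, then read it from the other end. */
        if (vmsplice(pipefd[1], &iov, 1, 0) == (ssize_t)len)
            n = read(pipefd[0], out, len);

        close(pipefd[0]);
        close(pipefd[1]);
    }

    munmap(map, (size_t)pagesz);
    return n;
}
```

With normal memory this succeeds and the data comes out of the read end of
the pipe, where it could equally be handed to another process. What the
same sequence should do on a MAP_SECRET mapping is the open question.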