Re: [PATCH 10/17] prmem: documentation

From: Igor Stoppa
Date: Tue Oct 30 2018 - 16:43:29 EST

On 30/10/2018 21:20, Matthew Wilcox wrote:
On Tue, Oct 30, 2018 at 12:28:41PM -0600, Tycho Andersen wrote:
On Tue, Oct 30, 2018 at 10:58:14AM -0700, Matthew Wilcox wrote:
On Tue, Oct 30, 2018 at 10:06:51AM -0700, Andy Lutomirski wrote:
On Oct 30, 2018, at 9:37 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
I support the addition of a rare-write mechanism to the upstream kernel.
And I think that there is only one sane way to implement it: using an
mm_struct. That mm_struct, just like any sane mm_struct, should only
differ from init_mm in that it has extra mappings in the *user* region.

I'd like to understand this approach a little better. In a syscall path,
we run with the user task's mm. What you're proposing is that when we
want to modify rare data, we switch to rare_mm which contains a
writable mapping to all the kernel data which is rare-write.

So the API might look something like this:

void *p = rare_alloc(...); /* writable pointer */
p->a = x;
q = rare_protect(p); /* read-only pointer */

With pools and memory allocated from vmap_areas, I was able to say


and that would do a swipe on all the pages currently in use.
In the SELinux policyDB, for example, one doesn't really want to individually protect each allocation.

The loading phase happens usually at boot, when the system can be assumed to be sane (one might even preload a bare-bone set of rules from initramfs and then replace it later on, with the full blown set).

There is no need to process each of these tens of thousands allocations and initialization as write-rare.

Would it be possible to do the same here?

To subsequently modify q,

p = rare_modify(q);
q->a = y;

Do you mean

p->a = y;

here? I assume the intent is that q isn't writable ever, but that's
the one we have in the structure at rest.

Yes, that was my intent, thanks.

To handle the list case that Igor has pointed out, you might want to
do something like this:

list_for_each_entry(x, &xs, entry) {
struct foo *writable = rare_modify(entry);

Would this mapping be impossible to spoof by other cores?

I'm asking this because, from what I understand, local interrupts are enabled here, so an attack could freeze the core performing the write-rare operation, while another scrapes the memory.

But blocking interrupts for the entire body of the loop would make RT latency unpredictable.


but we'd probably wrap it in list_for_each_rare_entry(), just to be nicer.

This seems suspiciously close to the duplication of kernel interfaces that I was roasted for :-)