Re: [PATCH 10/17] prmem: documentation
From: Andy Lutomirski
Date: Tue Oct 30 2018 - 14:51:23 EST
> On Oct 30, 2018, at 10:58 AM, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> On Tue, Oct 30, 2018 at 10:06:51AM -0700, Andy Lutomirski wrote:
>>> On Oct 30, 2018, at 9:37 AM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>> I support the addition of a rare-write mechanism to the upstream kernel.
>> And I think that there is only one sane way to implement it: using an
>> mm_struct. That mm_struct, just like any sane mm_struct, should only
>> differ from init_mm in that it has extra mappings in the *user* region.
> I'd like to understand this approach a little better. In a syscall path,
> we run with the user task's mm. What you're proposing is that when we
> want to modify rare data, we switch to rare_mm which contains a
> writable mapping to all the kernel data which is rare-write.
> So the API might look something like this:
> void *p = rare_alloc(...); /* writable pointer */
> p->a = x;
> q = rare_protect(p); /* read-only pointer */
> To subsequently modify q,
> p = rare_modify(q);
> p->a = y;
Or, for big writes:
This avoids a whole ton of issues. In practice, actually running with a special mm requires preemption disabled as well as some other stuff, which Nadav carefully dealt with.
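To make the shape of a write-helper interface concrete, here is a rough userspace analogy, not kernel code and not Nadav's implementation. It assumes POSIX mmap() and backs one temporary file with two mappings: a read-only view that stands in for the normal kernel mapping, and a writable alias that stands in for the mapping in the special mm. There is no mm switch and no preemption handling here, and every name (struct rare_region, rare_region_init, demo_rare_write, rare_demo) is made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical type: two views of the same backing pages. */
struct rare_region {
	const void *ro;	/* read-only view handed out to normal code */
	void *rw;	/* writable alias, used only by demo_rare_write() */
	size_t len;
};

/* Map one unlinked temporary file twice: once PROT_READ, once writable. */
static int rare_region_init(struct rare_region *r, size_t len)
{
	char path[] = "/tmp/rare-demo-XXXXXX";
	void *ro, *rw;
	int fd;

	assert(r != NULL && len > 0);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}
	ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
	if (ro == MAP_FAILED || rw == MAP_FAILED) {
		if (ro != MAP_FAILED)
			munmap(ro, len);
		if (rw != MAP_FAILED)
			munmap(rw, len);
		return -1;
	}
	r->ro = ro;
	r->rw = rw;
	r->len = len;
	return 0;
}

/*
 * The caller names the destination by its read-only address; the helper
 * translates that to the same offset in the writable alias and copies.
 * Callers never hold a writable pointer themselves.
 */
static int demo_rare_write(struct rare_region *r, const void *dst,
			   const void *src, size_t len)
{
	uintptr_t base = (uintptr_t)r->ro;
	uintptr_t addr = (uintptr_t)dst;
	size_t off;

	if (addr < base)
		return -1;
	off = (size_t)(addr - base);
	if (off > r->len || len > r->len - off)
		return -1;
	memcpy((char *)r->rw + off, src, len);
	return 0;
}

struct config {
	int a;
	int b;
};

/* Store y through the helper, then read it back via the read-only view. */
int rare_demo(void)
{
	struct rare_region r;
	const struct config *q;
	int y = 42;

	if (rare_region_init(&r, (size_t)sysconf(_SC_PAGESIZE)) < 0)
		return -1;
	q = r.ro;
	if (demo_rare_write(&r, &q->a, &y, sizeof(y)) < 0)
		return -1;
	return q->a;
}
```

The point of the sketch is only that the writable alias stays inside the helper, so there is no window in which ordinary code holds a writable pointer. A direct store through q would fault, because that view is mapped PROT_READ.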
Also, can we maybe focus on getting something merged for statically allocated data first?
Finally, one issue: rare_alloc() is going to utterly suck performance-wise due to the global IPI when the region gets zapped out of the direct map or otherwise made RO. This is the same issue that makes all existing XPO efforts so painful. We need to either optimize the crap out of it somehow or we need to make sure it's not called except during rare events like device enumeration.
Nadav, want to resubmit your series? IIRC the only thing wrong with it was that it was a big change and we wanted a simpler fix to backport. But that's all done now, and I, at least, rather liked your code. :)