Re: [PATCH v6 02/10] x86, mpx: add MPX specific mmap interface

From: Andy Lutomirski
Date: Thu Jun 26 2014 - 19:15:44 EST


On Thu, Jun 26, 2014 at 3:58 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> On 06/26/2014 03:19 PM, Andy Lutomirski wrote:
>> On Wed, Jun 25, 2014 at 2:45 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>>> On 06/25/2014 02:05 PM, Andy Lutomirski wrote:
>>>> Hmm. the memfd_create thing may be able to do this for you. If you
>>>> created a per-mm memfd and mapped it, it all just might work.
>>>
>>> memfd_create() seems to bring a fair amount of baggage along (the fd
>>> part :) if all we want is a marker. Really, all we need is _a_ bit, and
>>> some way to plumb to userspace the RSS values of VMAs with that bit set.
>>>
>>> Creating and mmap()'ing a fd seems a rather roundabout way to get there.
>>
>> Hmm. So does VM_MPX, though. If this stuff were done entirely in
>> userspace, then memfd_create would be exactly the right solution, I
>> think.
>>
>> Would it work to just scan the bound directory to figure out how many
>> bound tables exist?
>
> Theoretically, perhaps.
>
> Practically, the bounds directory is 2GB, and it is likely to be very
> sparse. You would have to walk the page tables finding where pages were
> mapped, then search the mapped pages for bounds table entries.
>
> Assuming the directory was aligned and minimally populated, the
> *MINIMUM* search is one PGD entry lookup followed by 512 PUD entries.
> A full search would have to look at half a million ptes. That's just
> finding out how sparse the first levels of the tables are, before
> you've looked at a single byte of actual data, and you pay that cost
> even if the tables turn out to be empty.
>
> We could keep another, parallel data structure, separate from the
> hardware tables, that handles this better. Like, say, an rbtree that
> stores ranges of virtual addresses. We could call them
> vm_area_somethings ... wait a sec... we already have a structure like
> that. ;)
>
>
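
To put rough code behind the numbers quoted above: a naive scan of the
2GB bounds-directory region would look something like the sketch below.
The helper itself is hypothetical (nothing like it is in the patch set),
but the page-table accessors are the stock kernel ones. The point is
the 2GB / 4K = ~half a million loop iterations in the worst case:

    /*
     * Hypothetical, not from the patch set: count how many pages of
     * the 2GB bounds directory are actually mapped.  Plain 4-level
     * walk; assumes mmap_sem is held.
     */
    static unsigned long count_mapped_bd_pages(struct mm_struct *mm,
                                               unsigned long start,
                                               unsigned long end)
    {
            unsigned long addr, nr = 0;

            /* 2GB / 4K pages = ~half a million iterations */
            for (addr = start; addr < end; addr += PAGE_SIZE) {
                    pgd_t *pgd = pgd_offset(mm, addr);
                    pud_t *pud;
                    pmd_t *pmd;
                    pte_t *pte;

                    if (pgd_none(*pgd))
                            continue;       /* could skip PGDIR_SIZE */
                    pud = pud_offset(pgd, addr);
                    if (pud_none(*pud))
                            continue;       /* could skip PUD_SIZE */
                    pmd = pmd_offset(pud, addr);
                    if (pmd_none(*pmd))
                            continue;       /* could skip PMD_SIZE */
                    pte = pte_offset_map(pmd, addr);
                    if (pte_present(*pte))
                            nr++;
                    pte_unmap(pte);
            }
            return nr;
    }

And that only tells you which directory pages exist; you'd still have
to read every mapped page and chase each valid entry out to the bounds
table it points at.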

So here's my mental image of how I might do this if I were doing it
entirely in userspace: I'd create a file or memfd for the bounds tables
and another for the bounds directory. These files would be *huge*: the
bounds directory file would be 2GB and the bounds table file would be
2^48 bytes or whatever it is. (Maybe even bigger?)
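
Concretely, something like this userspace sketch (not anything in the
patches, and note that memfd_create() is itself still only a proposed
syscall, so this goes through syscall(2) with the x86_64 number from
those pending patches):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <sys/types.h>

    #ifndef __NR_memfd_create
    #define __NR_memfd_create 319   /* x86_64, per the pending patches */
    #endif

    /* Create one huge, fully sparse backing file.  No memory is
     * actually allocated until a page gets faulted in. */
    static int make_backing_fd(const char *name, off_t size)
    {
            int fd = syscall(__NR_memfd_create, name, 0);

            if (fd < 0)
                    return -1;
            if (ftruncate(fd, size) < 0) {
                    close(fd);
                    return -1;
            }
            return fd;
    }

    int main(void)
    {
            /* 2GB directory, 2^48-byte table file, sizes per the
             * 64-bit MPX layout if I have them right */
            int bd_fd = make_backing_fd("mpx-bd", (off_t)1 << 31);
            int bt_fd = make_backing_fd("mpx-bt", (off_t)1 << 48);

            if (bd_fd < 0 || bt_fd < 0) {
                    perror("make_backing_fd");
                    return 1;
            }
            return 0;
    }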

Then I'd just map pieces of those files wherever they'd need to be,
and I'd make the mappings sparse. I suspect that you don't actually
want a vma for each piece of bounds table that gets mapped -- you could
end up with an enormous number of scattered vmas. So I'd at least map
(in the vma sense, not the pte sense) an entire bounds table at a time,
something like the helper below. And I'd probably just map the bounds
directory in one big piece.
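
Hypothetical helper, assuming 4MB per 64-bit bounds table (if I have
the numbers right). MAP_NORESERVE keeps the big mapping cheap, and the
kernel only instantiates ptes on fault:

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <sys/types.h>

    #define MPX_BT_SIZE     (4UL << 20)     /* one 64-bit bounds table */

    /* Map an entire bounds table out of the big backing file at the
     * address the bounds directory entry will point to. */
    static void *map_bounds_table(int bt_fd, void *hw_addr, off_t off)
    {
            return mmap(hw_addr, MPX_BT_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_FIXED | MAP_NORESERVE,
                        bt_fd, off);
    }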

Then I'd populate it in the fault handler.

This is almost what the code is doing, I think, modulo the files.

This has one killer problem: these mappings need to be private (COWed
on fork). So memfd is no good. There's got to be an easyish way to
modify the mm code to allow anonymous maps with vm_ops. Maybe a new
mmap_region parameter or something? Maybe even a special anon_vma,
but I don't really understand how those work.
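
For what it's worth, here's the shape I have in mind. This is entirely
hypothetical (the names are made up, and today's core mm won't call
->fault on an anonymous vma without some surgery): the vma stays
MAP_PRIVATE anonymous, so fork gets COW for free, but a fault handler
still gets to hand back a zeroed page on demand:

    /* Hypothetical: assumes mm changes so that anonymous vmas can
     * carry vm_ops and have ->fault invoked. */
    static int mpx_bt_fault(struct vm_area_struct *vma,
                            struct vm_fault *vmf)
    {
            unsigned long addr = (unsigned long)vmf->virtual_address;
            struct page *page;

            page = alloc_zeroed_user_highpage_movable(vma, addr);
            if (!page)
                    return VM_FAULT_OOM;

            /* Hand the zeroed page back to the fault core, which
             * does the pte setup and, on fork, the usual COW. */
            vmf->page = page;
            return 0;
    }

    static const struct vm_operations_struct mpx_bt_vm_ops = {
            .fault  = mpx_bt_fault,
    };

(Whether the generic fault path would cope with an anon page coming
back from ->fault is exactly the mm surgery in question.)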


Also, egads: what happens when a bound table entry is associated with
a MAP_SHARED page?

--Andy