Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

From: Mike Kravetz
Date: Thu Oct 12 2017 - 13:20:53 EST


On 10/12/2017 07:37 AM, Michal Hocko wrote:
> On Wed 11-10-17 18:46:11, Mike Kravetz wrote:
>> Add new MAP_CONTIG flag to mmap system call. Check for flag in normal
>> mmap flag processing. If present, pre-allocate a contiguous set of
>> pages to back the mapping. These pages will be used at fault time, and
>> the MAP_CONTIG flag implies populating the mapping at the mmap time.
>
> I have only briefly read through the previous discussion and it is still
> not clear to me _why_ we want such an interface. I didn't give it much
> time yet but I do not think this is a good idea at all.

Thanks for looking Michal. The primary use case comes from devices that can
realize performance benefits if operating on physically contiguous memory.
What sparked this effort was Christoph and Guy's Plumbers presentation
where they showed RDMA performance benefits that could be realized with
contiguous memory. I also remember sitting in a presentation about
Intel's QuickAssist technology at Vault last year. The presenter mentioned
that their compression engine needed to be passed a physically contiguous
buffer. I asked how a user could obtain such a buffer. They said they
had a special driver/ioctl for that. Yuck! I'm guessing there are other
specific use cases. That is why I wanted to start the discussion as to
whether there should be an interface to provide this functionality.

> Why? Do we want
> any user to simply consume larger order memory blocks? What would
> prevent that?

We certainly would want to put restrictions in place for contiguous
memory allocations. Since it makes sense to pre-populate and lock
contiguous allocations, using the same restrictions as mlock is a start.
However, I can see the possible need for more restrictions.
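
As a rough sketch of what "the same restrictions as mlock" could mean for a
caller: mlock for an unprivileged process is bounded by RLIMIT_MEMLOCK, so a
request can be compared against that limit up front. The helper name
fits_memlock_limit is made up for illustration, and the check ignores
CAP_IPC_LOCK and any memory the process has already locked. In the proposal
the kernel would do the enforcing; this only shows the limit involved.

```c
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper, not part of the patch.
 * Returns 1 if a len-byte locked mapping fits within the soft
 * RLIMIT_MEMLOCK limit, 0 if it does not, and -1 if the limit
 * could not be read. Memory already locked is not accounted for.
 */
int fits_memlock_limit(size_t len)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return (rlim_t)len <= rl.rlim_cur ? 1 : 0;
}
```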

> Also why should even userspace care about larger
> memory blocks? We have huge pages (be it preallocated or transparent)
> for that purpose already. Why should we add yet another type

The 'sweet spot' for the Mellanox RDMA example is 2GB. We cannot achieve
that with huge pages on x86 today, where the largest huge page size is 1GB.

> What is the guarantee of such a mapping?

There is no guarantee. My suggestion is that mmap(MAP_CONTIG) would fail
with ENOMEM if a sufficiently sized contiguous area could not be found.
The caller would need to deal with failure.
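
A minimal sketch of how a caller might deal with that failure, under some
loud assumptions: MAP_CONTIG only exists in this RFC, so the define below is
a placeholder of 0 that makes the sketch compile and behave like a plain
anonymous mmap on kernels without the patch. The helper name map_buffer and
the fallback policy (retry without the flag on ENOMEM) are illustrative, not
something the patch prescribes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* ASSUMPTION: placeholder value. The real flag would come from patched
 * kernel headers; 0 turns the first attempt into an ordinary mmap. */
#ifndef MAP_CONTIG
#define MAP_CONTIG 0
#endif

/*
 * Hypothetical helper. Try a physically contiguous mapping first and
 * fall back to an ordinary anonymous mapping if the kernel reports
 * ENOMEM. *is_contig is set to 1 only when the contiguous request
 * succeeded with a real (non-zero) MAP_CONTIG flag.
 * Returns NULL if no mapping could be made.
 */
void *map_buffer(size_t len, int *is_contig)
{
    void *p;

    *is_contig = 0;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_CONTIG, -1, 0);
    if (p != MAP_FAILED) {
        *is_contig = (MAP_CONTIG != 0);
        return p;
    }
    if (errno != ENOMEM)
        return NULL;    /* some other error, do not retry */

    /* No sufficiently large contiguous area: use normal pages. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A device that strictly requires contiguous memory would skip the fallback
and report the ENOMEM to its own caller instead.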

> Does the memory always stay contiguous? How contiguous will it be?

Yes, it remains contiguous. It is locked in memory.

> Who is going to use such an interface? And probably many other
> questions...

Thanks for asking. I am just throwing out the idea of providing an interface
for doing contiguous memory allocations from user space. There are at least
two (and possibly more) devices that could benefit from such an interface.

--
Mike Kravetz