RE: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

From: Guy Shattah
Date: Tue Oct 17 2017 - 06:50:14 EST




> > On 16/10/2017 11:24, Michal Hocko wrote:
> > > On Sun 15-10-17 10:50:29, Guy Shattah wrote:
> > > >
> > > > On 13/10/2017 19:17, Michal Hocko wrote:
> > > > > On Fri 13-10-17 10:56:13, Cristopher Lameter wrote:
> > > > > > On Fri, 13 Oct 2017, Michal Hocko wrote:
> > > > > > > > There are numerous RDMA devices that would all need the
> > > > > > > > mmap implementation. And this covers only the needs of one
> > > > > > > > subsystem. There are other use cases.
> > > > > > > That doesn't prevent providing a library function which
> > > > > > > could be reused by all those drivers. Nothing really too
> > > > > > > much different from remap_pfn_range.
> > > > > > And then in all the other use cases as well. It would be much
> > > > > > easier if mmap could give you the memory you need instead of
> > > > > > havig numerous drivers improvise on their own. This is in
> > > > > > particular also useful for numerous embedded use cases where you
> need contiguous memory.
> > > > > But a generic implementation would have to deal with many issues
> > > > > as already mentioned. If you make this driver specific you can
> > > > > have access control based on fd etc... I really fail to see how
> > > > > this is any different from remap_pfn_range.
> > > > Why have several driver specific implementation if you can
> > > > generalize the idea and implement an already existing POSIX
> > > > standard?
> > > Because users shouldn't really care, really. We do have means to get
> > > large memory and having a guaranteed large memory is a PITA. Just
> > > look at hugetlb and all the issues it exposes. And that one is
> > > preallocated and it requires admin to do a conscious decision about
> > > the amount of the memory. You would like to establish something
> > > similar except without bounds to the size and no pre-allowed amount
> > > by an admin. This sounds just crazy to me.
> >
> > Users do care about the performance they get using devices which
> > benefit from contiguous memory allocation. Assuming that user
> > requires 700Mb of contiguous memory. Then why allocate giant (1GB)
> > page when you can allocate 700Mb out of the 1GB and put the rest of
> > the 300Mb back in the huge-pages/small-pages pool?
>
> I believe I have explained that part. Large pages are under admin control and
> responsibility. If you get a free ticket to large memory to any user who can
> pin that memory then you are in serious troubles.
>
> > > On the other hand if you make this per-device mmap implementation
> > > you can have both admin defined policy on who is allowed this memory
> > > and moreover drivers can implement their fallback strategies which
> > > best suit their needs. I really fail to see how this is any
> > > different from using specialized mmap implementations.
> > We tried doing it in the past. but the maintainer gave us a very good
> > argument:
> > " If you want to support anonymous mmaps to allocate large contiguous
> > pages work with the MM folks on providing that in a generic fashion."
>
> Well, we can provide a generic library functions for your driver to use so that
> you do not have to care about implementation details but I do not think
> exposing this API to the userspace in a generic fashion is a good idea.
> Especially when the only usecase that has been thought through so far seems
> to be a very special HW optimiztion.

Are you going to be OK with kernel API which implements contiguous memory allocation?
Possibly with mmap style? Many drivers could utilize it instead of having their own weird
and possibly non-standard way to allocate contiguous memory.
Such API won't be available for user space.

We can begin with implementing kernel API and postpone the userspace api discussion for a future date.
if it is sufficient. We might not have to discuss it at all.


>
> > After discussing it with people who have the same requirements as we
> > do - I totally agree with him
> >
> >
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fcom
> m
> >
> ents.gmane.org%2Fgmane.linux.drivers.rdma%2F31467&data=02%7C01%7Cs
> guy%
> >
> 40mellanox.com%7C24d72e65908044f3d38a08d5149204ee%7Ca652971c7d
> 2e4d9ba6
> >
> a4d149256f461b%7C0%7C0%7C636437539732729965&sdata=oueheNfnsMS
> PAGAehcT5
> > ZDteHxMVQ9%2F7nJNKPPfgVvM%3D&reserved=0
> >
> > > I might be really wrong but I consider such a general purpose flag
> > > quite dangerous and future maintenance burden. At least from the
> > > hugetlb/THP history I do not see why this should be any different.
> >
> > Could you please elaborate why is it dangerous and future maintenance
> > burden?
>
> Providing large contiguous memory ranges is not easy and we actually do not
> have any reliable way to offer such a functionality for the kernel users
> because we assume they are not that many. Basically anything larger than
> order-3 is best effort. Even changes constant improvements of the
> compaction still leaves us with something we cannot fully rely on. And now
> you want to expose this to the userspace with basically arbitrary memory
> sizes to be supported?
>
> But putting that aside. Pinning a lot of memory might cause many
> performance issues and misbehavior. There are still kernel users who need
> high order memory to work properly. On top of that you are basically
> allowing an untrusted user to deplete higher order pages very easily unless
> there is a clever way to enforce per user limit on this.

My previous suggestion prevents untrusted userspace code.


Guy