Re: [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace

From: Yury Norov
Date: Sat May 04 2019 - 00:22:18 EST


On Fri, May 03, 2019 at 05:57:31PM -0400, Rafael Aquini wrote:
> On Fri, May 03, 2019 at 01:49:12PM -0700, Yury Norov wrote:
> > On Fri, May 03, 2019 at 02:10:19PM -0400, Joel Savitz wrote:
> > > In the mainline kernel, there is no quick mechanism to get the virtual
> > > memory size of the current process from userspace.
> > >
> > > Despite the current state of affairs, this information is available to the
> > > user through several means, one being a linear search of the entire address
> > > space. This is an inefficient use of CPU cycles.
> > >
> > > A component of the libhugetlb kernel test does exactly this, and as
> > > systems' address spaces increase beyond 32 bits, this method becomes
> > > exceedingly tedious.
> > >
> > > For example, on a ppc64le system with a 47-bit address space, the linear
> > > search causes the test to hang for some unknown amount of time. I
> > > couldn't give you an exact number because I just ran it for about 10-20
> > > minutes and went to go do something else, probably to get coffee or
> > > something, and when I came back, I just killed the test and patched it
> > > to use this new mechanism. I re-ran my new version of the test using a
> > > kernel with this patch, and of course it passed through the previously
> > > bottlenecking codepath nearly instantaneously.
> > >
> > > As such, I propose that the prctl syscall be extended to include the
> > > option to retrieve TASK_SIZE from the kernel.
> > >
> > > This patch will allow us to upgrade an O(n) codepath to O(1) in an
> > > architecture-independent manner, and provide a mechanism for future
> > > generations to do the same.
> >
> > So the only reason for the new API is boosting some random poorly
> > written userspace test? Why don't you introduce binary search instead?
> >
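
To make the suggestion concrete, here is a minimal sketch of such a
binary search. It assumes the test can ask "is this address inside
the usable address space?" and that the answer is monotonic (true
below the limit, false at and above it). The names probe_fn,
find_limit and below_fake_limit are made up for illustration, and the
probe is a stand-in; a real test would have to supply its own check,
for example by attempting a mapping at the address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical probe: true if addr lies inside the usable address space. */
typedef bool (*probe_fn)(uint64_t addr, void *ctx);

/*
 * Return the lowest address in (lo, hi] for which probe() fails.
 * Assumes probe(lo) succeeds, probe(hi) fails, and the probe is
 * monotonic in between.
 */
static uint64_t find_limit(uint64_t lo, uint64_t hi, probe_fn probe, void *ctx)
{
    assert(lo < hi);

    /* Invariant: probe(lo) is true, probe(hi) is false. */
    while (hi - lo > 1) {
        uint64_t mid = lo + (hi - lo) / 2;

        if (probe(mid, ctx))
            lo = mid;
        else
            hi = mid;
    }
    return hi;
}

/* Stand-in probe for illustration only: treats *ctx as the limit. */
static bool below_fake_limit(uint64_t addr, void *ctx)
{
    return addr < *(const uint64_t *)ctx;
}
```

With a 64-bit search range this needs at most 64 probes, instead of
one probe per page for a linear walk.
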
>
> there's no real cost in exposing the value that is known to the kernel,

Really? All of us here are used to thinking of kernel programming as
one of the most difficult professions in the world. There is a huge cost
in implementing a feature properly, reviewing it carefully, testing it
broadly on various platforms, and maintaining and supporting it over
the long term.

In this specific example of exposing TASK_SIZE, your team got enough
things wrong to realize that by now, I hope.

> anyways, as long as it's not a freaking hassle (like trying to go with
> this prctl(2) stunt). We just need to get it properly exported alongside
> other task's VM-related values at /proc/<pid>/status.

I found this thread thrilling. Please keep me in CC with your
/proc/<pid>/status effort.

Yury

> > Look at /proc/<pid>/maps. It may help to reduce the memory area to be
> > checked.
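
As a rough sketch of the /proc/<pid>/maps idea: the highest mapped
end address gives a lower bound for the search, so only the range
above it needs to be probed. The helper name highest_mapping_end is
made up for illustration. It assumes the usual "start-end perms ..."
line format and skips the x86-64 [vsyscall] entry, which is reported
in maps but lies outside the user address range.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan maps-formatted text (one "start-end perms ..." entry per line)
 * and store the highest end address in *top.
 * Returns 0 on success, -1 if no mapping could be parsed.
 */
static int highest_mapping_end(FILE *maps, unsigned long long *top)
{
    char line[512];
    unsigned long long start, end, best = 0;
    int found = 0;
    int at_line_start = 1;

    assert(maps != NULL && top != NULL);

    while (fgets(line, sizeof(line), maps)) {
        size_t n = strlen(line);
        int parse = at_line_start;

        /* A chunk without '\n' means the line continues in the next read. */
        at_line_start = (n > 0 && line[n - 1] == '\n');

        if (!parse)
            continue;
        /* Skip the x86-64 vsyscall page; it is not a user mapping. */
        if (strstr(line, "[vsyscall]"))
            continue;
        if (sscanf(line, "%llx-%llx", &start, &end) != 2)
            continue;
        if (end > best)
            best = end;
        found = 1;
    }

    if (!found)
        return -1;
    *top = best;
    return 0;
}
```

A caller would open "/proc/self/maps" with fopen(), pass the stream
to the helper, and start probing from the returned address.
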