Re: [PATCH] proc: revert /proc/<pid>/maps [stack:TID] annotation
From: Kirill A. Shutemov
Date: Mon Jan 25 2016 - 18:15:03 EST
On Mon, Jan 25, 2016 at 01:30:00PM -0800, Colin Cross wrote:
> On Tue, Jan 19, 2016 at 3:30 PM, Kirill A. Shutemov
> <kirill@xxxxxxxxxxxxx> wrote:
> > On Tue, Jan 19, 2016 at 02:14:30PM -0800, Andrew Morton wrote:
> >> On Tue, 19 Jan 2016 13:02:39 -0500 Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> >> > b764375 ("procfs: mark thread stack correctly in proc/<pid>/maps")
> >> > added [stack:TID] annotation to /proc/<pid>/maps. Finding the task of
> >> > a stack VMA requires walking the entire thread list, turning this into
> >> > quadratic behavior: a thousand threads means a thousand stacks, so the
> >> > rendering of /proc/<pid>/maps needs to look at a million threads. The
> >> > cost is not in proportion to the usefulness as described in the patch.
> >> >
> >> > Drop the [stack:TID] annotation to make /proc/<pid>/maps (and
> >> > /proc/<pid>/numa_maps) usable again for higher thread counts.
> >> >
> >> > The [stack] annotation inside /proc/<pid>/task/<tid>/maps is retained,
> >> > as identifying the stack VMA there is an O(1) operation.
> >> Four years ago, ouch.
> >> Any thoughts on the obvious back-compatibility concerns? ie, why did
> >> Siddhesh implement this in the first place? My bad for not ensuring
> >> that the changelog told us this.
> >> https://lkml.org/lkml/2012/1/14/25 has more info:
> >> : Memory mmaped by glibc for a thread stack currently shows up as a
> >> : simple anonymous map, which makes it difficult to differentiate between
> >> : memory usage of the thread on stack and other dynamic allocation.
> >> : Since glibc already uses MAP_STACK to request this mapping, the
> >> : attached patch uses this flag to add additional VM_STACK_FLAGS to the
> >> : resulting vma so that the mapping is treated as a stack and not any
> >> : regular anonymous mapping. Also, one may use vm_flags to decide if a
> >> : vma is a stack.
> >> But even that doesn't really tell us what the actual *value* of the
> >> patch is to end-users.
> > I doubt it can be very useful, as it's unreliable: if two stacks are
> > allocated end-to-end (which is not a good idea, but still), it can only
> > report [stack:XXX] for the first one, as they are merged into one VMA.
> > Any other anon VMA merged with the stack will also be claimed as stack,
> > which is not always correct.
> > I think reporting the VMA as anon is the best we can know about it;
> > everything else is just rather expensive guesses.
> An alternative to guessing is the anonymous VMA naming patch used on
> Android, https://lkml.org/lkml/2013/10/30/518. It allows userspace to
> name anonymous memory however it wishes, and prevents merging of
> adjacent VMAs with different names. Android uses it to label
> native heap memory, but it would work well for stacks too.
I don't think preventing vma merging is a fair price for the feature: you
would pay extra in every find_vma() (meaning all page faults).
I think it would be nice to have a way to store this kind of sideband info
without impacting critical code paths.
One other use case I see for such sideband info is storing hints from
MADV_HUGEPAGE/MADV_NOHUGEPAGE: need to split vma just for these hints is
Kirill A. Shutemov