Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

From: Dave Hansen
Date: Thu Aug 07 2008 - 13:30:22 EST


On Thu, 2008-08-07 at 17:06 +0100, Mel Gorman wrote:
> On (06/08/08 12:50), Dave Hansen didst pronounce:
> > The main thing this set of patches does that I care about is take an
> > anonymous VMA and replace it with a hugetlb VMA. It does this on a
> > special cue, but does it nonetheless.
>
> This is not actually a new thing. For a long time now, it has been possible to
> back malloc() with hugepages at a userspace level using the morecore glibc
> hook. That is replacing anonymous memory with a file-backed VMA. It happens
> in a different place but it's just as deliberate as backing stack and the
> end result is very similar. As the file is ram-based, it doesn't have the
> same types of consequences like dirty page syncing that you'd ordinarily
> watch for when moving from anonymous to file-backed memory.

Yes, it has already been done in userspace. That's fine. It isn't
adding any complexity to the kernel. This is adding behavior that the
kernel has to support as well as complexity.

> > This patch has crossed a line in that it is really the first
> > *replacement* of a normal VMA with a hugetlb VMA instead of the creation
> > of the VMAs at the user's request.
>
> We crossed that line with morecore, it's back there somewhere. We're just
> doing in kernel this time because backing stacks with hugepages in userspace
> turned out to be a hairy endevour.
>
> Properly supporting anonymous hugepages would either require larger
> changes to the core or reimplementing yet more of mm/ in mm/hugetlb.c.
> Neither is a particularly appealing approach, nor is it likely to be a
> very popular one.

I agree. It is always much harder to write code that can work
genericallyï (and get it accepted) than just write the smallest possible
hack and stick it in fs/exec.c.

Could this patch at least get fixed up to look like it could be used
more generically? Some code to look up and replace anonymous VMAs with
hugetlb-backed ones.
ï
> > Because of the limitations like its inability to grow the VMA, I can't
> > imagine that this would be a generic mechanism that we can use
> > elsewhere.
>
> What other than a stack even cares about VM_GROWSDOWN working? Besides,
> VM_GROWSDOWN could be supported in a hugetlbfs file by mapping the end of
> the file and moving the offset backwards (yeah ok, it ain't the prettiest
> but it's less churn than introducing significantly different codepaths). It's
> just not something that needs to be supported at first cut.
>
> brk() if you wanted to back hugepages with it conceivably needs a resizing
> VMA but in that case it's growing up in which case just extend the end of
> the VMA and increase the size of the file.

I'm more worried about a small huge page size (say 64k) and not being
able to merge the VMAs. I guess it could start in the *middle* of a
file and map both directions.

I guess you could always just have a single (very sparse) hugetlb file
per mm to do all of this 'anonymous' hugetlb memory memory stuff, and
just map its offsets 1:1 on to the process's virtual address space.
That would make sure you could always merge VMAs, no matter how they
grew together.

-- Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/