Re: [RFC][PATCH] uprobe: support for private hugetlb mappings

From: David Hildenbrand
Date: Mon Apr 22 2024 - 14:59:30 EST


On 22.04.24 20:11, Guillaume Morin wrote:
(Dropping Mike Kravetz as CC since he has retired and his email is no
longer valid, adding Muchun since he's the current hugetlb maintainer,
as well as linux-trace-kernel)

On 22 Apr 11:39, David Hildenbrand wrote:

On 19.04.24 20:37, Guillaume Morin wrote:
libhugetlbfs, the Intel iodlr code both allow to remap .text onto a
hugetlb private mapping. It's also pretty easy to do it manually.
One drawback of using this functionality is the lack of support for
uprobes (NOTE uprobe ignores shareable vmas)

This patch adds support for private hugetlb mappings. It does require exposing
some hugetlbfs innards and relies on copy_user_large_folio which is only
available when CONFIG_HUGETLBFS is used so I had to use an ugly #ifdef

If there is some interest in applying this patch in some form or
another, I am open to any refactoring suggestions (esp getting rid the
#ifdef in uprobes.c) . I tried to limit the
amount of branching.

All that hugetlb special casing .... oh my. What's the benefit why we should
be interested in making that code less clean -- to phrase it in a nice way
;) ?

I do appreciate the nice phrasing. Believe me, I did try to limit the
special casing to a minimum :-).

Outside of __replace_page, I added only 3-ish branches so I do not think
it's *too* bad. The uprobe code is using PAGE_{SHIFT,MASK} quite liberally so I
had to add calls to retrieve these for the hugetlb vmas.

__replace_page has a lot of special casing. I certainly agree (and
unfortunately for me it's at the beginning of the patch :)). It's doing
something pretty uncommon outside of the mm code so it has to make a
bunch of specific hugetlb calls. I am not quite sure how to improve it
but if you have suggestions, I'd be happy to refactor.

See below.


The benefit - to me - is very clear. People do use hugetlb mappings to
run code in production environments. The perf benefits are there for some
workloads. Intel has published a whitepaper about it etc.
Uprobes are a very good tool to do live tracing. If you can restart the
process and reproduce, you should be able to disable hugetlb remapping
but if you need to look at a live process, there are not many options.
Not being able to use uprobes is crippling.

Please add all that as motivation to the patch description or cover letter.


Yes, libhugetlbfs exists. But why do we have to support uprobes with it?
Nobody cared until now, why care now?

I think you could ask the same question for every new feature patch :)

I have to, because it usually indicates a lack of motivation in the cover-letter/patch description :P

People will have to maintain that code, and maintaining hugetlb code in odd places is no fun ...


Since the removal a few releases ago of the __morecore() hook in glibc,
the main feature of libhugetlbfs is ELF segments remapping. I think
there are definitely a lot of users that simply deal with this
unnecessary limitation.

I am certainly not shoving this patch through anyone's throat if there
is no interest. But we definitely find it a very useful feature ...

Let me try to see if we can get this done cleaner.

One ugly part (in general here) is the custom page replacement in the registration part.

We are guaranteed to have a MAP_PRIVATE mapping. Instead of replacing pages ourselves (which we likely shouldn't do ...) ... maybe we could use FAULT_FLAG_UNSHARE faults such that we will get an anonymous folio populated. (like KSM does nowadays)

Punching FOLL_PIN|FOLL_LONGTERM into GUP would achieve the same thing, but using FOLL_WRITE would not work on many file systems. So maybe we have to trigger an unsharing fault ourselves.

That would do the page replacement for us and we "should" be able to lookup an anonymous folio that we can then just modify, like ptrace would.

But then, there is also unregistration part, with weird conditional page replacement. Zapping the anon page if the content matches the content of the original page is one thing. But why are we placing an existing anonymous page by a new anonymous page when the content from the original page differs (but matches the one from the just copied page?)?

I'll have to further think about that one. It's all a bit nasty.


One thing to note is that hugetlb folios don't grow on trees. Likely, Many setups *don't* reserve extra hugetlb folios and you might just easily be running out of free hugetlb folios that you can use to break COW here (replace a file hugetlb by a fresh anon hugetlb page). Likely it's easy to make register or unregister fail.

--
Cheers,

David / dhildenb