Re: [PATCH 00/45] hugetlb pagewalk unification

From: Oscar Salvador
Date: Wed Jul 10 2024 - 07:27:12 EST


On Wed, Jul 10, 2024 at 05:52:43AM +0200, David Hildenbrand wrote:
> I understand that. And it would all be easier and more straightforward if
> we didn't have that hugetlb CONT-PTE / CONT-PMD stuff in there that works
> similarly to, but differently from, "ordinary" cont-pte for thp.
>
> I'm sure you stumbled over the set_huge_pte_at() on arm64 for example. If,
> at some point, we *don't* use the hugetlb functions we use right now to
> modify hugetlb entries, we might be in trouble.
>
> That's why I think we should maybe invest our time and effort in having a
> new pagewalker that will just batch such things naturally, and users that
> can operate on that naturally. For example: a hugetlb cont-pte-mapped folio
> will just naturally be reported as a "fully mapped folio", just like a THP
> would be if mapped in a compatible way.
>
> Yes, this requires more work, but as raised in some patches here, working on
> individual PTEs/PMDs for hugetlb is problematic.
>
> You have to batch every operation, to essentially teach ordinary code to do
> what the hugetlb_* special code would have done on cont-pte/cont-pmd things.
>
>
> (as a side note, cont-pte/cont-pmd should primarily be a hint from arch code
> on how many entries we can batch, like we do in folio_pte_batch(); point is
> that we want to batch also on architectures where we don't have such bits,
> and prepare for architectures that implement various sizes of batching;
> IMHO, having cont-pte/cont-pmd checks in common code is likely the wrong
> approach. Again, folio_pte_batch() is where we tackled the problem
> differently from the THP perspective)

I must say I have not looked at folio_pte_batch() yet, so I do not know
what it does or how it does it.
I will have a look.
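
In the meantime, this is how I picture the general batching idea from the
side note above, as a toy userspace model. To be clear, this is NOT the
kernel's folio_pte_batch(), whose details I have not checked; struct toy_pte
and toy_pte_batch() are made-up names, and the rule used here (consecutive
PFNs with identical flags) is only my assumption of what "batchable" could
mean.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a page table entry: NOT the kernel's pte_t. */
struct toy_pte {
	uint64_t pfn;	/* page frame number the entry maps */
	uint32_t flags;	/* permission/attribute bits */
	bool present;	/* entry maps something */
};

/*
 * Count how many entries, starting at ptes[0], map consecutive PFNs with
 * identical flags. max_nr bounds the scan (e.g. entries left in the range
 * the caller is allowed to look at). Returns 0 if the first entry is not
 * present or max_nr is 0.
 */
static size_t toy_pte_batch(const struct toy_pte *ptes, size_t max_nr)
{
	size_t nr;

	if (max_nr == 0 || !ptes[0].present)
		return 0;

	for (nr = 1; nr < max_nr; nr++) {
		if (!ptes[nr].present)
			break;
		if (ptes[nr].pfn != ptes[0].pfn + nr)
			break;
		if (ptes[nr].flags != ptes[0].flags)
			break;
	}
	return nr;
}
```

A caller would then act once on the nr entries returned instead of looping
over them one by one, with no arch-specific cont- bit checked in the common
path.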

> I have an idea for a better page table walker API that would try batching
> most entries (under one PTL), and walkers can just register for the types
> they want. Hoping I will find some time to at least sketch the user
> interface soon.
>
> That doesn't mean that this should block your work, but the
> cont-pte/cont-pmd hugetlb stuff is really nasty to handle here, and I don't
> particularly like where this is going.

Ok, let me take a step back then.
Previous versions of that RFC did not handle cont-{pte,pmd} out in the
open, so let me go back to the drawing board and come up with something
that does not fiddle with cont- stuff in that way.

I might post a small diff here just to see if we are on the same page.

As usual, thanks a lot for your comments David!


--
Oscar Salvador
SUSE Labs