On Thu, Jul 11, 2024 at 02:15:38AM +0200, David Hildenbrand wrote:
(as a side note, cont-pte/cont-pmd should primarily be a hint from arch code
on how many entries we can batch, like we do in folio_pte_batch(); point is
that we want to batch also on architectures where we don't have such bits,
and prepare for architectures that implement various sizes of batching;
IMHO, having cont-pte/cont-pmd checks in common code is likely the wrong
approach. Again, folio_pte_batch() is where we tackled the problem
differently from the THP perspective)
I must say I did not check folio_pte_batch() and I am totally ignorant
of what/how it does things.
I will have a look.
I have an idea for a better page table walker API that would try batching
most entries (under one PTL), and walkers can just register for the types
they want. Hoping I will find some time to at least sketch the user
interface soon.
That doesn't mean that this should block your work, but the
cont-pte/cont-pmd hugetlb stuff is really nasty to handle here, and I don't
particularly like where this is going.
Ok, let me take a step back then.
Previous versions of that RFC did not handle cont-{pte,pmd} wide in the
open, so let me go back to the drawing board and come up with something
that does not fiddle with cont- stuff in that way.
I might post here a small diff just to see if we are on the same page.
As usual, thanks a lot for your comments David!
Feel free to reach out to discuss ways forward. I think we should
(a) move to the automatic cont-pte setting as done for THPs via
set_ptes().
(b) batch PTE updates at all relevant places, so we get no change in
behavior: cont-pte bit will remain set.
(c) Likely remove the use of cont-pte bits in hugetlb code for anything
that is not a present folio (i.e., where automatic cont-pte bit
setting would never set it). Migration entries might require
thought (we can easily batch to achieve the same thing, but the
behavior of hugetlb likely differs from the generic way of handling
migration entries on multiple ptes: reference the folio vs.
the respective subpages of the folio).
Uhm, I see, but I am a bit confused.
Although related, this seems orthogonal to this series and more like a
next thing to do, right?
It is true that this series tries to handle cont-{pmd,pte} in the
pagewalk API for hugetlb VMAs, but in order to raise fewer eyebrows I
can come up with a way not to do that for now, so we do not fiddle with
cont-stuff in this series.
Or am I misunderstanding you?