Nit: the orders = ... order = ... sequence looks like it might deserve a helper
function that would make this easier to read.
To be honest, the existing function that I've modified is a bit of a mess.
thp_vma_allowable_orders() calls thp_vma_suitable_orders() if we are not in a
page fault, because the page fault handlers already do that check themselves. It
would be nice to refactor the whole thing so that thp_vma_allowable_orders() is
a strict superset of thp_vma_suitable_orders(). Then this can just call
thp_vma_allowable_orders(). But that's going to start touching the PMD and PUD
handlers, so I'd prefer to leave that for a separate patch set.
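For reference, the rough shape I have in mind is something like this (a sketch
only; the simplified signature and the explicit addr parameter are illustrative,
not taken from the patch):

static inline
unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
				       unsigned long vm_flags,
				       unsigned long addr,
				       unsigned long orders)
{
	/* First filter to orders that fit the VMA at addr... */
	orders = thp_vma_suitable_orders(vma, addr, orders);
	if (!orders)
		return 0;

	/* ...then apply the sysfs and per-VMA policy checks. */
	return __thp_vma_allowable_orders(vma, vm_flags, orders);
}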
Nit: Why call thp_vma_suitable_orders if the orders are already 0? Again, some
helper might be reasonable where that is handled internally.
Because thp_vma_suitable_orders() handles it safely and is inline, so it should
be just as efficient? This would go away with the refactoring described above.
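To illustrate (a sketch of the rough shape only, assuming a per-order predicate
thp_vma_suitable_order(); not the exact patch code):

static inline
unsigned long thp_vma_suitable_orders(struct vm_area_struct *vma,
				      unsigned long addr,
				      unsigned long orders)
{
	int order = first_order(orders);

	/* With orders == 0 the loop never executes and we return 0. */
	while (orders) {
		if (thp_vma_suitable_order(vma, addr, order))
			break;
		order = next_order(&orders, order);
	}

	return orders;
}

Once inlined, the orders == 0 case folds to an immediate return, so there is no
real cost to calling it unconditionally.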
Comment: For order-0 we'll always perform a function call to both
thp_vma_allowable_orders() / thp_vma_suitable_orders(). We should perform some
fast and efficient check of whether any <PMD THPs are even enabled in the
system / for this VMA, and if none are, just fall back before doing more
expensive checks.
thp_vma_allowable_orders() is inline as you mentioned.
I was deliberately trying to keep all the decision logic in one place
(thp_vma_suitable_orders) because it's already pretty complicated. But if you
insist, how about this in the header:
static inline
unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
				       unsigned long vm_flags, bool smaps,
				       bool in_pf, bool enforce_sysfs,
				       unsigned long orders)
{
	/* Optimization to check if required orders are enabled early. */
	if (enforce_sysfs && vma_is_anonymous(vma)) {
		unsigned long mask = READ_ONCE(huge_anon_orders_always);

		if (vm_flags & VM_HUGEPAGE)
			mask |= READ_ONCE(huge_anon_orders_madvise);
		if (hugepage_global_always() ||
		    ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
			mask |= READ_ONCE(huge_anon_orders_inherit);

		orders &= mask;
		if (!orders)
			return 0;

		enforce_sysfs = false;
	}

	return __thp_vma_allowable_orders(vma, vm_flags, smaps, in_pf,
					  enforce_sysfs, orders);
}
Then the above check can be removed from __thp_vma_allowable_orders() - it will
still retain the `if (enforce_sysfs && !vma_is_anonymous(vma))` part.
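i.e. roughly this would remain (illustrative only; the body shown is my sketch,
not the patch's exact checks):

	/* Sketch: file-backed VMAs still go through the sysfs policy here. */
	if (enforce_sysfs && !vma_is_anonymous(vma)) {
		if (!hugepage_global_enabled())
			return 0;
	}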
+
+	if (!orders)
+		goto fallback;
+
+	pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
+	if (!pte)
+		return ERR_PTR(-EAGAIN);
+
+	order = first_order(orders);
+	while (orders) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
+		vmf->pte = pte + pte_index(addr);
+		if (pte_range_none(vmf->pte, 1 << order))
+			break;
Comment: Likely it would make sense to scan only once and determine the "largest
none range" around that address, having the largest suitable order in mind.
Yes, that's how I used to do it, but Yu Zhao requested simplifying to this,
IIRC. Perhaps this is an optimization opportunity for later?
+		order = next_order(&orders, order);
+	}
+
+	vmf->pte = NULL;
Nit: Can you elaborate why you are messing with vmf->pte here? A simple helper
variable will make this code look less magical. Unless I am missing something
important :)
Gahh, I used to pass the vmf to what pte_range_none() was refactored into (an
approach that was suggested by Yu Zhao, IIRC). But since I did some refactoring
based on comments from JohnH, I see I don't need that anymore. Agreed; it will
be much clearer just to use a local variable. Will fix.
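For the record, pte_range_none() just checks that all nr PTEs in the range are
none. A minimal sketch under that assumption (not the patch's exact code):

static inline bool pte_range_none(pte_t *pte, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		if (!pte_none(ptep_get(pte + i)))
			return false;
	}

	return true;
}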
+	pte_unmap(pte);
+
+	gfp = vma_thp_gfp_mask(vma);
+
+	while (orders) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
+		folio = vma_alloc_folio(gfp, order, vma, addr, true);
+		if (folio) {
+			clear_huge_page(&folio->page, addr, 1 << order);
+			return folio;
+		}
+		order = next_order(&orders, order);
+	}
+
Question: would it make sense to combine both loops? I suspect memory
allocations with pte_offset_map()/kmap are problematic.
They are both operating on separate orders; next_order() is "consuming" an order
by removing the current one from the orders bitfield and returning the next one.
So the first loop starts at the highest order and keeps checking lower orders
until one fully fits in the VMA. And the second loop starts at the first order
that was found to fully fit and loops to lower orders until an allocation is
successful.
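In other words, treating orders as a bitfield where bit N set means order N is
still a candidate, the two helpers behave roughly like this (a sketch of the
semantics described above, not the patch's exact implementation):

static inline int first_order(unsigned long orders)
{
	/* Highest remaining candidate order. */
	return fls_long(orders) - 1;
}

static inline int next_order(unsigned long *orders, int prev)
{
	/* Consume prev from the bitfield and return the next highest. */
	*orders &= ~BIT(prev);
	return first_order(*orders);
}

And note the allocation loop only runs after pte_unmap(), so combining the two
loops would mean allocating with the PTE table mapped - the concern you raised
above.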