Re: [PATCH] percpu: Fix hint invariant breakage

From: Dennis Zhou

Date: Tue Apr 21 2026 - 20:13:05 EST


On Mon, Apr 20, 2026 at 12:35:48PM +0000, Joonwon Kang wrote:
> > Hello,
> >
> > Sorry for the delay, I've been a bit sick.
> >
> > On Mon, Mar 23, 2026 at 02:05:14PM +0000, Joonwon Kang wrote:
> > > > Hello,
> > > >
> > > > On Fri, Mar 20, 2026 at 11:52:14AM +0000, Joonwon Kang wrote:
> > > > > The invariant "scan_hint_start > contig_hint_start if and only if
> > > > > scan_hint == contig_hint" should be kept for hint management. However,
> > > > > it could be broken in some cases:
> > > > >
> > > >
> > > > First I'd just like to apologize. I spent an hour yesterday trying to
> > > > remember why the invariant exists and the reality is this code is more
> > > > clever than it needs to be.
> > >
> > > Thanks for taking time for this and sharing more context. While you are at
> > > it, I have a fundamental question on the invariant. I have deliberated on
> > > and discussed what benefit the invariant brings to the percpu allocator by
> > > its existence. My understanding is that if we put contig_hint before
> > > scan_hint when they are the same, it is more likely that contig_hint is
> > > broken by a future allocation, which leads to a linear scan after the
> > > scan_hint for hints update, although we could save scanning up to scan_hint
> > > when contig_hint is not broken. On the other hand, if we put scan_hint
> > > before contig_hint instead, it is more likely that scan_hint is broken
> > > while keeping contig_hint, which does not lead to the linear scan for
> > > hints update, although we could not save the scanning that could be saved
> > > in the other case.
> > >
> > > In other words, if contig_hint breaking allocations occur a lot in general
> > > with the current invariant, performance may suffer more than it would
> > > without the invariant. I also think that there is no strict reason to have
> > > the invariant.
> > >
> >
> > I think the original premise is that percpu memory is quite expensive, 1
> > allocation costs nr_cpus * sizeof(allocation). So we do our best to bin
> > pack at the cost of faster allocations. We could always just break the
> > contig_hint but then over time we could cause more fragmentation.
> >
> > The case that triggered this was netdev needing 8 byte objects with 16
> > byte alignment [1].
> >
>
> Thank you for sharing the points about the bin packing. Although I did not
> fully understand the relationship between breakage of the contig_hint and the
> fragmentation trend, it may be helpful to reference the case you referred to.
> I guess you may have missed the link for the reference [1]? Could you help to
> provide the link, if you intended to leave it?
>

Ah, sorry, that's my bad.

My intention with the scan_hint wasn't to use it as an earlier
contig_hint, but to save us from rescanning when we need to break the
contig_hint.
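
To make that concrete, here is a rough userspace sketch. It is not the
code in mm/percpu.c: struct hint_md, hint_invariant_holds() and
promote_scan_hint() are names made up for illustration, and an unset
scan_hint is assumed to be encoded as 0. It shows the invariant check
and how a scan_hint sitting after a broken contig_hint can be promoted
so that any rescan only has to resume past it.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-block hint metadata. */
struct hint_md {
	int contig_hint;        /* size of the largest known free run */
	int contig_hint_start;  /* offset of that run */
	int scan_hint;          /* size of another known free run, 0 if unset */
	int scan_hint_start;    /* offset of that run */
};

/*
 * Invariant from the patch description:
 *   scan_hint_start > contig_hint_start iff scan_hint == contig_hint
 * Assumption: it is only checked when a scan_hint is actually set.
 */
static bool hint_invariant_holds(const struct hint_md *md)
{
	if (!md->scan_hint)
		return true;
	return (md->scan_hint_start > md->contig_hint_start) ==
	       (md->scan_hint == md->contig_hint);
}

/*
 * Called after an allocation has broken the contig_hint. If a scan_hint
 * exists, it becomes the new contig_hint and the offset just past it is
 * returned as the point where a rescan would resume. Returns -1 when no
 * scan_hint is available, meaning the caller has to do a full scan.
 */
static int promote_scan_hint(struct hint_md *md)
{
	if (!md->scan_hint) {
		md->contig_hint = 0;
		return -1;
	}
	md->contig_hint = md->scan_hint;
	md->contig_hint_start = md->scan_hint_start;
	md->scan_hint = 0;
	return md->contig_hint_start + md->contig_hint;
}
```

With contig_hint = 8 at offset 0 and scan_hint = 8 at offset 16,
breaking the contig_hint promotes the scan_hint, and a rescan would
resume at offset 24 rather than from the start of the block.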

[1] https://lore.kernel.org/netdev/CANn89iKb_vW+LA-91RV=zuAqbNycPFUYW54w_S=KZ3HdcWPw6Q@xxxxxxxxxxxxxx/

> > > So, could you clarify the necessity of the invariant? If there is no hard
> > > reason for it, then I could post another spin-off patch to remove the
> > > invariant altogether so that we could simplify the code and experiment with
> > > the result. What do you think?
> > >
> >
> > I can't really recall the exact reasoning for the invariant, but it was
> > probably along the lines of wanting to not lose information if possible.
> >
> > Say an earlier area becomes free that is the same size as the
> > contig_hint but with better alignment, we want to use that as the
> > contig_hint but then we either have to lose the scan_hint or keep it
> > with the invariant. Given the premise above, I believe we want to
> > continue bin packing, and I think the general idea of scanning next time
> > around isn't the worst thing.
> >
> > Sadly, because it's already there and has worked for quite some time,
> > it's kind of on us today to provide data / reasoning to delete it. I'd
> > wager that some upcoming work is going to change how percpu gives out
> > objects through some sort of slab caching, so we can revisit
> > this more in that context.
> >
>
> Understood and thanks for your detailed explanation. I will keep the invariant
> as-is unless I have a clear data point to reverse it. I sent the new patch set
> v3 recently with this in mind. Please help to review it ;)
>

Sorry for the delay. I provided a bit of feedback just now. Let me know
if you want to discuss more about what we want v4 to look like before
you spend too much time on it.

Thanks,
Dennis