Re: [PATCH net-next] net: sched: cake: Optimize number of calls to cake_heapify()

From: Toke Høiland-Jørgensen
Date: Mon Apr 08 2024 - 09:02:47 EST


Kuan-Wei Chiu <visitorckw@xxxxxxxxx> writes:

> On Sun, Apr 07, 2024 at 06:10:04PM +0200, Toke Høiland-Jørgensen wrote:
>> Kuan-Wei Chiu <visitorckw@xxxxxxxxx> writes:
>>
>> > Improve the max-heap construction process by reducing unnecessary
>> > heapify operations. Specifically, adjust the starting condition from
>> > n / 2 to n / 2 - 1 in the loop that iterates over all non-leaf
>> > elements.
>>
>> Please add an explanation for why this change is correct, and why it is
>> beneficial. "Improve" and "unnecessary" are way too implicit.
>>
>> pw-bot: cr
>
> For correctness:
> To build a heap, we need to perform heapify operations on all non-leaf
> nodes, so we need the index of the highest-indexed non-leaf node, where
> the loop should start. In an array-backed heap, the left child of node
> i is at index 2 * i + 1 and the right child at index 2 * i + 2. Since
> CAKE_MAX_TINS * CAKE_QUEUES is even, the children of node
> CAKE_MAX_TINS * CAKE_QUEUES / 2 would be at indices CAKE_MAX_TINS *
> CAKE_QUEUES + 1 and CAKE_MAX_TINS * CAKE_QUEUES + 2, respectively.
> Both are beyond the end of the heap (valid indices run from 0 to
> CAKE_MAX_TINS * CAKE_QUEUES - 1), so CAKE_MAX_TINS * CAKE_QUEUES / 2
> is a leaf node. The left child of node CAKE_MAX_TINS * CAKE_QUEUES / 2
> - 1 is at index CAKE_MAX_TINS * CAKE_QUEUES - 1, and its right child
> would be at index CAKE_MAX_TINS * CAKE_QUEUES. Therefore, the left
> child exists but the right child does not. Since this node is not a
> leaf, the loop should start from it.
>
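
To make the index argument concrete, here is a minimal, self-contained
sketch. It is not the sch_cake.c code: it uses a generic int max-heap, a
hypothetical sift_down() standing in for cake_heapify(), and assumes
CAKE_MAX_TINS = 8 and CAKE_QUEUES = 1024 (please verify against the
tree). check_boundary() asserts the two leaf/non-leaf claims, and
build_max_heap() starts at n / 2 - 1 as the patch proposes.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, believed to match sch_cake.c; verify against the tree. */
#define CAKE_MAX_TINS	8
#define CAKE_QUEUES	1024
#define HEAP_SIZE	(CAKE_MAX_TINS * CAKE_QUEUES)

/* Array-backed heap: node i has children at 2 * i + 1 and 2 * i + 2. */
static bool has_left_child(int i, int n)
{
	return 2 * i + 1 < n;
}

static bool has_right_child(int i, int n)
{
	return 2 * i + 2 < n;
}

/* Checks the two index claims for n = HEAP_SIZE (asserts vanish under NDEBUG). */
static void check_boundary(void)
{
	int n = HEAP_SIZE;

	/* n is even, so n / 2 is a leaf: its children would sit at n + 1, n + 2. */
	assert(!has_left_child(n / 2, n) && !has_right_child(n / 2, n));
	/* n / 2 - 1 has a left child at n - 1 but no right child at n. */
	assert(has_left_child(n / 2 - 1, n) && !has_right_child(n / 2 - 1, n));
}

/* Hypothetical stand-in for cake_heapify(): plain sift-down on ints. */
static void sift_down(int *a, int n, int i)
{
	for (;;) {
		int l = 2 * i + 1, r = 2 * i + 2, m = i;

		if (l < n && a[l] > a[m])
			m = l;
		if (r < n && a[r] > a[m])
			m = r;
		if (m == i)
			return;	/* heap property already holds below i */

		int tmp = a[i];
		a[i] = a[m];
		a[m] = tmp;
		i = m;
	}
}

/* Build a max-heap starting at the last non-leaf node, n / 2 - 1. */
static void build_max_heap(int *a, int n)
{
	for (int i = n / 2 - 1; i >= 0; i--)
		sift_down(a, n, i);
}

/* True if every parent is >= each of its existing children. */
static bool is_max_heap(const int *a, int n)
{
	for (int i = 0; i < n; i++) {
		if (has_left_child(i, n) && a[2 * i + 1] > a[i])
			return false;
		if (has_right_child(i, n) && a[2 * i + 2] > a[i])
			return false;
	}
	return true;
}
```

A caller can run check_boundary(), fill an array of HEAP_SIZE ints,
call build_max_heap() on it and assert is_max_heap() on the result; the
same start index also works for odd n, where n / 2 - 1 is still the last
node with a child.
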
> For benefit:
> This saves 2 function calls (one to cake_heapify() and one to
> cake_heap_get_backlog()) and 5 branch condition evaluations (one in the
> loop over all non-leaf nodes, one for the while loop in cake_heapify(),
> and three for the if conditions inside that while loop). The only
> added operation is an extra subtraction.
>
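
The saving can be illustrated with a hedged sketch that only counts
calls. It does not model cake_heapify()'s branches or
cake_heap_get_backlog(), so it cannot confirm the exact figure of 5
branch evaluations; struct heap_stats, sift_down_counted() and
build_from() are hypothetical names on a generic int max-heap. Starting
at n / 2 adds exactly one call, on a leaf, that returns without moving
anything.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters, not part of sch_cake.c. */
struct heap_stats {
	int calls;	/* sift-down invocations */
	int noop_calls;	/* invocations that moved nothing */
};

/* Generic int sift-down, instrumented; a stand-in for cake_heapify(). */
static void sift_down_counted(int *a, int n, int i, struct heap_stats *st)
{
	bool moved = false;

	st->calls++;
	for (;;) {
		int l = 2 * i + 1, r = 2 * i + 2, m = i;

		if (l < n && a[l] > a[m])
			m = l;
		if (r < n && a[r] > a[m])
			m = r;
		if (m == i)
			break;

		int tmp = a[i];
		a[i] = a[m];
		a[m] = tmp;
		i = m;
		moved = true;
	}
	if (!moved)
		st->noop_calls++;
}

/* Build a max-heap by sifting down from 'start' to 0 and report the counts. */
static struct heap_stats build_from(int *a, int n, int start)
{
	struct heap_stats st = { 0, 0 };

	assert(start < n);	/* start must be a valid index (or -1) */
	for (int i = start; i >= 0; i--)
		sift_down_counted(a, n, i, &st);
	return st;
}
```

Building the same input once from n / 2 and once from n / 2 - 1 should
report call counts differing by one, with the extra call counted as a
no-op, and leave identical arrays. It says nothing about real-world
performance, which would still need measuring.
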
> If you're satisfied with the explanation above, I can rewrite the
> commit message and send a v2 patch.

Yes, sounds reasonable. Did you measure any real-world performance
benefit, or is this purely a theoretical optimisation? Either way,
please indicate this in the updated patch description.

-Toke