Re: [PATCH 1/2] ACPI: nsrepair2: Replace O(n²) bubble sort with O(n log n) sort_r()

From: Nick Huang

Date: Tue Feb 03 2026 - 06:03:37 EST


David Laight <david.laight.linux@xxxxxxxxx> 於 2026年2月2日週一 上午6:48寫道:
>
> On Sun, 1 Feb 2026 13:03:33 +0000
> Nick Huang <sef1548@xxxxxxxxx> wrote:
>
> > Replace the O(n²) bubble sort implementation in acpi_ns_sort_list()
> > with the kernel's sort_r() function which uses heapsort, providing
> > O(n log n) time complexity.
> >
> > This improves performance for large ACPI package lists while also
> > reducing code complexity by leveraging the existing kernel sort API.
>
> What is the break even size?
> While the heapsort is O(n long n) it is also more complicated.
> There is also the cost of the function call - especially with all the
> mitigations that distro kernels are likely to enable.
>
> For large datasets the d-cache locality of both sorts is particularly horrid.
> It is almost certainly better to allocate an array of index:value pairs
> and sort that.
> For very big datasets you want to sort small sections (that fit in the
> d-cache) and then use merge sorts (also O(n log n)) to combine them.
> (Yes - this is how you sort data with 3 mag-tape drives....)
>
> Oh, in any case, write separate functions for ascending/descending.
>
> David
>
Hi David

Thank you for your reply and the detailed feedback.


I ran KUnit benchmarks to investigate your questions.


=== Break-Even Point ===


I compared bubble sort vs heapsort across various array sizes:


N | Bubble(ns) | Heap(ns) | Faster

----|------------|----------|--------

2 | 61 | 99 | bubble

3 | 63 | 115 | bubble

4 | 78 | 163 | bubble

5 | 96 | 215 | bubble

6 | 119 | 260 | bubble

8 | 177 | 388 | bubble

10 | 257 | 539 | bubble

12 | 415 | 721 | bubble

16 | 726 | 1044 | bubble

20 | 1106 | 1484 | bubble

32 | 2854 | 3091 | bubble


Bubble sort is faster for all tested sizes. The break-even point was not

reached within N=32, which covers all realistic ACPI use cases:


- _ALR: 2-10 entries (ambient light response)

- _CST: 2-8 entries (C-states)

- _PSS: 5-20 entries (P-states)

- _TSS: 2-16 entries (T-states)


As you noted, heapsort has additional overhead from function calls and

mitigations that outweigh its O(n log n) advantage at small N.


=== Separate Ascending/Descending Functions ===


I also tested combined vs separate sort functions as you suggested:


N | Combined(ns) | Separate(ns) | Saved

----|--------------|--------------|------

4 | 84 | 75 | 11%

8 | 179 | 175 | 3%

16 | 737 | 806 | -9%


The results are mixed. Separate functions show marginal improvement at

small N, but the combined function performs better at N=16, possibly

due to instruction cache effects.


=== Conclusion ===


Given these results, replacing bubble sort with heapsort would likely

degrade performance for typical ACPI workloads. The existing bubble

sort implementation appears to be the right choice for this use case.



--
Regards,
Nick Huang