Re: [PATCH 1/5] lib/sort: Make swap functions more generic
From: George Spelvin
Date: Sat Mar 09 2019 - 16:07:51 EST
Andy Shevchenko wrote:
> Shouldn't simple memcpy cover these case for both 32- and 64-bit architectures?
Speaking of replacing swap with copying via temporary buffers, one
idea that did come to mind was avoiding swap for sufficiently small
objects.
Every sift-down is actually a circular shift. Once the target
position hs been found, we rotate the path from the root to the
target up one, with the target filled in by the previous root.
(When breaking down the heap, there's one additional link in the
cycle, where the previous root goes to the end of the heap and the
end of the heap goes to the target.)
If we had a temporary buffer (128 bytes would handle most things),
we could copy the previous root there and *copy*, rather than swap,
the elements on the path to the target up, then finally copy the
previous root to the target.
However, to rotate up, the this must be done in top-down order.
The way the code currently works with premultiplied offsets, it's
easy to traverse bottom-up, but very awkward to retrace the
top-down path.
The only solution I can think of is to build a bitmap of the
left/right turnings from the root to the leaf, and then back it up
to the target. There are a few ways to do this:
1) The obvious big-endian order. The bitmap is simply the 1-based
position of the leaf. To add a level, shift left and add the new
bit at the bottom. To back up a step, shift right.
To retrace, create a 1-bit mask equal to the msbit of the index
((smear_right(x) >> 1) + 1) and repeatedly shift the mask right.
2) The reverse order. We use a 1-bit mask while building the
bitmap, and while retracing, just examine the lsbit while shifting
the bitmap right.
3) As option 1, but don't build the bitmap as we're walking down;
rather reconstruct it from the premultiplied offset using
reciprocal_divide().
Nothing really jumps out to me as The Right Way to do it.
I don't want to bloat the code to the point that it would be
easier to implement a different algorithm entirely.