Re: [PATCH 1/8] lib/sort: Heapsort implementation of sort()

From: Andreas Gruenbacher
Date: Tue Feb 01 2005 - 12:53:03 EST


On Mon, 2005-01-31 at 20:30, Matt Mackall wrote:
> On Mon, Jan 31, 2005 at 06:16:23PM +0100, Andreas Gruenbacher wrote:
> > Hello,
> >
> > On Mon, 2005-01-31 at 08:34, Matt Mackall wrote:
> > > This patch adds a generic array sorting library routine. This is meant
> > > to replace qsort, which has two problem areas for kernel use.
> >
> > looks reasonable.
> >
> > > Note that this function has an extra parameter for passing in an
> > > optimized swapping function. This is worth 10% or more over the
> > > typical byte-by-byte exchange functions.
> >
> > I would appreciate a version without the swap callback.
>
> Why? To eliminate an argument?

Yes, because a custom swap routine isn't very useful generally. It's
over-engineered IMHO.

> > The optimized version of swap should use the machine word size
> > instead of u32.
>
> That occurred to me, but most instances I found in my audit were using
> u32 rather than long.

Alright, that may be the case.

> > How about this approach instead, if you think we
> > must really optimize swapping?
> >
> > static inline void swap(void *a, void *b, int size)
> > {
> > if (size % sizeof(long)) {
> > char t;
> > do {
> > t = *(char *)a;
> > *(char *)a++ = *(char *)b;
> > *(char *)b++ = t;
> > } while (--size > 0);
> > } else {
> > long t;
> > do {
> > t = *(long *)a;
> > *(long *)a = *(long *)b;
> > *(long *)b = t;
> > size -= sizeof(long);
> > } while (size > sizeof(long));
> > }
> > }
>
> This makes things worse. Sort isn't inlined, so we don't know size
> until we're called and then we're branching in the innermost loop

Well, the example code I referred to had an inlined __sort, and the size
== sizeof(long) case ended up being perfectly optimized. It bloats the
code somewhat though, but it's still tiny.

> and growing the code footprint to boot.

You mean the object footprint? I don't care much whether we use some 350
bytes more or not.

> Function pointer wins in my benchmarks.

I had a slowdown in the low percentage range when doing bytewise swap
instead of wordwise swap as well, but function pointers were about as
fast as wordwise swap. So no significant gain.

> Note that there are callers like IA64 extable that want a custom swap already:
>
> - * stack may be more than 2GB away from the exception-table).
> + * Sort the exception table. It's usually already sorted, but there
> + * may be unordered entries due to multiple text sections (such as the
> + * .init text section). Note that the exception-table-entries contain
> + * location-relative addresses, which requires a bit of care during
> + * sorting to avoid overflows in the offset members (e.g., it would
> + * not be safe to make a temporary copy of an exception-table entry on
> + * the stack, because the stack may be more than 2GB away from the
> + * exception-table).
> */

We could either leave the current insertion sort in place in
arch/ia64/mm/extable.c, or transform the exception table into sortable
form first, as in the below mock-up.

+ /* Exception-table-entries contain location-relative addresses. Convert
+ to addresses relative to START before sorting, and convert back to
+ location-relative addresses afterwards. This allows us to use the
+ general-purpose sort function. */
+ for (p = start+1; p < finish; p++) {
+ int n = (void *)p - (void *)start;
+ p->addr += n;
+ p->cont += n;
+ }
+ sort(start, finish - start, sizeof(struct exception_table_entry),
+ compare_entries);
+ for (p = start+1; p < finish; p++) {
+ int n = (void *)p - (void *)start;
+ p->addr -= n;
+ p->cont -= n;
+ }

Switching to the generic sort probably isn't worth it in this case.

> There are a bunch of other potential users that sort parallel arrays
> a[] and b[] with keys in a[] that want this too.

Where?

Thanks,
--
Andreas Gruenbacher <agruen@xxxxxxx>
SUSE Labs, SUSE LINUX GMBH

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/