Re: (elist) faster hash list scanning

Ingo Molnar (mingo@chiara.csoma.elte.hu)
Mon, 26 Jul 1999 14:32:06 +0200 (CEST)


On Mon, 26 Jul 1999, Jan Bobrowski wrote:

> > > * It speeds up list manipulations because next->pprev is always accessible.
> > such are list.h lists ...
>
> But the next/pprev lists already available in the kernel require NULL
> pointer checking when adding/deleting a node.

(but some of the code you converted, the inode hash, wasn't using
next/pprev lists.) next/pprev lists are obsolete, and in 2.3 we have
already replaced a couple of crucial lists with list.h lists (waitqueues,
runqueue, files, etc.). So the proper comparison is list.h list_heads vs.
elist heads - you have partly compared against next/pprev lists when
listing advantages - that's confusing, I think.
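
to make the comparison concrete, here is roughly what deletion looks
like in the two styles (a simplified sketch with made-up names, not the
actual kernel source):

	/* next/pprev style: deletion needs a NULL check on ->next. */
	struct np_node {
		struct np_node *next;
		struct np_node **pprev;
	};

	static void np_del(struct np_node *n)
	{
		*n->pprev = n->next;
		if (n->next)			/* the NULL check Jan mentions */
			n->next->pprev = n->pprev;
	}

	/* list.h style: lists are circular, so deletion is branch-free. */
	struct list_head {
		struct list_head *next, *prev;
	};

	static void list_del(struct list_head *e)
	{
		e->prev->next = e->next;	/* no NULL checks needed */
		e->next->prev = e->prev;
	}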

> > > * They are anchored by single pointer - it saves 50% of memory
> > > * occupied by hashtables.
> > [...] This presumes that the list-user never tries to really look up
> > anchor->prev. There are several usage types which want to append to the
> > end of the queue. Eg. waitqueues [...]
>
> This is an alternative implementation. It may be useful in _some_ situations.

yep, agreed.
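
(to make the memory argument concrete: with list.h every hash bucket
anchor is a full two-pointer list_head, so a single-pointer anchor
halves the table - at the cost of losing a usable anchor->prev. A rough
sketch, with made-up names, not the elist patch itself:

	#define HASH_SIZE 1024

	/* list.h anchors: a two-pointer struct list_head per bucket. */
	struct list_head { struct list_head *next, *prev; };	/* as in list.h */
	static struct list_head hash_a[HASH_SIZE];	/* 2 * sizeof(void *) each */

	/* single-pointer anchors: half the table, but there is no
	 * usable anchor->prev, so cheap appending at the tail is gone. */
	struct sp_node { struct sp_node *next; /* payload... */ };
	static struct sp_node *hash_b[HASH_SIZE];	/* sizeof(void *) each */
)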

> > using elist.h lists removes the possibility of later changing
> > add-characteristics.
>
> It's very easy to change back. The macros are almost identical (simply
> remove the 'e' from the name...).

ok.

> > Also, the effect on SMP systems is interesting as well - we might end up
> > writing to __elist_end.pprev from many CPUs - causing that cacheline to
> > bounce around.
>
> Yes! It may be a problem. Note however that we don't use the 'lock'
> opcode, so I'm not sure it's a real problem (???).
> If those lists are used for a single purpose, e.g. the VFS, they will be
> guarded by a spinlock and all caching problems will be avoided anyway.

no, the fact that something is protected by a spinlock doesn't
automatically avoid cacheline ping-pong. Even if it's non-atomic,
write-only access, most SMP cache protocols cannot keep shared modified
copies of cachelines. [a shared write normally doesn't make sense - the
above case is an exception]
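
to illustrate (a standalone simplification, assuming the shared end
sentinel as posted - made-up struct names):

	struct enode { struct enode *next, **pprev; };
	static struct enode __elist_end;	/* one sentinel, shared by all lists */

	static void elist_del(struct enode *n)
	{
		*n->pprev = n->next;
		n->next->pprev = n->pprev;	/* if n is a tail, this writes
						 * __elist_end.pprev */
	}

	/* two CPUs removing the tails of two *different* lists, each
	 * under its own spinlock, still both write __elist_end.pprev.
	 * Each write pulls the sentinel's cacheline over in
	 * exclusive/modified state and invalidates the other CPU's
	 * copy - the line bounces no matter how the accesses are
	 * locked. */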

> > This can be solved if __elist_end is properly aligned and
> > indexed by smp_processor_id() but then we introduce additional complexity
> > to elist_find(), on_elist() and init_*elist().
>
> Yes. But a per-processor copy at the same address would be better.
> Having private processor data at the same place for all processors may
> be useful for other purposes too (??).

that is not possible, think about clone(CLONE_VM): tasks sharing one
address space can run on different CPUs at the same time, so the same
virtual address cannot map to different per-CPU data.

> > > + fast list scanning using elist_find() function.
> > > It is about 75% faster than the original for long lists (5-element).
> > and how did you time this 75% speedup? list.h is very simple already and
> > the finding functions did about what elist_find does.
>
> I've extracted the list_head code and tested it outside the kernel. This
> speedup is observed for long lists (5 elements is long), but there is no
> slowdown when the list has one element. (note that it's a list-scanning
> benchmark, not the whole kernel). My lists are looped at ELIST_END so
> list termination can be checked every second time; that's impossible
> using list_head structures.

(could you post this benchmark too?)
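
(the scanning trick described above would look roughly like this - a
speculative reconstruction with guessed names, not the actual patch.
The list loops through a self-pointing sentinel, so following ->next is
always safe and the end test only has to run every second node:

	struct enode {
		struct enode *next;
		unsigned long key;
	};

	/* self-looped end marker: overrunning it is harmless */
	static struct enode elist_end = { &elist_end };

	static struct enode *elist_find(struct enode *bucket, unsigned long key)
	{
		struct enode *p = bucket;

		for (;;) {
			if (p == &elist_end)	/* end test: once... */
				return NULL;
			if (p->key == key)
				return p;
			p = p->next;		/* safe even past the tail */
			if (p->key == key)	/* ...per two nodes scanned */
				return (p == &elist_end) ? NULL : p;
			p = p->next;		/* if this overruns the end,
						 * the self-loop catches it
						 * next iteration */
		}
	}
)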

-- mingo
