Re: SCO: "thread creation is about a thousand times faster than on native Linux"

From: kumon@flab.fujitsu.co.jp
Date: Thu Aug 24 2000 - 03:27:09 EST


Andi Kleen writes:
> The main problem Linux clone has over LWPs is the extensive memory resource
> usage (~8.5K on x86, 16+somethingK on 64bit for the kernel stacks)

The performance problem lies in the kernel stack alignment, which
requires a 2K boundary on x86.

The scheduler scans the stacks of all runnable processes. If you have
1000 processes, every stack dereference causes a cache miss, because
a 2MB CPU cache has only about a thousand 2K boundaries.
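
The arithmetic behind that figure can be sketched as below. This is
only an illustration: it assumes a direct-mapped cache, and the helper
name aligned_slots() is made up for this mail, not taken from the
kernel.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of distinct cache positions that the start of an aligned
 * object can map to, assuming a direct-mapped cache.
 * Hypothetical helper for illustration only.
 */
static size_t aligned_slots(size_t cache_bytes, size_t align_bytes)
{
    /* The alignment must be non-zero and divide the cache size. */
    assert(align_bytes != 0 && cache_bytes % align_bytes == 0);
    return cache_bytes / align_bytes;
}
```

With a 2MB cache and 2K alignment, aligned_slots(2 * 1024 * 1024, 2048)
gives 1024, so 1000 stacks would occupy nearly every available
position even if nothing else used the cache.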

In practice, however, more than a hundred runnable processes will
already cause noticeable cache misses and increased overhead in
schedule(), because the cache is also used for other data (both user
and kernel), and that data may evict the stack data from the cache.

Successive dereferences of the stack chain often hit the worst-case
scenario of the LRU cache replacement policy: reading one stack evicts
the stack that will be read next, and so on.
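
The LRU worst case can be reproduced with a small simulation. The
sketch below is not kernel code; it assumes a fully associative LRU
cache with a handful of entries, and lru_cyclic_misses() and MAX_WAYS
are names invented for this example. It counts the misses when a fixed
set of items is touched in the same order over and over, as a
scan of a list would do.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAYS 64   /* upper bound on simulated cache entries */

/*
 * Count misses when `nitems` items are touched in a fixed cyclic
 * order for `rounds` rounds, in a fully associative LRU cache
 * with `ways` entries.
 */
static size_t lru_cyclic_misses(size_t ways, size_t nitems, size_t rounds)
{
    size_t tag[MAX_WAYS];       /* item currently held by each entry */
    size_t last_use[MAX_WAYS];  /* time of the most recent access */
    size_t used = 0, misses = 0, now = 0;

    assert(ways > 0 && ways <= MAX_WAYS);

    for (size_t r = 0; r < rounds; r++) {
        for (size_t item = 0; item < nitems; item++) {
            size_t hit = ways;   /* `ways` means "not found" */
            size_t victim = 0;   /* least recently used entry */

            now++;
            for (size_t i = 0; i < used; i++) {
                if (tag[i] == item)
                    hit = i;
                if (last_use[i] < last_use[victim])
                    victim = i;
            }
            if (hit != ways) {   /* hit: just refresh its age */
                last_use[hit] = now;
                continue;
            }
            misses++;
            if (used < ways)     /* a free entry is still available */
                victim = used++;
            tag[victim] = item;
            last_use[victim] = now;
        }
    }
    return misses;
}
```

With 4 entries and 5 items, every one of the 50 accesses in 10 rounds
misses, because each read evicts exactly the item that is needed next.
With 4 items, only the first 4 accesses miss.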

This kind of performance problem shows up on a large SMP system, which
usually runs more processes than a small one.

To reduce the overhead, it is preferable to split the process list
into several lists, each belonging to one CPU. This would also reduce
the current quadratic scheduling overhead.
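
A rough sketch of what I mean by per-CPU lists follows. It is not a
patch: struct task_sketch, struct runqueues_sketch, NR_CPUS_SKETCH and
the rq_*() helpers are all hypothetical names, and locking and load
balancing are left out completely. It only shows that each CPU then
walks its own list instead of the global one.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4   /* assumed CPU count for the example */

struct task_sketch {
    int pid;
    struct task_sketch *next;
};

/* One list of runnable tasks per CPU instead of a single global list. */
struct runqueues_sketch {
    struct task_sketch *head[NR_CPUS_SKETCH];
};

static void rq_init(struct runqueues_sketch *rq)
{
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        rq->head[cpu] = NULL;
}

/* Put a runnable task at the front of the list of the given CPU. */
static void rq_add(struct runqueues_sketch *rq, int cpu,
                   struct task_sketch *t)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    t->next = rq->head[cpu];
    rq->head[cpu] = t;
}

/* Number of tasks one CPU has to visit when it picks the next task. */
static size_t rq_scan(const struct runqueues_sketch *rq, int cpu)
{
    size_t seen = 0;

    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    for (const struct task_sketch *t = rq->head[cpu]; t; t = t->next)
        seen++;
    return seen;
}
```

If 1000 runnable tasks are spread evenly over 4 such lists, each scan
touches 250 tasks instead of 1000.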

Any ideas?

--
Computer Systems Laboratory, Fujitsu Labs.
kumon@flab.fujitsu.co.jp