I doubt that it is that bad. It needs one additional cache line to access
the bottom of the stack.
Assuming you allocate the task_struct with the slab allocator, which does
proper L1 colouring, and align it on the offset of the most often used
variable inside task_struct, then for most processes this one cache line is
saved again [the "fixed" alignment address at the bottom of the stack is bad
for cache-line selection on caches that are not fully associative]. With
some reordering in task_struct it would probably be even faster.
Of course this is all speculation without benchmarks; I'll run some tests
soon.
>
> A 4k stack is not quite enough. It should be, but in RL it was pain and
> caused way too common 'ayiee, stack corrupted ..' crashes. The current 7k
> stack seems to be just about perfect, enough for all drivers/filesystems,
> and still doesn't waste a full second page, because we have the task
> structure there.
That would be a killer of course. I was assuming that most of them
were caused by "single offenders" like the aic7xxx driver putting lots
of internal state on the stack - these could be fixed. If it is a more
generic situation then it is not a good approach.
>
> put yet another way, if we cannot allocate 8k pages more or less reliably,
> we are dead for NFS serving anyway ... and if we cannot allocate 8k pages
> in fork() [which can wait, unlike the IRQ handler] we are doing way too
> bad work too :(
The NFS server can be fixed easily here; fork() only with lots of black
magic in the VM. Also it is reasonable to require, let's say, a minimum of
16MB of memory for an NFS server, while fork() is a much more fundamental
operation that needs to work reliably with minimal RAM.
-Andi