Re: Asynch I/O gets easily overloaded on 2.2.15 and 2.3.99

From: Andi Kleen (ak@suse.de)
Date: Tue Apr 11 2000 - 09:36:01 EST


On Tue, Apr 11, 2000 at 04:16:36PM +0200, Andrea Arcangeli wrote:
> On Tue, 11 Apr 2000, Andi Kleen wrote:
>
> >I was thinking more about lots of wakeups in get_request_wait causing
> >the elevator to do much work (kupdate is single threaded so there are
> >only a few wakeups). With lots of threads calling ll_rw_block in
> >parallel it may look different.
>
> With the previous elevator code in 2.3.5x you are right: if there were
> no available requests, the revalidation was quite expensive. However,
> with 2.3.99-prex I fixed that, and now the only slowdown we have in the
> wait_for_request case (except the wait_for_request itself, of course :)
> is this:
>
> /* revalidate elevator */
> head = &q->queue_head;
> if (q->head_active && !q->plugged)
> head = head->next;
>
> and that's very fast indeed, and certainly not visible in any profiling
> numbers.

How about the loop in get_request? It looks rather inefficient.
Wouldn't it be better to have a free list of requests?
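
Something like this would do it -- just a sketch, the struct fields and
function names are made up and the locking get_request would need is
left out:

#include <stddef.h>

/*
 * Sketch only: pop/push on a singly linked free list is O(1), instead
 * of scanning the whole request array looking for a free slot.
 */
struct request {
	struct request *next_free;	/* chains free requests */
	/* ... the rest of struct request ... */
};

struct free_list_sketch {
	struct request *head;
};

static struct request *get_free_request(struct free_list_sketch *fl)
{
	struct request *rq = fl->head;		/* O(1): pop the head */

	if (rq != NULL)
		fl->head = rq->next_free;
	return rq;				/* NULL => caller must sleep */
}

static void put_free_request(struct free_list_sketch *fl, struct request *rq)
{
	rq->next_free = fl->head;		/* O(1): push it back */
	fl->head = rq;
}

Freeing a completed request is then just as cheap as allocating one.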

> However the __sti() is only a few asm instructions before returning from
> ll_rw_block. So depending on the details of the architecture the profiling
> hit could even land outside the I/O layer.

AFAIK interrupts nearly always run as soon as they are unblocked.

>
> >> Anyway I'm fairly confident that the profiler will show the real culprit
> >> (I guess Jeff is queueing an insane number of buffers into the buffer
> >> hashtable and that is causing complexity troubles due to too many
> >> collisions). If that's the case you'll see a huge number in the
> >> get_hash_table entry in the profiling.
> >>
> >> Also, last time I checked, the buffer hash had been shrunk because in 2.3.x
> >> the buffer cache isn't used for the data write I/O, but raw devices can
> >> still be used to read/write without a filesystem...
> >
> >Good point. inode hash is too big, buffer hash is too small ...
>
> ihash is 16k buckets and the icache can easily grow to around 16k entries
> on machines with a good amount of memory. It should be made dynamic though.

According to http://www.citi.umich.edu/projects/linux-scalability/reports/hash.html
the inode hash is too big for a lot of workloads.
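
Sizing it at boot from the amount of memory would not be hard. Roughly
(a sketch only -- the one-bucket-per-256-pages ratio is invented for
illustration, not measured):

/*
 * Pick the inode hash size from the number of physical pages instead
 * of a compile-time constant.
 */
static unsigned long ihash_buckets(unsigned long num_physpages)
{
	unsigned long want = num_physpages / 256;	/* invented ratio */
	unsigned long size = 1024;			/* arbitrary floor */

	/* round up to a power of two so lookups can mask, not divide */
	while (size < want)
		size <<= 1;
	return size;
}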

The main problem with the dentry and inode hashes is that they take twice
the memory they should (anchored list heads are *very* wasteful for hash
tables). It is rather unlikely that the cost of the few saved if()s in the
list macros makes up for the additional cache misses. [patch will come
as soon as I have tested it a bit more]
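
To make that concrete (a sketch, not the patch itself): an anchored head
costs two pointers per bucket even when the chain is empty, a bare
pointer costs one, so twice the buckets fit into the same cache lines
and the walk only pays an extra NULL check:

#include <stddef.h>

struct inode_sketch {
	struct inode_sketch *i_hash_next;	/* single forward link */
	unsigned long i_ino;
	/* ... */
};

/* bucket is a plain pointer, half the size of an anchored list head */
static struct inode_sketch *find_inode(struct inode_sketch **bucket,
				       unsigned long ino)
{
	struct inode_sketch *inode;

	for (inode = *bucket; inode != NULL; inode = inode->i_hash_next)
		if (inode->i_ino == ino)
			break;
	return inode;
}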

If you look at the output of nm --size-sort -t d vmlinux you'll see that
the dentry and inode hashes are the biggest static memory wasters.

-Andi



