Andrea Arcangeli wrote:
> > The swapstorm I agree is uninteresting. The slowdown with a heavy write
> > load impacts a very common usage, and I've told you how to mostly fix
> > it. You need to back out the change to bdflush.
> I guess i should drop the run_task_queue(&tq_disk) instead of replacing
> it back with a wait_for_some_buffers().
hum. Nope, it definitely wants the wait_for_locked_buffers() in there.
36 seconds versus 25. (21 on stock kernel)
My theory is that balance_dirty() is directing heaps of wakeups
to bdflush, so bdflush just keeps on running. I'll take a look
(If we're sending that many wakeups, we should do a waitqueue_active
test in wakeup_bdflush...)
> Note that the first elevator (not elevator_linus) could handle this
> case, however it was too complicated and I'm been told it was hurting
> too much the performance of things like dbench etc.. But it was allowing
> you to take a few seconds for your test number 2 for example. Quite
> frankly all my benchmark were latency oriented, but I couldn't notice
> an huge drop of performance, but OTOH at that time my test box had a
> 10mbyte/sec HD, and I know for experience that on such HD numbers tends
> to be very different than on fast SCSI and my current test hd IDE
> 33mbyte/sec so I think they were right.
OK, well I think I'll make it so the feature defaults to "off" - no
change in behaviour. People need to run `elvtune -b non-zero-value'
to turn it on.
So what is then needed is testing to determine the latency-versus-throughput
tradeoff. Andries takes manpage patches :)
> > - Your attempt to address read latencies didn't work out, and should
> > be dropped (hopefully Marcelo and Jens are OK with an elevator hack :))
> It should not be dropped. And it's not an hack, I only enabled the code
> that was basically disabled due the huge numbers. It will work as 2.2.20.
Sorry, I was referring to the elevator-bypass patch. Jens called
it a hack ;)
> Now what you want to add is an hack to move the read at the top of the
> request_queue and if you go back to 2.3.5x you'll see I was doing this,
> that's the first thing I did while playing with the elevator. And
> latency-wise it was working great. I'm sure somebody remebers the kind
> of latency you could get with such an elevator.
> Then I got flames from Linus and Ingo claiming that I screwedup the
> elevator and that I was the source of the 2.3.x bad I/O performance and
> so they required to nearly rewrite the elevator in a way that was
> obvious that couldn't hurt the benchmarks and so Jens dropped part of my
> latency-capable elevator and he did the elevator_linus that of course
> cannot hurt performance of benchmarks, but that has the usual problem
> you need to wait 1 minute for xterm to be stared under a write flood.
> However my object was to avoid nearly infinite starvation and the
> elevator_linus avoids it (you can start the xterm it in 1 minute,
> previously in early 2.3 and 2.2 you'd need to wait for the disk to be
> full, and that could take some day with some terabyte of data). So I was
> pretty much fine with elevator_linus too but we very well known reads
> would be starved again significantly (even if not indefinitely).
As long as the elevator-bypass tunable gives a good range of
latency-versus-throughput tuning then I'll be happy. It's a
bit sad that in even the best case, reads are penalised by a
factor of ten when there are writes happening.
But fixing that would require major readahead surgery, and perhaps
implementation of anticipatory scheduling, as described in
which is out of scope.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to firstname.lastname@example.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Sat Dec 15 2001 - 21:00:23 EST