blk_congestion_wait racy?

From: Martin Schwidefsky
Date: Mon Mar 08 2004 - 05:00:56 EST


Hi,
we have a stupid little program that linearly allocates and touches
memory. We use this to see how fast s390 can swap. If this is combined
with the fastest block device we have (xpram) we see a very strange
effect:

2.6.4-rc2 with 1 cpu
# time ./mempig 600
Count (1Meg blocks) = 600
600 of 600
Done.

real 0m2.516s
user 0m0.150s
sys 0m0.570s
#

2.6.4-rc2 with 2 cpus
# time ./mempig 600
Count (1Meg blocks) = 600
600 of 600
Done.

real 0m56.086s
user 0m0.110s
sys 0m0.630s
#

I have the suspicion that the call to blk_congestion_wait in
try_to_free_pages is part of the problem. It initiates a wait for
a queue to exit congestion but this could already have happened
on another cpu before blk_congestion_wait has setup the wait
queue. In this case the process sleeps for 0.1 seconds. With
the swap test setup this happens all the time. If I "fix"
blk_congestion_wait not to wait:

diff -urN linux-2.6/drivers/block/ll_rw_blk.c linux-2.6-fix/drivers/block/ll_rw_blk.c
--- linux-2.6/drivers/block/ll_rw_blk.c Fri Mar 5 14:50:28 2004
+++ linux-2.6-fix/drivers/block/ll_rw_blk.c Fri Mar 5 14:51:05 2004
@@ -1892,7 +1892,9 @@

blk_run_queues();
prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
+#if 0
io_schedule_timeout(timeout);
+#endif
finish_wait(wqh, &wait);
}

then the system reacts normal again:

2.6.4-rc2 + "fix" with 1 cpu
# time ./mempig 600
Count (1Meg blocks) = 600
600 of 600
Done.

real 0m2.523s
user 0m0.200s
sys 0m0.880s
#

2.6.4-rc2 + "fix" with 2 cpu
# time ./mempig 600
Count (1Meg blocks) = 600
600 of 600
Done.

real 0m2.029s
user 0m0.250s
sys 0m1.560s
#

2.6.4-rc2 + "fix" with 2 cpus


Since it isn't a solution to remove the call to io_schedule_timeout
I tried to understand what the event is, that blk_congestion_wait
is waiting for. The comment says it waits for a queue to exit congestion.
That is starting from prepare_to_wait it waits for a call to
clear_queue_congested. In my test scenario NO queue is congested on
enter to blk_congestion_wait. I'd like to see a proper wait_event
there but it is non-trivial to define the event to wait for.
Any useful hints ?

blue skies,
Martin

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/