Re: per-cpu blk_plug_list

From: Andrew Morton
Date: Mon Mar 01 2004 - 17:19:07 EST


"Chen, Kenneth W" <kenneth.w.chen@xxxxxxxxx> wrote:
>
> We were surprised to find global lock in I/O path still exists on 2.6
> kernel.

Well I'm surprised that nobody has complained about it yet ;)

Disks are slow. 100 requests per second, perhaps. So a single lock is
generally OK until you have a vast number of disks and lots of
contention-prone CPUs.

Yes, we should fix this.

> blk_plug_list/blk_plug_lock manages plug/unplug action. When you have
> lots of cpu simultaneously submits I/O, there are lots of movement with
> the device queue on and off that global list. Our measurement showed
> that blk_plug_lock contention prevents linux-2.6.3 kernel to scale pass
> beyond 40 thousand I/O per second in the I/O submit path.

Looking at the guts of your patch:

/*
* blk_run_queues_cpu - fire all plugged queues on this cpu
*/
#define blk_plug_entry(entry) list_entry((entry), request_queue_t, plug_list)
void blk_run_queues_cpu(int cpu)
{
struct blk_plug_struct * cur_blk_plug = &blk_plug_array[cpu];
struct list_head * head = &cur_blk_plug->blk_plug_head;
spinlock_t *lock = &cur_blk_plug->blk_plug_lock;

spin_lock_irq(lock);
while (!list_empty(head)) {
request_queue_t *q = blk_plug_entry(head->next);
spin_unlock_irq(lock);
q->unplug_fn(q);
spin_lock_irq(lock);
}
spin_unlock_irq(lock);
}

void blk_run_queues(void)
{
int i;

for_each_cpu(i)
blk_run_queues_cpu(i);
}

How on earth do you know that when direct-io calls blk_run_queues_cpu(), it
is still running on the cpu which initially plugged the queue(s)?

And your patch might have made the much-more-frequently-called
blk_run_queues() more expensive.

There are minor issues with lack of preemption safety and not using the
percpu data infrastructure, but they can wait.

You haven't told us how many disks are in use? At 100k IO's per second it
would be 500 to 1000 disks, yes?

I assume you tried it, but does this help?

diff -puN drivers/block/ll_rw_blk.c~a drivers/block/ll_rw_blk.c
--- 25/drivers/block/ll_rw_blk.c~a Mon Mar 1 13:52:37 2004
+++ 25-akpm/drivers/block/ll_rw_blk.c Mon Mar 1 13:52:45 2004
@@ -1251,6 +1251,9 @@ void blk_run_queues(void)
{
LIST_HEAD(local_plug_list);

+ if (list_empty(&blk_plug_list))
+ return;
+
spin_lock_irq(&blk_plug_lock);

/*

_

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/