RE: scsi-mq V2

From: Elliott, Robert (Server Storage)
Date: Sat Jul 12 2014 - 17:53:17 EST

> -----Original Message-----
> From: Benjamin LaHaise [mailto:bcrl@xxxxxxxxx]
> Sent: Friday, 11 July, 2014 9:55 AM
> To: Elliott, Robert (Server Storage)
> Cc: Christoph Hellwig; Jeff Moyer; Jens Axboe; dgilbert@xxxxxxxxxxxx; James
> Bottomley; Bart Van Assche; linux-scsi@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx
> Subject: Re: scsi-mq V2
> Can you try the below totally untested patch instead? It looks like
> put_reqs_available() is not irq-safe.

With that addition alone, fio still runs into the same problem.

I added the same fix to get_reqs_available, which also accesses
kcpu->reqs_available, and the test has run for 35 minutes with
no problem.

Patch applied:

diff --git a/fs/aio.c b/fs/aio.c
index e59bba8..8e85e26 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -830,16 +830,20 @@ void exit_aio(struct mm_struct *mm)
static void put_reqs_available(struct kioctx *ctx, unsigned nr)
struct kioctx_cpu *kcpu;
+ unsigned long flags;

kcpu = this_cpu_ptr(ctx->cpu);

+ local_irq_save(flags);
kcpu->reqs_available += nr;
while (kcpu->reqs_available >= ctx->req_batch * 2) {
kcpu->reqs_available -= ctx->req_batch;
atomic_add(ctx->req_batch, &ctx->reqs_available);

+ local_irq_restore(flags);

@@ -847,10 +851,12 @@ static bool get_reqs_available(struct kioctx *ctx)
struct kioctx_cpu *kcpu;
bool ret = false;
+ unsigned long flags;

kcpu = this_cpu_ptr(ctx->cpu);

+ local_irq_save(flags);
if (!kcpu->reqs_available) {
int old, avail = atomic_read(&ctx->reqs_available);

@@ -869,6 +875,7 @@ static bool get_reqs_available(struct kioctx *ctx)
ret = true;
+ local_irq_restore(flags);
return ret;
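
For readers who want to experiment with the batching outside the kernel, here is a minimal user-space sketch of the logic the patch protects. It is not kernel code: `struct model_ctx`, `struct model_cpu`, `model_init`, `model_put`, `model_get` and `model_total` are hypothetical names, there is only one simulated CPU, and the refill loop in `model_get` is an assumption about the part of get_reqs_available that the hunk above elides. The comments mark where the patch adds local_irq_save/local_irq_restore.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct kioctx_cpu and struct kioctx. */
struct model_cpu {
    unsigned reqs_available;      /* per-CPU cache of slots (plain integer) */
};

struct model_ctx {
    atomic_int reqs_available;    /* shared pool of slots */
    unsigned req_batch;           /* slots moved per transfer */
    struct model_cpu cpu;         /* a single simulated CPU */
};

static void model_init(struct model_ctx *ctx, int total, unsigned batch)
{
    atomic_init(&ctx->reqs_available, total);
    ctx->req_batch = batch;
    ctx->cpu.reqs_available = 0;
}

/* Return nr slots; spill whole batches back to the shared pool. */
static void model_put(struct model_ctx *ctx, unsigned nr)
{
    struct model_cpu *kcpu = &ctx->cpu;

    /* kernel patch: local_irq_save(flags) goes here */
    kcpu->reqs_available += nr;
    while (kcpu->reqs_available >= ctx->req_batch * 2) {
        kcpu->reqs_available -= ctx->req_batch;
        atomic_fetch_add(&ctx->reqs_available, (int)ctx->req_batch);
    }
    /* kernel patch: local_irq_restore(flags) goes here */
}

/* Take one slot; refill the local cache from the shared pool if empty. */
static bool model_get(struct model_ctx *ctx)
{
    struct model_cpu *kcpu = &ctx->cpu;

    /* kernel patch: local_irq_save(flags) goes here */
    if (!kcpu->reqs_available) {
        int avail = atomic_load(&ctx->reqs_available);

        /* Assumed refill step: claim one batch with a CAS loop. */
        do {
            if (avail < (int)ctx->req_batch)
                return false;     /* kernel would restore irqs first */
        } while (!atomic_compare_exchange_weak(&ctx->reqs_available, &avail,
                                               avail - (int)ctx->req_batch));
        kcpu->reqs_available += ctx->req_batch;
    }
    kcpu->reqs_available--;
    /* kernel patch: local_irq_restore(flags) goes here */
    return true;
}

/* Slots not currently handed out: shared pool plus local cache. */
static int model_total(struct model_ctx *ctx)
{
    return atomic_load(&ctx->reqs_available) + (int)ctx->cpu.reqs_available;
}
```

The point of the model is that the local counter is a plain read-modify-write: if an interrupt handler on the same CPU ran model_put between the load and the store in model_get (or the reverse), one of the updates would be lost. That is the kind of interleaving the two local_irq_save calls rule out.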

I will see if that solves the problem with the scsi-mq-3 tree, or
at least some of the bisect trees leading up to it.

A few other comments:

1. Those changes boost _raw_spin_lock_irqsave into first place
in perf top:

6.59% [kernel] [k] _raw_spin_lock_irqsave
4.37% [kernel] [k] put_compound_page
2.87% [scsi_debug] [k] sdebug_q_cmd_hrt_complete
2.74% [kernel] [k] _raw_spin_lock
2.73% [kernel] [k] apic_timer_interrupt
2.41% [kernel] [k] do_blockdev_direct_IO
2.24% [kernel] [k] __get_page_tail
1.97% [kernel] [k] _raw_spin_unlock_irqrestore
1.87% [kernel] [k] scsi_queue_rq
1.76% [scsi_debug] [k] schedule_resp

Maybe (later) kcpu->reqs_available should be converted to an atomic,
like ctx->reqs_available, to reduce that overhead?
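
As a rough user-space illustration of that idea (an assumption about what such a conversion might look like, not a proposed patch), the put side could use an atomic local counter and a compare-and-swap loop instead of disabling interrupts. `struct atomic_cpu`, `struct atomic_ctx` and `put_atomic` are hypothetical names, and whether this would actually be cheaper in the kernel is not established here.

```c
#include <stdatomic.h>

/* Hypothetical variant where the per-CPU counter is itself atomic. */
struct atomic_cpu {
    atomic_uint reqs_available;
};

struct atomic_ctx {
    atomic_int reqs_available;    /* shared pool */
    unsigned req_batch;
    struct atomic_cpu cpu;        /* a single simulated CPU */
};

static void put_atomic(struct atomic_ctx *ctx, unsigned nr)
{
    struct atomic_cpu *kcpu = &ctx->cpu;
    /* fetch_add returns the old value, so add nr to get the new one. */
    unsigned cur = atomic_fetch_add(&kcpu->reqs_available, nr) + nr;

    while (cur >= ctx->req_batch * 2) {
        /* Try to move one batch out of the local counter. */
        if (atomic_compare_exchange_weak(&kcpu->reqs_available, &cur,
                                         cur - ctx->req_batch)) {
            atomic_fetch_add(&ctx->reqs_available, (int)ctx->req_batch);
            cur -= ctx->req_batch;
        }
        /* On failure, cur now holds the current value; re-check and retry. */
    }
}
```

With req_batch set to 2 and both counters starting at zero, put_atomic(&ctx, 7) leaves 3 in the local counter and 4 in the shared pool, matching what the non-atomic loop would do.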

2. After the f8567a3 patch, aio_complete has one early return that
bypasses the call to put_reqs_available. Is that OK, or does
that mean that sync iocbs will now eat up reqs_available?

	/*
	 * Special case handling for sync iocbs:
	 *  - events go directly into the iocb for fast handling
	 *  - the sync task with the iocb in its stack holds the single iocb
	 *    ref, no other paths have a way to get another ref
	 *  - the sync task helpfully left a reference to itself in the iocb
	 */
	if (is_sync_kiocb(iocb)) {
		iocb->ki_user_data = res;
		iocb->ki_ctx = ERR_PTR(-EXDEV);

3. The f8567a3 patch renders this comment in aio.c out of date:
reqs_available is no longer incremented when the io_event is pulled
off the ringbuffer, but is now incremented when aio_complete is called.

	struct {
		/*
		 * This counts the number of available slots in the ringbuffer,
		 * so we avoid overflowing it: it's decremented (if positive)
		 * when allocating a kiocb and incremented when the resulting
		 * io_event is pulled off the ringbuffer.
		 * We batch accesses to it with a percpu version.
		 */
		atomic_t reqs_available;
	} ____cacheline_aligned_in_smp;

Rob Elliott HP Server Storage
