RE: scsi-mq V2

From: Elliott, Robert (Server Storage)
Date: Sat Jul 12 2014 - 17:53:17 EST




> -----Original Message-----
> From: Benjamin LaHaise [mailto:bcrl@xxxxxxxxx]
> Sent: Friday, 11 July, 2014 9:55 AM
> To: Elliott, Robert (Server Storage)
> Cc: Christoph Hellwig; Jeff Moyer; Jens Axboe; dgilbert@xxxxxxxxxxxx; James
> Bottomley; Bart Van Assche; linux-scsi@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx
> Subject: Re: scsi-mq V2
...
> Can you try the below totally untested patch instead? It looks like
> put_reqs_available() is not irq-safe.
>

With that addition alone, fio still runs into the same problem.

I added the same fix to get_reqs_available, which also accesses
kcpu->reqs_available, and the test has run for 35 minutes with
no problem.
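
For reference, my reading of the race (my interpretation, not
confirmed by tracing) is an interrupt arriving in the middle of the
non-atomic read-modify-write, since aio_complete -> put_reqs_available
can run in irq context on the same CPU:

	/* process context: get_reqs_available(), preemption disabled
	 * but irqs still enabled before this patch
	 */
	kcpu->reqs_available--;	/* load ...                        */
				/* <-- irq: aio_complete() ->      */
				/*     put_reqs_available() also   */
				/*     modifies reqs_available     */
				/* ... store, clobbering the       */
				/*     interrupt's update          */

Disabling interrupts around both read-modify-write sections closes
that window.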

Patch applied:

diff --git a/fs/aio.c b/fs/aio.c
index e59bba8..8e85e26 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -830,16 +830,20 @@ void exit_aio(struct mm_struct *mm)
 static void put_reqs_available(struct kioctx *ctx, unsigned nr)
 {
 	struct kioctx_cpu *kcpu;
+	unsigned long flags;
 
 	preempt_disable();
 	kcpu = this_cpu_ptr(ctx->cpu);
 
+	local_irq_save(flags);
 	kcpu->reqs_available += nr;
+
 	while (kcpu->reqs_available >= ctx->req_batch * 2) {
 		kcpu->reqs_available -= ctx->req_batch;
 		atomic_add(ctx->req_batch, &ctx->reqs_available);
 	}
 
+	local_irq_restore(flags);
 	preempt_enable();
 }
 
@@ -847,10 +851,12 @@ static bool get_reqs_available(struct kioctx *ctx)
 {
 	struct kioctx_cpu *kcpu;
 	bool ret = false;
+	unsigned long flags;
 
 	preempt_disable();
 	kcpu = this_cpu_ptr(ctx->cpu);
 
+	local_irq_save(flags);
 	if (!kcpu->reqs_available) {
 		int old, avail = atomic_read(&ctx->reqs_available);
 
@@ -869,6 +875,7 @@ static bool get_reqs_available(struct kioctx *ctx)
 	ret = true;
 	kcpu->reqs_available--;
 out:
+	local_irq_restore(flags);
 	preempt_enable();
 	return ret;
 }

--
I will see if that solves the problem with the scsi-mq-3 tree, or
at least some of the bisect trees leading up to it.

A few other comments:

1. Those changes boost _raw_spin_lock_irqsave into first place
in perf top:

     6.59%  [kernel]      [k] _raw_spin_lock_irqsave
     4.37%  [kernel]      [k] put_compound_page
     2.87%  [scsi_debug]  [k] sdebug_q_cmd_hrt_complete
     2.74%  [kernel]      [k] _raw_spin_lock
     2.73%  [kernel]      [k] apic_timer_interrupt
     2.41%  [kernel]      [k] do_blockdev_direct_IO
     2.24%  [kernel]      [k] __get_page_tail
     1.97%  [kernel]      [k] _raw_spin_unlock_irqrestore
     1.87%  [kernel]      [k] scsi_queue_rq
     1.76%  [scsi_debug]  [k] schedule_resp

Maybe (later) kcpu->reqs_available should be converted to an atomic,
like ctx->reqs_available, to reduce that overhead?
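
In case it helps, here is a rough, untested sketch (my assumption, not
a tested patch) of what put_reqs_available might look like if
kcpu->reqs_available became an atomic_t; it could drop the
irqsave/irqrestore pair because the atomic read-modify-writes are
irq-safe on their own:

	/* Hypothetical, untested sketch: assumes kcpu->reqs_available
	 * has been changed from "unsigned" to "atomic_t".
	 */
	static void put_reqs_available(struct kioctx *ctx, unsigned nr)
	{
		/* get_cpu_ptr() disables preemption for us */
		struct kioctx_cpu *kcpu = get_cpu_ptr(ctx->cpu);
		int avail = atomic_add_return(nr, &kcpu->reqs_available);

		while (avail >= ctx->req_batch * 2) {
			/* hand one batch back to the global pool */
			avail = atomic_sub_return(ctx->req_batch,
						  &kcpu->reqs_available);
			atomic_add(ctx->req_batch, &ctx->reqs_available);
		}
		put_cpu_ptr(ctx->cpu);
	}

Whether the atomic ops end up cheaper than local_irq_save/restore
would need measuring, of course.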

2. After the f8567a3 patch, aio_complete has one early return that
bypasses the call to put_reqs_available. Is that OK, or does
that mean that sync iocbs will now eat up reqs_available?

	/*
	 * Special case handling for sync iocbs:
	 *  - events go directly into the iocb for fast handling
	 *  - the sync task with the iocb in its stack holds the single iocb
	 *    ref, no other paths have a way to get another ref
	 *  - the sync task helpfully left a reference to itself in the iocb
	 */
	if (is_sync_kiocb(iocb)) {
		iocb->ki_user_data = res;
		smp_wmb();
		iocb->ki_ctx = ERR_PTR(-EXDEV);
		wake_up_process(iocb->ki_obj.tsk);
		return;
	}


3. The f8567a3 patch renders this comment in aio.c out of date:
reqs_available is no longer incremented when an io_event is pulled
off the ringbuffer, but is now incremented when aio_complete is
called.

	struct {
		/*
		 * This counts the number of available slots in the ringbuffer,
		 * so we avoid overflowing it: it's decremented (if positive)
		 * when allocating a kiocb and incremented when the resulting
		 * io_event is pulled off the ringbuffer.
		 *
		 * We batch accesses to it with a percpu version.
		 */
		atomic_t reqs_available;
	} ____cacheline_aligned_in_smp;
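
A possible rewording (just a suggestion) to match the new behavior:

		 * This counts the number of available slots in the ringbuffer,
		 * so we avoid overflowing it: it's decremented (if positive)
		 * when allocating a kiocb and incremented when aio_complete
		 * is called for the resulting io_event.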


---
Rob Elliott HP Server Storage


