Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()

From: John Garry
Date: Thu Feb 18 2016 - 05:13:47 EST


On 18/02/2016 07:40, Hannes Reinecke wrote:
On 02/16/2016 05:56 PM, John Garry wrote:
On 16/02/2016 15:33, Hannes Reinecke wrote:
On 02/16/2016 01:22 PM, John Garry wrote:
In high-datarate aging tests, it is found that
the SCSI framework can periodically
issue LU resets to the device. This is because SCSI
commands begin to time out. It is found that TASK SET
FULL may be returned many times for the same command,
causing the timeouts.
To overcome this, the queue depth for the device needs
to be reduced to 64 (from 256, set in
sas_slave_configure()).
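
As a rough illustration of the change described above, the following is a
userspace model only, not the driver code. It assumes the generic
configure step leaves the depth at 256 and the driver hook then lowers it
to 64. Every type and function name here is a hypothetical stand-in with
a model_ prefix; none of them is a kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI device; not struct scsi_device. */
struct model_sdev {
	int queue_depth;
};

#define MODEL_GENERIC_DEPTH 256	/* default mentioned in the patch text */
#define MODEL_DRIVER_DEPTH   64	/* reduced depth mentioned in the patch text */

/* Models the generic configure step, which applies the default depth. */
static int model_generic_slave_configure(struct model_sdev *sdev)
{
	assert(sdev != NULL);
	sdev->queue_depth = MODEL_GENERIC_DEPTH;
	return 0;
}

/*
 * Models a driver-specific hook: run the generic step first, then
 * lower the depth if it is above the driver's limit.
 */
static int model_driver_slave_configure(struct model_sdev *sdev)
{
	int ret = model_generic_slave_configure(sdev);

	if (ret)
		return ret;

	if (sdev->queue_depth > MODEL_DRIVER_DEPTH)
		sdev->queue_depth = MODEL_DRIVER_DEPTH;

	return 0;
}
```

The hook only ever lowers the depth, so a smaller generic default would
pass through unchanged.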

Hmm. TASK SET FULL should cause the queue depth to be reduced
automatically, no?

Cheers,

Hannes


I need to double-check whether Task Set Full reduces the depth; I
don't think it does.

Regardless I found we were getting a combination of commands being
retried due to Task Set Full and also SAS_QUEUE_FULL errors. For
sure the SAS_QUEUE_FULL task errors reduce the queue depth in
scsi_track_queue_full(). However I found it to be very slow in
tracking, and we were getting commands timing out before the queue
depth fell enough.

It would be nice to change the default queue depth in
sas_slave_configure() to a lower value so we can avoid this patch,
but I am not sure whether other vendors' HBA performance would be
affected. From looking at the history of sas_slave_configure(), it
would seem the value of 256 was inherited from the mpt2sas driver, so
I'm not sure.

Well, the classical thing would be to associate each request tag
with a SAS task; or, in your case, associate each slot index with a
request tag.
You probably would need to reserve some slots for TMFs, i.e. you'd
need to decrease the resulting ->can_queue value by that number.
But once you've done that you shouldn't hit any QUEUE_FULL issues,
as the block layer will ensure that no tags will be reused while the
command is in flight.
Plus this is something you really need to be doing if you ever
consider moving to scsi-mq ...
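
The tag-to-slot scheme described above can be sketched in userspace as
follows. This is a minimal model under stated assumptions, not hisi_sas
code: the slot counts are invented, and every model_ name is
hypothetical. It assumes the upper layer never hands out a tag that is
still in flight, so the tag can be used directly as the slot index, while
TMFs draw from a small reserved range above can_queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes, chosen only for illustration. */
#define MODEL_MAX_SLOTS     16	/* total hardware command slots */
#define MODEL_RESERVED_TMF   2	/* slots held back for TMFs */
/* Depth advertised to the upper layer: total minus the reserved slots. */
#define MODEL_CAN_QUEUE	(MODEL_MAX_SLOTS - MODEL_RESERVED_TMF)

struct model_hba {
	bool tmf_in_use[MODEL_RESERVED_TMF];
};

/*
 * I/O path: the request tag is unique among in-flight commands, so it
 * doubles as the slot index. Returns -1 for a tag outside the I/O range.
 */
static int model_slot_for_tag(int tag)
{
	if (tag < 0 || tag >= MODEL_CAN_QUEUE)
		return -1;
	return tag;
}

/*
 * TMF path: take a free slot from the reserved range
 * [MODEL_CAN_QUEUE, MODEL_MAX_SLOTS). Returns -1 if none is free.
 */
static int model_tmf_slot_get(struct model_hba *hba)
{
	for (int i = 0; i < MODEL_RESERVED_TMF; i++) {
		if (!hba->tmf_in_use[i]) {
			hba->tmf_in_use[i] = true;
			return MODEL_CAN_QUEUE + i;
		}
	}
	return -1;
}

/* Return a reserved slot once the TMF has completed. */
static void model_tmf_slot_put(struct model_hba *hba, int slot)
{
	int i = slot - MODEL_CAN_QUEUE;

	assert(i >= 0 && i < MODEL_RESERVED_TMF);
	hba->tmf_in_use[i] = false;
}
```

Because the I/O range and the TMF range do not overlap, a TMF can always
be issued even when every I/O slot is busy, and no slot search is needed
on the I/O path.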

Cheers,

Hannes

Hi,

So would you recommend this method under the assumption that the
can_queue value for the host is similar to the queue depth for the
device?

Regards,
John