RE: [PATCH 1/5] Drivers: scsi: storvsc: Make the scsi timeout a module parameter

From: KY Srinivasan
Date: Mon Jun 03 2013 - 20:23:36 EST




> -----Original Message-----
> From: James Bottomley [mailto:jbottomley@xxxxxxxxxxxxx]
> Sent: Monday, June 03, 2013 7:47 PM
> To: KY Srinivasan
> Cc: gregkh@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> devel@xxxxxxxxxxxxxxxxxxxxxx; ohering@xxxxxxxx; hch@xxxxxxxxxxxxx;
> linux-scsi@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH 1/5] Drivers: scsi: storvsc: Make the scsi timeout a module
> parameter
>
> On Mon, 2013-06-03 at 23:25 +0000, KY Srinivasan wrote:
> >
> > > -----Original Message-----
> > > From: James Bottomley [mailto:jbottomley@xxxxxxxxxxxxx]
> > > Sent: Monday, June 03, 2013 7:03 PM
> > > To: KY Srinivasan
> > > Cc: gregkh@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > > devel@xxxxxxxxxxxxxxxxxxxxxx; ohering@xxxxxxxx; hch@xxxxxxxxxxxxx;
> > > linux-scsi@xxxxxxxxxxxxxxx
> > > Subject: Re: [PATCH 1/5] Drivers: scsi: storvsc: Make the scsi timeout a
> > > module parameter
> > >
> > > On Mon, 2013-06-03 at 16:21 -0700, K. Y. Srinivasan wrote:
> > > > The standard scsi timeout is not appropriate in some of the environments where
> > > > Hyper-V is deployed. Set this timeout appropriately for all devices managed
> > > > by this driver. Further make this a module parameter.
> > > >
> > > > Signed-off-by: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
> > > > Reviewed-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > > > ---
> > > > drivers/scsi/storvsc_drv.c | 9 +++++++++
> > > > 1 files changed, 9 insertions(+), 0 deletions(-)
> > > >
> > > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > > > index 16a3a0c..8d29a95 100644
> > > > --- a/drivers/scsi/storvsc_drv.c
> > > > +++ b/drivers/scsi/storvsc_drv.c
> > > > @@ -221,6 +221,13 @@ static int storvsc_ringbuffer_size = (20 * PAGE_SIZE);
> > > > module_param(storvsc_ringbuffer_size, int, S_IRUGO);
> > > > MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
> > > >
> > > > +/*
> > > > + * Timeout in seconds for all devices managed by this driver.
> > > > + */
> > > > +static int storvsc_timeout = 180;
> > > > +module_param(storvsc_timeout, uint, (S_IRUGO | S_IWUSR));
> > > > +MODULE_PARM_DESC(storvsc_timeout, "Device timeout (seconds)");
> > > > +
> > > > #define STORVSC_MAX_IO_REQUESTS 128
> > > >
> > > > /*
> > > > @@ -1204,6 +1211,8 @@ static int storvsc_device_configure(struct scsi_device *sdevice)
> > > >
> > > > blk_queue_bounce_limit(sdevice->request_queue, BLK_BOUNCE_ANY);
> > > >
> > > > + blk_queue_rq_timeout(sdevice->request_queue, (storvsc_timeout * HZ));
> > >
> > > Why does this need to be a module parameter? It's already a sysfs one
> > > in the scsi_device class? Three minutes is also a bit large. The
> > > default is 30s with huge cache arrays recommending upping this to
> > > 60s ... you're three times this.
> >
> > James,
> > This number was arrived at based on some testing that was done on the
> > cloud. On our cloud, we have a 120-second
> > timeout that triggers broader VM-level recovery, and in cases where
> > there are storage access issues
> > (which is when we would hit this timeout), it will be better to defer
> > to the fabric-level recovery than attempt
> > SCSI-level recovery/retry. The default value chosen for devices
> > managed by storvsc should be just fine,
>
> So are you sure you want to set the command timeout to 3 minutes? ...
> it's an incredibly high value. The actual complete timeout is this
> value multiplied by the number of retries, which is 5 for disk devices,
> so you'll be waiting up to 15 minutes before we signal a failure in some
> circumstances. It sounds like you want the actual path length of error
> recovery to be on average 3 minutes.
>
> The value of the timeout should be a compromise between the longest time
> you want the user to wait for a failure and the longest time a device
> should take to respond.

This should be fine. Note that all error recovery/retry is happening on the host side, and beyond
a certain delay we will do a VM-level recovery at the fabric level. On a slightly different note,
we have the same issue with the SCSI FLUSH timeout. Would you consider changing this?
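
For reference, the arithmetic behind the 15-minute figure quoted above can be
sketched as below. This is a userspace illustration only, not driver code: the
function name worst_case_wait_seconds is hypothetical, and the multiplier of 5
is simply the figure given above for disk devices, assuming every attempt runs
to the full command timeout.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel API): worst-case time in seconds before
 * a failure is signalled, assuming each attempt waits out the full timeout.
 */
unsigned int worst_case_wait_seconds(unsigned int timeout_s,
                                     unsigned int attempts)
{
	return timeout_s * attempts;
}

/* Print one line summarising the worst case for a given timeout. */
void print_worst_case(const char *label, unsigned int timeout_s,
                      unsigned int attempts)
{
	unsigned int total = worst_case_wait_seconds(timeout_s, attempts);

	printf("%s: %us x %u = %us (%u min)\n",
	       label, timeout_s, attempts, total, total / 60);
}
```

Calling print_worst_case("storvsc", 180, 5) corresponds to the proposed value,
while 30 and 60 correspond to the default and the large-cache-array
recommendation mentioned earlier in the thread.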
>
> > I made it a module parameter to have more flexibility.
>
> It's *already* a sysfs parameter ... why do you want an additional
> module parameter? Multiple parameters for the same quantity, especially
> ones which can't be altered at runtime like module parameters, end up
> confusing users.

Agreed. I can send you a patch that would remove this parameter. Or, if you prefer,
I could resend this set with the change to this patch (removing the module parameter).
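
For anyone following along, adjusting the existing per-device sysfs timeout
from userspace might look roughly like the sketch below. It assumes the
attribute is reachable as /sys/block/<disk>/device/timeout; the helper names
scsi_timeout_path and write_timeout are hypothetical and exist only for this
illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the assumed sysfs path for a disk's SCSI
 * command timeout. Returns 0 on success, -1 if buf is too small.
 */
int scsi_timeout_path(const char *disk, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/sys/block/%s/device/timeout", disk);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write a timeout in seconds to the attribute file.
 * Returns 0 on success, -1 on any I/O error (for example, not root).
 */
int write_timeout(const char *path, unsigned int seconds)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", seconds) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would build the path for a disk such as "sda" and then pass it to
write_timeout() with the desired number of seconds; the write takes effect at
runtime for that device only and requires root.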

Regards,

K. Y
>
> James
>
>

