Re: [PATCH v2 0/2] Adding per-controller timeout support to nvme

From: David Woodhouse
Date: Thu Apr 25 2019 - 01:45:44 EST


On Wed, 2019-04-24 at 13:58 -0700, Sagi Grimberg wrote:
> > It isn't that the media is slow; the max timeout is based on the SLA
> > for certain classes of "fabric" outages. Linux copes *really* badly
> > with I/O errors, and if we can make the timeout last long enough to
> > cover the switch restart worst case, then users are a lot happier.
>
> Well, what is usually done to handle fabric outages is having multiple
> paths to the storage device, not sure if that is applicable for you or
> not...

Yeah, that turns out to be impractical in this case.

> What do you mean by "Linux copes *really* badly with I/O errors"? What
> can be done better?

There's not a lot that can be done here in the short term. If file
systems get errors on certain I/O, then graceful recovery would be
complicated to achieve.

It's better for the I/O timeout to be set higher than the known
worst-case time for successful completion.
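
As a rough illustration of that rule (not taken from the patch set), here
is a minimal Python sketch that derives a timeout from a known worst-case
outage. The helper name, the 25% safety margin and the 120-second figure
are assumptions made up for the example; the result is in whole seconds,
the same unit the existing nvme_core.io_timeout module parameter uses.

```python
import math


def pick_io_timeout_secs(worst_case_outage_secs: float, margin: float = 0.25) -> int:
    """Return a timeout in whole seconds, strictly above the worst case.

    Hypothetical helper: 'margin' is an assumed safety factor, not a value
    from the NVMe driver or from this thread.
    """
    if worst_case_outage_secs < 0 or margin < 0:
        raise ValueError("outage time and margin must be non-negative")

    # Pad the worst case by the margin and round up to whole seconds.
    timeout = math.ceil(worst_case_outage_secs * (1.0 + margin))

    # With a zero margin the padded value can equal the worst case;
    # bump it so the timeout is always strictly higher.
    if timeout <= worst_case_outage_secs:
        timeout += 1
    return timeout


# Assumed example: a switch restart that can take up to 120 seconds.
print(pick_io_timeout_secs(120))
```

The point is only that the timeout comes from the outage SLA rather than
from media latency; the right margin depends on the fabric.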
