Re: [PATCH] Do not silently discard WRITE_SAME requests
From: Petr Vandrovec
Date: Sun Oct 12 2014 - 02:18:15 EST
This is a multi-part message in MIME format.
On 10/11/2014 5:51 AM, Martin K. Petersen wrote:
"Petr" == Petr Vandrovec <petr@xxxxxxxxxx> writes:
Petr> After investigating, problem seems to be in a way completion
Petr> handler for WRITE_SAME handles EOPNOTSUPP error, causing
Petr> all-but-first WRITE_SAME request on the LVM device to be silently
Petr> ignored - command is never issued, but success is returned to
Petr> higher layers.
Commit 7eee4ae2dbb2 was meant to address this issue. Does it still
happen with that in place?
Hi,
that commit alleviates need for change to blk-lib.c. But I believe
that change to blk-core.c that changes return value from EOPNOTSUPP to
EREMOTEIO is still necessary - unless I'm missing some locking
somewhere, there is a race in blkdev_issue_write_same() wrt. updating
max_write_same_sectors:
blkdev_issue_write_same() checks whether max_write_same_sectors is
non-zero at the beginning, and if it is non-zero it proceeds with
generating BIOs. While it generates them, other thread seems to be able
to complete previously issued write_same, find it is not supported, and
clear max_write_same_sectors. Which means that BIOs that are now being
generated will fast-fail in blk-core.c with EOPNOTSUPP, and
blkdev_issue_write_same() will then return success, rather than failure.
It is true that now WRITE_SAME is failing only if second WRITE_SAME is
issued to the device while first ever issued WRITE_SAME on the device is
being completed, but I see no reason why to not close this race.
Logic (from 2011, commit 8af1954d172a46a63e5e79dae523a6d74715e458) says
that EOPNOTSUPP is returned when DISCARD request failed, as discarding
is optional, and failures can be safely ignored. That is definitely not
true for WRITE_SAME failures, and so unsupported WRITE_SAME should
return different error code than unsupported DISCARD.
Which is what patch does. I've removed part that propagates disabling
WRITE_SAME from the diff, keeping only EOPNOTSUPP => EREMOTEIO change,
and revert of blacklisting VMware's LSI (if anything, blacklist should
be for current firmware version of 'VMware Virtual SCSI Disk', as f.e.
passed-through (RDM) SCSI disks do support WRITE_SAME under VMware) --
see attached updated diff.
Petr Vandrovec