[RFC V2] block: reject I/O for same fd if block size changed
From: Minwoo Im
Date: Wed Dec 30 2020 - 11:02:32 EST
Let's say, for example of NVMe device, Format command to change out
LBA format to another logical block size and BLKRRPART to re-read
partition information with a same file descriptor like:
fd = open("/dev/nvme0n1", O_RDONLY);
nvme_format(fd, ...);
if (ioctl(fd, BLKRRPART) < 0)
..
In this case, ioctl causes invalid Read operations which are triggered
by buffer_head I/O path to re-read partition information. This is
because it's still playing around with i_blksize and i_blkbits. So,
512 -> 4096 -> 512 logical block size changes will cause an under-flowed
length of Read operations.
Case for NVMe:
(LBAF 1 512B, LBAF 0 4K logical block size)
nvme format /dev/nvme0n1 --lbaf=1 --force # to 512B LBA
nvme format /dev/nvme0n1 --lbaf=0 --force # to 4096B LBA
[dmesg-snip]
[ 10.771740] blk_update_request: operation not supported error, dev nvme0n1, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 10.780262] Buffer I/O error on dev nvme0n1, logical block 0, async page read
[event-snip]
kworker/0:1H-56 [000] .... 913.456922: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=1, cmdid=216, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=0, len=65535, ctrl=0x0, dsmgmt=0, reftag=0)
ksoftirqd/0-9 [000] .Ns. 916.566351: nvme_complete_rq: nvme0: disk=nvme0n1, qid=1, cmdid=216, res=0x0, retries=0, flags=0x0, status=0x4002
As the previous discussion [1], this patch introduced a gendisk flag
to indicate that block size has been changed in the runtime. This flag
is set when logical block size is changed in the runtime in the block
layer. It will be cleared when the file descriptor for the
block devie is opened again through __blkdev_get() which updates the block
size via set_init_blocksize().
This patch rejects I/O from the path of add_partitions() to avoid
issuing invalid Read operations to device. It also sets a flag to
gendisk in blk_queue_logical_block_size to minimize caller-side updates.
[1] https://lore.kernel.org/linux-nvme/20201223183143.GB13354@localhost.localdomain/T/#t
Signed-off-by: Minwoo Im <minwoo.im.dev@xxxxxxxxx>
---
block/blk-settings.c | 7 +++++++
block/genhd.c | 1 +
block/partitions/core.c | 11 +++++++++++
fs/block_dev.c | 6 ++++++
include/linux/genhd.h | 1 +
5 files changed, 26 insertions(+)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 43990b1d148b..205822ffcaef 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -328,6 +328,13 @@ EXPORT_SYMBOL(blk_queue_max_segment_size);
void blk_queue_logical_block_size(struct request_queue *q, unsigned int size)
{
struct queue_limits *limits = &q->limits;
+ struct block_device *bdev;
+
+ if (q->backing_dev_info && q->backing_dev_info->owner &&
+ limits->logical_block_size != size) {
+ bdev = blkdev_get_no_open(q->backing_dev_info->owner->devt);
+ bdev->bd_disk->flags |= GENHD_FL_BLOCK_SIZE_CHANGED;
+ }
limits->logical_block_size = size;
diff --git a/block/genhd.c b/block/genhd.c
index 73faec438e49..c3a73cba7c88 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -81,6 +81,7 @@ bool set_capacity_and_notify(struct gendisk *disk, sector_t size)
*/
if (!capacity || !size)
return false;
+
kobject_uevent_env(&disk_to_dev(disk)->kobj, KOBJ_CHANGE, envp);
return true;
}
diff --git a/block/partitions/core.c b/block/partitions/core.c
index e7d776db803b..5a0330c1b6f9 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -618,6 +618,17 @@ int blk_add_partitions(struct gendisk *disk, struct block_device *bdev)
if (!disk_part_scan_enabled(disk))
return 0;
+ /*
+ * Reject to check partition information if block size has been changed
+ * in the runtime. If block size of a block device has been changed,
+ * the file descriptor should be opened agian to update the blkbits.
+ */
+ if (disk->flags & GENHD_FL_BLOCK_SIZE_CHANGED) {
+ pr_warn("%s: rejecting checking partition. fd should be opened again.\n",
+ disk->disk_name);
+ return -EBADFD;
+ }
+
state = check_partition(disk, bdev);
if (!state)
return 0;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9293045e128c..c996de3d6084 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -131,6 +131,12 @@ EXPORT_SYMBOL(truncate_bdev_range);
static void set_init_blocksize(struct block_device *bdev)
{
bdev->bd_inode->i_blkbits = blksize_bits(bdev_logical_block_size(bdev));
+
+ /*
+ * Allow I/O commands for this block device. We can say that this
+ * block device has been set to a proper block size.
+ */
+ bdev->bd_disk->flags &= ~GENHD_FL_BLOCK_SIZE_CHANGED;
}
int set_blocksize(struct block_device *bdev, int size)
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 809aaa32d53c..0e0e24917003 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -103,6 +103,7 @@ struct partition_meta_info {
#define GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE 0x0100
#define GENHD_FL_NO_PART_SCAN 0x0200
#define GENHD_FL_HIDDEN 0x0400
+#define GENHD_FL_BLOCK_SIZE_CHANGED 0x0800
enum {
DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */
--
2.17.1