[RFC PATCH] fs: don't flush pagecache when expanding block device

From: Shunki Fujita
Date: Tue Sep 19 2017 - 21:44:09 EST


I am having trouble with my service, and I have two questions and an RFC patch.

I run a web service with the following characteristics:

- Its data is on a multipath device whose size is sometimes grown.
- It relies heavily on the page cache.
- Its response time is usually about 0.2 s.

When I grow the multipath device, the kernel flushes all of this device's
cached pages, and the response time becomes 20 s, 100 times slower than usual.

IIUC, this flush seems unnecessary, for the reason given in question 2 below.
However, judging from the comments in a past commit, the buffer cache is
intentionally flushed not only when the device shrinks but also when it grows.
(cf. https://github.com/torvalds/linux/commit/608aeef17a91747d6303de4df5e2c2e6899a95e8)

I have two questions about this:
1. My interpretation is that the following situation is the concern. Is my understanding correct?

<Situation>
On a 10 GB device, suppose CPU0 shrinks the device to 5 GB and CPU1 grows it to 15 GB,
with the calls interleaved as shown in the table below.
By the time (*) is reached, struct gendisk and struct block_device already hold the same size.
So in this case, the cache is not flushed for the shrink.
(cf. https://github.com/torvalds/linux/blob/608aeef17a91747d6303de4df5e2c2e6899a95e8/fs/block_dev.c#L897)

CPU0 (shrink)              CPU1 (grow)
======================================
set_capacity()
                           set_capacity()
                           revalidate_disk()
revalidate_disk() ...(*)
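
To make the concern concrete, here is a rough userspace sketch of that
interleaving. It is a toy model, not kernel code: every model_* name and the
flush_policy enum are hypothetical, and it assumes CPU1's revalidate_disk()
runs before CPU0's. It only tracks the two sizes and counts how often the
cache would be dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these are NOT the kernel's structures. */
struct model_dev {
	uint64_t disk_size;	/* stands in for the struct gendisk capacity */
	uint64_t bdev_size;	/* stands in for the struct block_device size */
	unsigned flushes;	/* number of times the cache was dropped */
};

/* FLUSH_ALWAYS models the current code, FLUSH_ON_SHRINK_ONLY the RFC. */
enum flush_policy { FLUSH_ALWAYS, FLUSH_ON_SHRINK_ONLY };

/* Stand-in for set_capacity(): only the gendisk side is updated. */
static void model_set_capacity(struct model_dev *d, uint64_t size)
{
	assert(d != NULL);
	d->disk_size = size;
}

/*
 * Stand-in for revalidate_disk() reaching check_disk_size_change().
 * Returns true if this call dropped the cache.
 */
static bool model_revalidate(struct model_dev *d, enum flush_policy p)
{
	bool flushed = false;

	assert(d != NULL);
	if (d->disk_size == d->bdev_size)
		return false;	/* sizes agree, so no change is detected */

	if (p == FLUSH_ALWAYS || d->bdev_size > d->disk_size) {
		d->flushes++;
		flushed = true;
	}
	d->bdev_size = d->disk_size;
	return flushed;
}

/* Replays the interleaving from the table; returns the flush count. */
static unsigned model_race(enum flush_policy p)
{
	const uint64_t GB = 1ULL << 30;
	struct model_dev d = { 10 * GB, 10 * GB, 0 };

	model_set_capacity(&d, 5 * GB);		/* CPU0: shrink */
	model_set_capacity(&d, 15 * GB);	/* CPU1: grow */
	model_revalidate(&d, p);		/* CPU1: sees 10 GB -> 15 GB */
	model_revalidate(&d, p);		/* CPU0 (*): sizes already equal */
	return d.flushes;
}
```

In this model, model_race(FLUSH_ALWAYS) counts one flush (triggered on the
grow path), while model_race(FLUSH_ON_SHRINK_ONLY) counts none, so the
shrink to 5 GB would go unnoticed if such an interleaving were possible.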


2. If the answer to 1 is yes, the situation above does not seem able to happen, since every pair of
set_capacity() and revalidate_disk() calls is protected by some kind of locking. If that is correct,
how about avoiding the performance problem I mentioned with the following patch?
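
The reasoning in question 2 can be sketched the same way. This is again a toy
userspace model with hypothetical pair_* names, and it simply assumes that
each set_capacity()/revalidate_disk() pair runs to completion before the next
one starts (which lock provides that in the kernel is not modelled here).
Under that assumption, flushing only when the size decreases still catches
every shrink.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these are NOT the kernel's structures. */
struct pair_dev {
	uint64_t disk_size;	/* stands in for the struct gendisk capacity */
	uint64_t bdev_size;	/* stands in for the struct block_device size */
	unsigned flushes;	/* number of times the cache was dropped */
};

/*
 * One serialized resize. The caller is assumed to hold whatever lock
 * keeps the two steps below from interleaving with another resize.
 */
static void pair_resize(struct pair_dev *d, uint64_t new_size)
{
	assert(d != NULL);
	d->disk_size = new_size;		/* set_capacity() step */
	if (d->disk_size != d->bdev_size) {	/* revalidate_disk() step */
		if (d->bdev_size > d->disk_size)
			d->flushes++;		/* flush on shrink only */
		d->bdev_size = d->disk_size;
	}
}

/* Applies the resizes one after another; returns the flush count. */
static unsigned pair_run(uint64_t initial, const uint64_t *sizes, size_t n)
{
	struct pair_dev d = { initial, initial, 0 };

	for (size_t i = 0; i < n; i++)
		pair_resize(&d, sizes[i]);
	return d.flushes;
}

/* Independent count of how many requested resizes are shrinks. */
static unsigned pair_count_shrinks(uint64_t initial, const uint64_t *sizes,
				   size_t n)
{
	unsigned shrinks = 0;
	uint64_t prev = initial;

	for (size_t i = 0; i < n; i++) {
		if (sizes[i] < prev)
			shrinks++;
		prev = sizes[i];
	}
	return shrinks;
}
```

For any sequence of sizes, pair_run() and pair_count_shrinks() return the same
number in this model, because a serialized shrink always finds bdev_size
larger than disk_size when it revalidates.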

Thanks,
Shunki
---
It is not necessary to flush the caches of a device that is being grown.

Signed-off-by: Shunki Fujita <shunki-fujita@xxxxxxxxxxxx>
---
 fs/block_dev.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 44d4a1e..d17603c 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1078,7 +1078,14 @@ void check_disk_size_change(struct gendisk *disk, struct block_device *bdev)
 		       "%s: detected capacity change from %lld to %lld\n",
 		       name, bdev_size, disk_size);
 		i_size_write(bdev->bd_inode, disk_size);
-		flush_disk(bdev, false);
+		if (bdev_size > disk_size) {
+			flush_disk(bdev, false);
+		} else {
+			if (!bdev->bd_disk)
+				return;
+			if (disk_part_scan_enabled(bdev->bd_disk))
+				bdev->bd_invalidated = 1;
+		}
 	}
 }
 EXPORT_SYMBOL(check_disk_size_change);
--
2.7.4