Re: [PATCH 01/20] block, blk_filter: enable block device filters

From: Christoph Hellwig
Date: Wed Jul 13 2022 - 07:56:14 EST


On Fri, Jul 08, 2022 at 12:45:33PM +0200, Sergei Shtepa wrote:
> 1. Work at the partition or disk level?
> At the user level, programs operate with block devices.
> In fact, the "disk" entity makes sense only for the kernel level.
> When the user chooses which block devices to backup and which not,
> he operates with mounting points, which are converted into block
> devices, partitions. Therefore, it is better to handle bio before
> remapping to disk.
> If the filtering is performed after remapping, then we will be
> forced to apply a filter to the entire disk, or complicate the
> filtering algorithm by calculating which range of sectors bio is
> addressed to. And if bio is addressed to the partition boundary...
> Filtering at the block device level seems to me a simpler solution.
> But this is not the biggest problem.

Note that bi_bdev still points to the partition the bio came from. So
we could still do filtering after blk_partition_remap has been called;
the filter driver just needs to be careful about how it interprets the
sector numbers.
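
As a rough illustration of that sector interpretation, here is a small
userspace model. The structs and function names are hypothetical
stand-ins, not the kernel's struct bio or struct block_device. It
assumes the remap only adds the partition start to the sector and
leaves the device pointer alone:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins, not kernel types. */
struct part_model {
	uint64_t start_sect;	/* first sector of the partition on the disk */
	uint64_t nr_sects;	/* partition length in sectors */
};

struct bio_model {
	const struct part_model *part;	/* models bi_bdev: still the partition */
	uint64_t sector;		/* models bi_iter.bi_sector */
	uint32_t nr_sectors;
	bool remapped;			/* models a "remap already done" flag */
};

/* Models the remap: the sector becomes disk-relative, part is unchanged. */
static void model_partition_remap(struct bio_model *bio)
{
	assert(!bio->remapped);
	bio->sector += bio->part->start_sect;
	bio->remapped = true;
}

/* Partition-relative sector, whether or not the remap has already run. */
static uint64_t filter_partition_sector(const struct bio_model *bio)
{
	if (bio->remapped)
		return bio->sector - bio->part->start_sect;
	return bio->sector;
}
```

A filter that runs after the remap would use something like
filter_partition_sector() to get back to the offsets the user sees.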

> 2. Can the filter sleep or postpone bio processing to the worker thread?

I think all of the above is fine, just for normal submit_bio based
drivers.

> The problem is in the implementation of the COW algorithm.
> If I send a bio to read a chunk (one bio), and then pass a write bio,
> then with some probability I am reading partially overwritten data.
> Writing overtakes reading. And flags REQ_SYNC and REQ_PREFLUSH don't help.
> Maybe it's a disk driver issue, or a hypervisor, or a NAS, or a RAID,
> or maybe normal behavior. I don't know. Although, maybe I'm not working
> correctly with flags. I have seen the comments on patch 11/20, but I am
> not sure that the fixes will solve this problem.
> But because of this, I have to postpone the write until the read completes.

In the I/O stack there really isn't any ordering. While a general
reordering looks a bit odd to me, it is absolutely always possible.

> 2.1 The easiest way to solve the problem is to block the writer's thread
> with a semaphore. And for bio with a flag REQ_NOWAIT, complete processing
> with bio_wouldblock_error(). This is the solution currently being used.

This sounds ok. The other option would be to put the write on hold and
only queue it up from the read completion (or rather a workqueue kicked
off from the read completion). But this is basically the same, just
without blocking the I/O submitter, so we could do the semaphore first
and optimize later as needed.
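
A minimal single-threaded sketch of the "hold the write" variant, as a
userspace model only. All names are hypothetical, there is no locking,
and a real filter would release the held bios from a workqueue item
rather than from a direct call:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WRITES 8

/* Hypothetical model of one COW chunk, not a kernel structure. */
struct chunk_model {
	bool read_in_flight;	/* COW read of the original data not done yet */
	int held[MAX_WRITES];	/* ids of writes put on hold */
	size_t nr_held;
	int issued[MAX_WRITES];	/* ids of writes passed down, in order */
	size_t nr_issued;
};

/* Models passing a write down to the underlying device. */
static void issue_write(struct chunk_model *c, int id)
{
	assert(c->nr_issued < MAX_WRITES);
	c->issued[c->nr_issued++] = id;
}

/* Returns true if the write went down at once, false if it was held. */
static bool filter_submit_write(struct chunk_model *c, int id)
{
	if (!c->read_in_flight) {
		issue_write(c, id);
		return true;
	}
	assert(c->nr_held < MAX_WRITES);
	c->held[c->nr_held++] = id;
	return false;
}

/* Models the work kicked off from the read completion. */
static void cow_read_done(struct chunk_model *c)
{
	c->read_in_flight = false;
	for (size_t i = 0; i < c->nr_held; i++)
		issue_write(c, c->held[i]);
	c->nr_held = 0;
}
```

The submitter never sleeps here: a write that arrives while the read is
in flight just sits in held[] until cow_read_done() passes it down.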

> If I am blocked by the q->q_usage_counter counter, then I will not
> be able to execute COW in the context of the current thread due to deadlocks.
> I will have to use a scheme with an additional worker thread.
> Bio filtering will become much more complicated.

q_usage_counter itself doesn't really block you from doing anything.
You can still sleep inside of it, and most drivers do that.