Re: [PATCH RFC v2 00/14] md: introduce a new lockless bitmap

From: Yu Kuai
Date: Wed Apr 09 2025 - 05:27:38 EST


Hi,

在 2025/04/09 16:32, Christoph Hellwig 写道:
On Sat, Mar 29, 2025 at 09:11:13AM +0800, Yu Kuai wrote:
The purpose here is to hide the low level bitmap IO implementation to
the API disk->submit_bio(), and the bitmap IO can be converted to buffer
IO to the bdev_file. This is the easiest way that I can think of to
resue the pagecache, with natural ability for dirty page writeback. I do
think about creating a new anon file and implement a new
file_operations, this will be much more complicated.

I've started looking at this a bit now, sorry for the delay.

As far as I can see you use the bitmap file just so that you have your
own struct address_space and thus page cache instance and then call
read_mapping_page and filemap_write_and_wait_range on it right?
Yes.


For that you'd be much better of just creating your own trivial
file_system_type with an inode fully controlled by your driver
that has a trivial set of address_space ops instead of oddly
mixing with the block layer.

Yes, this is exactly what I said implement a new file_operations(and
address_space ops), I wanted do this the easy way, just reuse the raw
block device ops, this way I just need to implement the submit_bio ops
for new hidden disk.

I can try with new fs type if we really think this solution is too
hacky, however, the code line will be much more. :(


Note that either way I'm not sure using the page cache here is an
all that good idea, as we're at the bottom of the I/O stack and
thus memory allocations can very easily deadlock.

Yes, for the page from bitmap, this set do the easy way just read and
ping all realted pages while loading the bitmap. For two reasons:

1) We don't need to allocate and read pages from IO path;(In the first
RFC version, I'm using a worker to do that).
2) In the first RFC version, I find and get page in the IO path, turns
out page reference is an *atomic*, and the overhead is not acceptable;

And the only action from IO path is that if bitmap page is dirty,
filemap_write_and_wait_range() is called from async worker, the same as
old bitmap, to flush bitmap dirty pages.

What speaks against using your own folios explicitly allocated at
probe time and then just doing manual submit_bio on that? That's
probably not much more code but a lot more robust.

I'm not quite sure if I understand you correctly. Do you means don't use
pagecache for bitmap IO, and manually create BIOs like the old bitmap,
meanwhile invent a new solution for synchronism instead of the global
spin_lock from old bitmap?

Thanks,
Kuai


Also a high level note: the bitmap_operations aren't a very nice
interface. A lot of methods are empty and should just be called
conditionally. Or even better you'd do away with the expensive
indirect calls and just directly call either the old or new
bitmap code.

Meanwhile, bitmap file for the old bitmap will be removed sooner or
later, and this bdev_file implementation will compatible with bitmap
file as well.

Which would also mean that at that point the operations vector would
be pointless, so we might as well not add it to start with.

.