[RFC PATCH 00/35] ceph, rbd, netfs: Make ceph fully use netfslib

From: David Howells
Date: Thu Mar 13 2025 - 19:34:13 EST


Hi Viacheslav, Alex,

[!] NOTE: This is a preview of a work in progress. rbd works and ceph
works for plain I/O, but content crypto does not.

[!] NOTE: These patches are based on some other sets of patches not
included in this posting. They are, however, included in the git
branch mentioned below.

These patches do a number of things:

(1) (Mostly) collapse the different I/O types (PAGES, PAGELIST, BVECS,
BIO) down to a single one.

I added a new type, ceph_databuf, to make this easier. The page list
is attached to that as a bio_vec[] with an iov_iter, but could also be
some other type supported by the iov_iter. The iov_iter defines the
data or buffer to be used. I have an additional iov_iter type
implemented that allows use of a straight folio[] or page[] instead of
a bio_vec[] that I can deploy if that proves more useful.

(2) RBD is modified to stop using the removed page-list types and, I
think, now fully works.

(3) Ceph is mostly converted to using netfslib. At this point, it can do
plain reads and writes, but content crypto is currently
non-functional.

Multipage folios are enabled and work (all the support for that is
hidden inside of netfslib).

(4) The old Ceph VFS/VM I/O API implementation is removed. With that, as
the code currently stands, the patches overall result in a ~2500 LoC
reduction. That figure may shrink, as some more bits still need
transferring from the old code to the new code.
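
To illustrate the idea in (1) of a single container whose contents are
described by an iterator, here is a minimal userspace sketch. It is not
the ceph_databuf code from the patches: struct demo_databuf, struct
demo_iter and all demo_*() helpers are hypothetical names, and struct
iovec stands in for the kernel's bio_vec[].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

#define DEMO_MAX_SEGS 4

/* Hypothetical cursor over a segment array, loosely modelled on an iov_iter. */
struct demo_iter {
	const struct iovec *vec;	/* current segment */
	size_t nr_segs;			/* segments left, including the current one */
	size_t seg_off;			/* bytes already consumed from *vec */
	size_t count;			/* total bytes left to consume */
};

/* Hypothetical single container: a segment table plus its total length. */
struct demo_databuf {
	struct iovec vec[DEMO_MAX_SEGS];
	size_t nr_segs;
	size_t length;
};

/* Attach one more buffer segment; returns -1 if the table is full. */
static int demo_databuf_append(struct demo_databuf *db, void *p, size_t len)
{
	if (db->nr_segs >= DEMO_MAX_SEGS)
		return -1;
	db->vec[db->nr_segs].iov_base = p;
	db->vec[db->nr_segs].iov_len = len;
	db->nr_segs++;
	db->length += len;
	return 0;
}

/* Build an iterator that describes everything currently in the container. */
static struct demo_iter demo_databuf_iter(const struct demo_databuf *db)
{
	struct demo_iter it = { db->vec, db->nr_segs, 0, db->length };
	return it;
}

/* Copy up to len bytes out of the iterator, advancing it across segments. */
static size_t demo_iter_copy_out(struct demo_iter *it, void *dst, size_t len)
{
	size_t done = 0;

	while (done < len && it->count > 0) {
		assert(it->nr_segs > 0);
		size_t avail = it->vec->iov_len - it->seg_off;
		size_t n = len - done < avail ? len - done : avail;

		memcpy((char *)dst + done,
		       (const char *)it->vec->iov_base + it->seg_off, n);
		done += n;
		it->seg_off += n;
		it->count -= n;
		if (it->seg_off == it->vec->iov_len) {
			/* Current segment exhausted; step to the next one. */
			it->vec++;
			it->nr_segs--;
			it->seg_off = 0;
		}
	}
	return done;
}

/* Two segments, read back in two chunks that straddle the boundary. */
static int demo_databuf_selftest(void)
{
	char a[] = "abc", b[] = "defg";
	char out[8] = { 0 };
	struct demo_databuf db = { .nr_segs = 0 };
	struct demo_iter it;

	if (demo_databuf_append(&db, a, 3) || demo_databuf_append(&db, b, 4))
		return -1;
	it = demo_databuf_iter(&db);
	if (demo_iter_copy_out(&it, out, 5) != 5 || memcmp(out, "abcde", 5))
		return -1;
	if (demo_iter_copy_out(&it, out, 5) != 2 || memcmp(out, "fg", 2))
		return -1;
	return it.count == 0 ? 0 : -1;
}
```

The consumer only ever sees the iterator, so in this sketch the backing
table could be swapped for another segment type without touching the
callers.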

The conversion isn't quite complete:

(1) ceph_osd_linger_request::preply_pages needs switching over to a
ceph_databuf, but I haven't yet managed to work out how the pages that
handle_watch_notify() sticks in there come about.

(2) I haven't altered data transmission in net/ceph/messenger*.c yet. The
aim is to reduce it to a single sendmsg() call for each ceph_msg_data
struct, using the iov_iter therein.

(3) The data reception routines in net/ceph/messenger*.c also need
modifying to pass each ceph_msg_data::iter to recvmsg() in turn.

(4) It might be possible to merge struct ceph_databuf into struct
ceph_msg_data and eliminate the former.

(5) fs/ceph/ still needs a bit more work to clean up the use of page
arrays.

(6) I would like to replace the front and middle buffers with a
ceph_databuf, vmapping them when we need to access them.

I added a kmap_ceph_databuf_page() macro that gets a page and calls
kmap_local_page() on it. This hides the bvec[] inside the databuf, making
it easier to replace later.
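
For items (2) and (3) above, the shape of "one sendmsg() per data item"
and "pass each iterator to recvmsg()" can be sketched in userspace with
the POSIX sendmsg()/recvmsg() calls over a socketpair. This is only an
analogy: the kernel messenger would hand an iov_iter to the in-kernel
socket API rather than a struct iovec[], and the demo_*() helpers are
hypothetical names.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Gather all segments into a single sendmsg() call. */
static ssize_t demo_send_gather(int fd, struct iovec *segs, size_t nr_segs)
{
	struct msghdr msg;

	assert(nr_segs > 0);
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = segs;
	msg.msg_iovlen = nr_segs;
	return sendmsg(fd, &msg, 0);
}

/* Scatter incoming bytes across the segments with a single recvmsg() call. */
static ssize_t demo_recv_scatter(int fd, struct iovec *segs, size_t nr_segs)
{
	struct msghdr msg;

	assert(nr_segs > 0);
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = segs;
	msg.msg_iovlen = nr_segs;
	return recvmsg(fd, &msg, 0);
}

/*
 * Round trip three segments through a socketpair.  A real sender would
 * have to loop on short sends; the payload here is small enough that
 * this sketch does not.
 */
static int demo_roundtrip(void)
{
	char hdr[] = "hdr|", body[] = "body|", tail[] = "tail";
	struct iovec tx[3] = {
		{ .iov_base = hdr,  .iov_len = 4 },
		{ .iov_base = body, .iov_len = 5 },
		{ .iov_base = tail, .iov_len = 4 },
	};
	char rx_hdr[4], rx_rest[32];
	struct iovec rx[2] = {
		{ .iov_base = rx_hdr,  .iov_len = sizeof(rx_hdr) },
		{ .iov_base = rx_rest, .iov_len = sizeof(rx_rest) },
	};
	ssize_t sent, got;
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	sent = demo_send_gather(sv[0], tx, 3);
	got = sent > 0 ? demo_recv_scatter(sv[1], rx, 2) : -1;
	close(sv[0]);
	close(sv[1]);

	if (sent != 13 || got != sent)
		return -1;
	if (memcmp(rx_hdr, "hdr|", 4) || memcmp(rx_rest, "body|tail", 9))
		return -1;
	return 0;
}
```

The point of the sketch is that neither side loops over segments itself;
the segment walk is left to the socket layer.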

Anyway, if anyone has any thoughts...


I've also pushed the patches here:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=ceph-iter

David

David Howells (35):
ceph: Fix incorrect flush end position calculation
libceph: Rename alignment to offset
libceph: Add a new data container type, ceph_databuf
ceph: Convert ceph_mds_request::r_pagelist to a databuf
libceph: Add functions to add ceph_databufs to requests
rbd: Use ceph_databuf for rbd_obj_read_sync()
libceph: Change ceph_osdc_call()'s reply to a ceph_databuf
libceph: Unexport osd_req_op_cls_request_data_pages()
libceph: Remove osd_req_op_cls_response_data_pages()
libceph: Convert notify_id_pages to a ceph_databuf
ceph: Use ceph_databuf in DIO
libceph: Bypass the messenger-v1 Tx loop for databuf/iter data blobs
rbd: Switch from using bvec_iter to iov_iter
libceph: Remove bvec and bio data container types
libceph: Make osd_req_op_cls_init() use a ceph_databuf and map it
libceph: Convert req_page of ceph_osdc_call() to ceph_databuf
libceph, rbd: Use ceph_databuf encoding start/stop
libceph, rbd: Convert some page arrays to ceph_databuf
libceph, ceph: Convert users of ceph_pagelist to ceph_databuf
libceph: Remove ceph_pagelist
libceph: Make notify code use ceph_databuf_enc_start/stop
libceph, rbd: Convert ceph_osdc_notify() reply to ceph_databuf
rbd: Use ceph_databuf_enc_start/stop()
ceph: Make ceph_calc_file_object_mapping() return size as size_t
ceph: Wrap POSIX_FADV_WILLNEED to get caps
ceph: Kill ceph_rw_context
netfs: Pass extra write context to write functions
netfs: Adjust group handling
netfs: Allow fs-private data to be handed through to request alloc
netfs: Make netfs_page_mkwrite() use folio_mkwrite_check_truncate()
netfs: Fix netfs_unbuffered_read() to return ssize_t rather than int
netfs: Add some more RMW support for ceph
ceph: Use netfslib [INCOMPLETE]
ceph: Enable multipage folios for ceph files
ceph: Remove old I/O API bits

drivers/block/rbd.c | 904 ++++++--------
fs/9p/vfs_file.c | 2 +-
fs/afs/write.c | 2 +-
fs/ceph/Makefile | 2 +-
fs/ceph/acl.c | 39 +-
fs/ceph/addr.c | 2009 +------------------------------
fs/ceph/cache.h | 5 +
fs/ceph/caps.c | 2 +-
fs/ceph/crypto.c | 56 +-
fs/ceph/file.c | 1810 +++-------------------------
fs/ceph/inode.c | 116 +-
fs/ceph/ioctl.c | 2 +-
fs/ceph/locks.c | 23 +-
fs/ceph/mds_client.c | 134 +--
fs/ceph/mds_client.h | 2 +-
fs/ceph/rdwr.c | 1006 ++++++++++++++++
fs/ceph/super.h | 81 +-
fs/ceph/xattr.c | 69 +-
fs/netfs/buffered_read.c | 11 +-
fs/netfs/buffered_write.c | 48 +-
fs/netfs/direct_read.c | 83 +-
fs/netfs/direct_write.c | 3 +-
fs/netfs/internal.h | 40 +-
fs/netfs/main.c | 5 +-
fs/netfs/objects.c | 4 +
fs/netfs/read_collect.c | 2 +
fs/netfs/read_pgpriv2.c | 2 +-
fs/netfs/read_single.c | 2 +-
fs/netfs/write_issue.c | 55 +-
fs/netfs/write_retry.c | 5 +-
fs/smb/client/file.c | 4 +-
include/linux/ceph/databuf.h | 169 +++
include/linux/ceph/decode.h | 4 +-
include/linux/ceph/libceph.h | 3 +-
include/linux/ceph/messenger.h | 122 +-
include/linux/ceph/osd_client.h | 87 +-
include/linux/ceph/pagelist.h | 60 -
include/linux/ceph/striper.h | 60 +-
include/linux/netfs.h | 89 +-
include/trace/events/netfs.h | 3 +
net/ceph/Makefile | 5 +-
net/ceph/cls_lock_client.c | 200 ++-
net/ceph/databuf.c | 200 +++
net/ceph/messenger.c | 310 +----
net/ceph/messenger_v1.c | 76 +-
net/ceph/mon_client.c | 10 +-
net/ceph/osd_client.c | 510 +++-----
net/ceph/pagelist.c | 133 --
net/ceph/snapshot.c | 20 +-
net/ceph/striper.c | 57 +-
50 files changed, 2996 insertions(+), 5650 deletions(-)
create mode 100644 fs/ceph/rdwr.c
create mode 100644 include/linux/ceph/databuf.h
delete mode 100644 include/linux/ceph/pagelist.h
create mode 100644 net/ceph/databuf.c
delete mode 100644 net/ceph/pagelist.c