Re: [PATCH] btrfs: keep `priv` struct on stack for sync reads in `btrfs_encoded_read_regular_fill_pages()`

From: Daniel Vacek
Date: Wed Jan 08 2025 - 10:25:00 EST


On Wed, 8 Jan 2025 at 13:42, Johannes Thumshirn
<Johannes.Thumshirn@xxxxxxx> wrote:
>
> On 08.01.25 12:44, Daniel Vacek wrote:
> > Only allocate the `priv` struct from slab for asynchronous mode.
> >
> > There's no need to allocate an object from slab in the synchronous mode. In
> > such a case stack can be happily used as it used to be before 68d3b27e05c7
> > ("btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages()")
> > which was a preparation for the async mode.
> >
> > While at it, fix the comment to reflect the atomic => refcount change in
> > d29662695ed7 ("btrfs: fix use-after-free waiting for encoded read endios").
>
>
> Generally I'm not a huge fan of conditional allocation/freeing. It just
> complicates matters. I get it in case of the bio's bi_inline_vecs where
> it's a optimization, but I fail to see why it would make a difference in
> this case.
>
> If we're really going down that route, there should at least be a
> justification other than "no need" to.

Well the main motivation was not to needlessly exercise the slab
allocator when IO uring is not used. It is a bit of an overhead,
though the object is not really big so I guess it's not a big deal
after all (the slab should manage just fine even under low memory
conditions).

68d3b27e05c7 added the allocation for the async mode but also changed
the original behavior of the sync mode which was using stack before.
The async mode indeed requires the allocation as the object's lifetime
extends over the function's one. The sync mode is perfectly contained
within as it always was.

Simply, I tend not to do any allocations which are not strictly
needed. If you prefer to simply allocate the object unconditionally,
we can just drop this patch.

> > struct btrfs_encoded_read_private {
> > - struct completion done;
> > + struct completion *sync_reads;
> > void *uring_ctx;
> > - refcount_t pending_refs;
> > + refcount_t pending_reads;
> > blk_status_t status;
> > };
>
> These renames just make the diff harder to read (and yes I shouldn't
> have renamed pending to pending_refs but that at least also changed the
> type).

I see. But also the completion is changed to a pointer here for a
reason and I tried to make it clear it's only used for sync reads,
hence the rename.

> > - if (refcount_dec_and_test(&priv->pending_refs)) {
> > - int err = blk_status_to_errno(READ_ONCE(priv->status));
> > -
> > + if (refcount_dec_and_test(&priv->pending_reads)) {
> > if (priv->uring_ctx) {
> > + int err = blk_status_to_errno(READ_ONCE(priv->status));
>
> Missing newline after the declaration.

Right. Clearly I did not run checkpatch before sending. Sorry.
I was just trying to match this block to the same one later, which did
not have the newline. (But it also did not have the declaration
before.)

> > unsigned long i = 0;
> > struct btrfs_bio *bbio;
> > - int ret;
>
> That seems unrelated.

It's only used in the async branch. I prefer to have it local to that
branch for readability. Also it matches the same block from the
previous function.

> > @@ -9155,25 +9159,23 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
> > disk_io_size -= bytes;
> > } while (disk_io_size);
> >
> > - refcount_inc(&priv->pending_refs);
> > + refcount_inc(&priv->pending_reads);
> > btrfs_submit_bbio(bbio, 0);
> >
> > if (uring_ctx) {
> > - if (refcount_dec_and_test(&priv->pending_refs)) {
> > - ret = blk_status_to_errno(READ_ONCE(priv->status));
> > - btrfs_uring_read_extent_endio(uring_ctx, ret);
> > + if (refcount_dec_and_test(&priv->pending_reads)) {
> > + int err = blk_status_to_errno(READ_ONCE(priv->status));
> > + btrfs_uring_read_extent_endio(uring_ctx, err);
> > kfree(priv);
>
> Missing newline after the declaration, but still why can't we just keep ret?

The new line was not there before. But back then it was not a
declaration, right?

The rename to `err` just happened as I copied this block from
`btrfs_encoded_read_endio()`. It's a matching code, we could even
factor it out eventually. Again, it's just a bit more descriptive that
the returned value is an error code. It's a matter of preference. I
was just polishing the code a bit while already touching it.

--nX