Re: [PATCH RFC v4 1/3] block: add BIO_COMPLETE_IN_TASK for task-context completion
From: Matthew Wilcox
Date: Wed Mar 25 2026 - 16:41:02 EST
On Thu, Mar 26, 2026 at 07:26:26AM +1100, Dave Chinner wrote:
> > @@ -1988,6 +2060,16 @@ static int __init init_bio(void)
> > SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
> > }
> >
> > + for_each_possible_cpu(i) {
> > + struct bio_complete_batch *batch =
> > + per_cpu_ptr(&bio_complete_batch, i);
> > +
> > + bio_list_init(&batch->list);
> > + INIT_WORK(&batch->work, bio_complete_work_fn);
> > + }
> > +
> > + cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "block/bio:complete:dead",
> > + NULL, bio_complete_batch_cpu_dead);
>
> XFS inodegc tracks the CPUs with work queued via a cpumask and
> iterates the CPU mask for "all CPU" iteration scans. This avoids the
> need for CPU hotplug integration...
Can you elaborate a bit on how this would work in this context?
I understand why inode garbage collection might do an "all CPU"
iteration, but I don't understand the circumstances under which
we'd iterate over all CPUs to complete deferred BIOs.
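For reference, my reading of the cpumask pattern itself is roughly the
following userspace toy. Everything in it is hypothetical (the names, the
fixed CPU count, a plain bitmask standing in for a cpumask, no locking or
atomics); it is not the inodegc code and not the code from this patch.

```c
#include <stdint.h>

#define NR_CPUS_MODEL 8			/* hypothetical fixed CPU count */

struct batch_model {
	int pending;			/* stand-in for a per-CPU bio list */
};

static struct batch_model batches[NR_CPUS_MODEL];
static uint32_t queued_mask;		/* bit n set: CPU n has work queued */

/* Queue one item on @cpu and record that CPU in the mask. */
static void queue_on_cpu(unsigned int cpu)
{
	batches[cpu].pending++;
	queued_mask |= UINT32_C(1) << cpu;
}

/*
 * "All CPU" scan: visit only the CPUs whose bit is set, regardless of
 * whether they are still online. Returns the number of items drained.
 */
static int drain_all(void)
{
	int drained = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (!(queued_mask & (UINT32_C(1) << cpu)))
			continue;
		drained += batches[cpu].pending;
		batches[cpu].pending = 0;
		queued_mask &= ~(UINT32_C(1) << cpu);
	}
	return drained;
}
```

The scan never consults an online mask, which I assume is why no hotplug
callback is needed. What I'm missing is who would call the equivalent of
drain_all() for deferred BIOs.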
> > +++ b/include/linux/blk_types.h
> > @@ -322,6 +322,7 @@ enum {
> > BIO_REMAPPED,
> > BIO_ZONE_WRITE_PLUGGING, /* bio handled through zone write plugging */
> > BIO_EMULATES_ZONE_APPEND, /* bio emulates a zone append operation */
> > + BIO_COMPLETE_IN_TASK, /* complete bi_end_io() in task context */
>
> Can anyone set this on a bio they submit? i.e. This needs a better
> description. Who can use it, constraints, guarantees, etc.
>
> I ask, because the higher filesystem layers often know at submission
> time that we need task based IO completion. If we can tell the bio
> we are submitting that it needs task completion and have the block
> layer guarantee that the ->end_io completion only ever runs in task
> context, then we can get rid of multiple instances of IO completion
> deferral to task context in filesystem code (e.g. iomap - for both
> buffered and direct IO, xfs buffer cache write completions, etc).
Right, that's the idea; this would be entirely general. I want to do
it for all pagecache writeback so we can change i_pages.xa_lock from
being irq-safe to only being taken in task context.
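
To be concrete about the guarantee under discussion, here is a userspace
model of it. All names are made up for illustration and this is not the
patch's implementation: a flagged bio completed outside task context is
parked on a list and its end_io runs later from a worker; an unflagged
bio completes immediately in whatever context it is in.

```c
#include <stdbool.h>
#include <stddef.h>

struct bio_model {
	bool complete_in_task;		/* models BIO_COMPLETE_IN_TASK */
	void (*end_io)(struct bio_model *bio);
	struct bio_model *next;
	bool ran_in_task;		/* written by record_ctx_end_io() */
};

static bool in_task_ctx;		/* models "are we in task context?" */
static struct bio_model *deferred_head;	/* models the per-CPU batch list */

/* Example end_io callback: records the context it was called in. */
static void record_ctx_end_io(struct bio_model *bio)
{
	bio->ran_in_task = in_task_ctx;
}

/* Completion entry point: defer if flagged and not already in task context. */
static void bio_model_endio(struct bio_model *bio)
{
	if (bio->complete_in_task && !in_task_ctx) {
		/* Pushed at the head; completion ordering is not modelled. */
		bio->next = deferred_head;
		deferred_head = bio;
		return;
	}
	bio->end_io(bio);
}

/* Models the work function: drains the list from task context. */
static int complete_deferred(void)
{
	bool saved = in_task_ctx;
	int completed = 0;

	in_task_ctx = true;
	while (deferred_head) {
		struct bio_model *bio = deferred_head;

		deferred_head = bio->next;
		bio->next = NULL;
		bio->end_io(bio);
		completed++;
	}
	in_task_ctx = saved;
	return completed;
}
```

In this model a submitter only has to set complete_in_task before
submission; the end_io callback itself needs no deferral logic of its own.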