Re: [PATCH 2/2] blk-plug: don't flush nested plug lists

From: Ming Lei
Date: Tue Apr 07 2015 - 05:19:45 EST


Hi Jeff,

On Tue, Apr 7, 2015 at 3:14 AM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> The way the on-stack plugging currently works, each nesting level
> flushes its own list of I/Os. This can be less than optimal (read
> awful) for certain workloads. For example, consider an application
> that issues asynchronous O_DIRECT I/Os. It can send down a bunch of
> I/Os together in a single io_submit call, only to have each of them
> dispatched individually down in the bowels of the direct I/O code.
> The reason is that there are blk_plug's instantiated both at the upper
> call site in do_io_submit and down in do_direct_IO. The latter will
> submit as few as one I/O at a time (if you have a small enough I/O
> size) instead of performing the batching that the plugging
> infrastructure is supposed to provide.
>
> Now, for the case where there is an elevator involved, this doesn't
> really matter too much. The elevator will keep the I/O around long
> enough for it to be merged. However, in cases where there is no
> elevator (like blk-mq), I/Os are simply dispatched immediately.
>
> Try this, for example (note I'm using a virtio-blk device, so it's
> using the blk-mq single queue path, though I've also reproduced this
> with the micron p320h):
>
> fio --rw=read --bs=4k --iodepth=128 --iodepth_batch=16 --iodepth_batch_complete=16 --runtime=10s --direct=1 --filename=/dev/vdd --name=job1 --ioengine=libaio --time_based
>
> If you run that on a current kernel, you will get zero merges. Zero!
> After this patch, you will get many merges (the actual number depends
> on how fast your storage is, obviously), and much better throughput.
> Here are results from my test rig:
>
> Unpatched kernel:
> Read B/W: 283,638 KB/s
> Read Merges: 0
>
> Patched kernel:
> Read B/W: 873,224 KB/s
> Read Merges: 2,046K

The data is amazing, but it might be better to also provide some
latency numbers.
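
For reference, the fio command line above rewritten as a job file; fio's normal output already reports submission and completion latency statistics for libaio jobs, so the latency numbers should be obtainable from the same run:

```ini
; Same workload as the command line in the patch description
[job1]
ioengine=libaio
rw=read
bs=4k
iodepth=128
iodepth_batch=16
iodepth_batch_complete=16
direct=1
filename=/dev/vdd
runtime=10s
time_based
```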

>
> I considered several approaches to solving the problem:
> 1) get rid of the inner-most plugs
> 2) handle nesting by using only one on-stack plug
> 2a) #2, except use a per-cpu blk_plug struct, which may clean up the
> code a bit at the expense of memory footprint
>
> Option 1 will be tricky or impossible to do, since innermost plug
> lists are sometimes the only plug lists, depending on the call path.
> Option 2 is what this patch implements. Option 2a is perhaps a better
> idea, but since I already implemented option 2, I figured I'd post it
> for comments and opinions before rewriting it.
>
> Much of the patch involves modifying call sites to blk_start_plug,
> since its signature is changed. The meat of the patch is actually

I am wondering if the signature of blk_start_plug() really has to be
changed, since the active plug is always the top-level plug with your
patch, and blk_finish_plug() can find the active plug from current->plug.

> pretty simple and constrained to block/blk-core.c and
> include/linux/blkdev.h. The only tricky bits were places where plugs
> were finished and then restarted to flush out I/O. There, I went
> ahead and exported blk_flush_plug_list and called that directly.
>
> Comments would be greatly appreciated.
>
> Signed-off-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
> ---
> block/blk-core.c | 33 +++++++++++++++++++--------------
> block/blk-lib.c | 6 +++---
> block/blk-throttle.c | 6 +++---
> drivers/block/xen-blkback/blkback.c | 6 +++---
> drivers/md/dm-bufio.c | 15 +++++++--------
> drivers/md/dm-kcopyd.c | 6 +++---
> drivers/md/dm-thin.c | 6 +++---
> drivers/md/md.c | 6 +++---
> drivers/md/raid1.c | 6 +++---
> drivers/md/raid10.c | 6 +++---
> drivers/md/raid5.c | 12 ++++++------
> drivers/target/target_core_iblock.c | 6 +++---
> fs/aio.c | 6 +++---
> fs/block_dev.c | 6 +++---
> fs/btrfs/scrub.c | 6 +++---
> fs/btrfs/transaction.c | 6 +++---
> fs/btrfs/tree-log.c | 16 ++++++++--------
> fs/btrfs/volumes.c | 12 +++++-------
> fs/buffer.c | 6 +++---
> fs/direct-io.c | 8 +++++---
> fs/ext4/file.c | 6 +++---
> fs/ext4/inode.c | 12 +++++-------
> fs/f2fs/checkpoint.c | 6 +++---
> fs/f2fs/gc.c | 6 +++---
> fs/f2fs/node.c | 6 +++---
> fs/jbd/checkpoint.c | 6 +++---
> fs/jbd/commit.c | 10 +++++-----
> fs/jbd2/checkpoint.c | 6 +++---
> fs/jbd2/commit.c | 6 +++---
> fs/mpage.c | 6 +++---
> fs/xfs/xfs_buf.c | 12 ++++++------
> fs/xfs/xfs_dir2_readdir.c | 6 +++---
> fs/xfs/xfs_itable.c | 6 +++---
> include/linux/blkdev.h | 3 ++-
> mm/madvise.c | 6 +++---
> mm/page-writeback.c | 6 +++---
> mm/readahead.c | 6 +++---
> mm/swap_state.c | 6 +++---
> mm/vmscan.c | 6 +++---
> 39 files changed, 155 insertions(+), 152 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 794c3e7..64f3f2a 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3002,7 +3002,7 @@ EXPORT_SYMBOL(kblockd_schedule_delayed_work_on);
>
> /**
> * blk_start_plug - initialize blk_plug and track it inside the task_struct
> - * @plug: The &struct blk_plug that needs to be initialized
> + * @plug: The on-stack &struct blk_plug that needs to be initialized
> *
> * Description:
> * Tracking blk_plug inside the task_struct will help with auto-flushing the
> @@ -3013,26 +3013,29 @@ EXPORT_SYMBOL(kblockd_schedule_delayed_work_on);
> * page belonging to that request that is currently residing in our private
> * plug. By flushing the pending I/O when the process goes to sleep, we avoid
> * this kind of deadlock.
> + *
> + * Returns: a pointer to the active &struct blk_plug
> */
> -void blk_start_plug(struct blk_plug *plug)
> +struct blk_plug *blk_start_plug(struct blk_plug *plug)

The change might not be necessary.

> {
> struct task_struct *tsk = current;
>
> + if (tsk->plug) {
> + tsk->plug->depth++;
> + return tsk->plug;
> + }
> +
> + plug->depth = 1;
> INIT_LIST_HEAD(&plug->list);
> INIT_LIST_HEAD(&plug->mq_list);
> INIT_LIST_HEAD(&plug->cb_list);
>
> /*
> - * If this is a nested plug, don't actually assign it. It will be
> - * flushed on its own.
> + * Store ordering should not be needed here, since a potential
> + * preempt will imply a full memory barrier
> */
> - if (!tsk->plug) {
> - /*
> - * Store ordering should not be needed here, since a potential
> - * preempt will imply a full memory barrier
> - */
> - tsk->plug = plug;
> - }
> + tsk->plug = plug;
> + return tsk->plug;

tsk->plug is always returned from this function, which means tsk->plug
is always the active plug.

> }
> EXPORT_SYMBOL(blk_start_plug);
>
> @@ -3176,13 +3179,15 @@ void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>
> local_irq_restore(flags);
> }
> +EXPORT_SYMBOL_GPL(blk_flush_plug_list);
>
> void blk_finish_plug(struct blk_plug *plug)
> {
> - blk_flush_plug_list(plug, false);
> + if (--plug->depth > 0)
> + return;

The active plug should be current->plug, assuming blk_finish_plug()
is always paired with blk_start_plug().

>
> - if (plug == current->plug)
> - current->plug = NULL;
> + blk_flush_plug_list(plug, false);
> + current->plug = NULL;
> }
> EXPORT_SYMBOL(blk_finish_plug);
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 7688ee3..e2d2448 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -48,7 +48,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> struct bio_batch bb;
> struct bio *bio;
> int ret = 0;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> if (!q)
> return -ENXIO;
> @@ -81,7 +81,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> bb.flags = 1 << BIO_UPTODATE;
> bb.wait = &wait;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> while (nr_sects) {
> unsigned int req_sects;
> sector_t end_sect, tmp;
> @@ -128,7 +128,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> */
> cond_resched();
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);

As in the comment above, blk_finish_plug() can figure out the active
plug by itself, so it looks unnecessary to introduce a return value
for blk_start_plug().

>
> /* Wait for bios in-flight */
> if (!atomic_dec_and_test(&bb.done))
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 5b9c6d5..f57bbd3 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -1266,7 +1266,7 @@ static void blk_throtl_dispatch_work_fn(struct work_struct *work)
> struct request_queue *q = td->queue;
> struct bio_list bio_list_on_stack;
> struct bio *bio;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> int rw;
>
> bio_list_init(&bio_list_on_stack);
> @@ -1278,10 +1278,10 @@ static void blk_throtl_dispatch_work_fn(struct work_struct *work)
> spin_unlock_irq(q->queue_lock);
>
> if (!bio_list_empty(&bio_list_on_stack)) {
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> while((bio = bio_list_pop(&bio_list_on_stack)))
> generic_make_request(bio);
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> }
> }
>
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 2a04d34..f075182 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -1207,7 +1207,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> struct bio **biolist = pending_req->biolist;
> int i, nbio = 0;
> int operation;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> bool drain = false;
> struct grant_page **pages = pending_req->segments;
> unsigned short req_operation;
> @@ -1368,13 +1368,13 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> }
>
> atomic_set(&pending_req->pendcnt, nbio);
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
>
> for (i = 0; i < nbio; i++)
> submit_bio(operation, biolist[i]);
>
> /* Let the I/Os go.. */
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> if (operation == READ)
> blkif->st_rd_sect += preq.nr_sects;
> diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
> index 86dbbc7..a6cbc6f 100644
> --- a/drivers/md/dm-bufio.c
> +++ b/drivers/md/dm-bufio.c
> @@ -706,8 +706,8 @@ static void __write_dirty_buffer(struct dm_buffer *b,
>
> static void __flush_write_list(struct list_head *write_list)
> {
> - struct blk_plug plug;
> - blk_start_plug(&plug);
> + struct blk_plug *plug, onstack_plug;
> + plug = blk_start_plug(&onstack_plug);
> while (!list_empty(write_list)) {
> struct dm_buffer *b =
> list_entry(write_list->next, struct dm_buffer, write_list);
> @@ -715,7 +715,7 @@ static void __flush_write_list(struct list_head *write_list)
> submit_io(b, WRITE, b->block, write_endio);
> dm_bufio_cond_resched();
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> }
>
> /*
> @@ -1110,13 +1110,13 @@ EXPORT_SYMBOL_GPL(dm_bufio_new);
> void dm_bufio_prefetch(struct dm_bufio_client *c,
> sector_t block, unsigned n_blocks)
> {
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> LIST_HEAD(write_list);
>
> BUG_ON(dm_bufio_in_request());
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> dm_bufio_lock(c);
>
> for (; n_blocks--; block++) {
> @@ -1126,9 +1126,8 @@ void dm_bufio_prefetch(struct dm_bufio_client *c,
> &write_list);
> if (unlikely(!list_empty(&write_list))) {
> dm_bufio_unlock(c);
> - blk_finish_plug(&plug);
> __flush_write_list(&write_list);
> - blk_start_plug(&plug);
> + blk_flush_plug_list(plug, false);
> dm_bufio_lock(c);
> }
> if (unlikely(b != NULL)) {
> @@ -1149,7 +1148,7 @@ void dm_bufio_prefetch(struct dm_bufio_client *c,
> dm_bufio_unlock(c);
>
> flush_plug:
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> }
> EXPORT_SYMBOL_GPL(dm_bufio_prefetch);
>
> diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
> index 3a7cade..fed371f 100644
> --- a/drivers/md/dm-kcopyd.c
> +++ b/drivers/md/dm-kcopyd.c
> @@ -580,7 +580,7 @@ static void do_work(struct work_struct *work)
> {
> struct dm_kcopyd_client *kc = container_of(work,
> struct dm_kcopyd_client, kcopyd_work);
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> /*
> * The order that these are called is *very* important.
> @@ -589,11 +589,11 @@ static void do_work(struct work_struct *work)
> * list. io jobs call wake when they complete and it all
> * starts again.
> */
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> process_jobs(&kc->complete_jobs, kc, run_complete_job);
> process_jobs(&kc->pages_jobs, kc, run_pages_job);
> process_jobs(&kc->io_jobs, kc, run_io_job);
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> }
>
> /*
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index 921aafd..6a7459a 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -1775,7 +1775,7 @@ static void process_thin_deferred_bios(struct thin_c *tc)
> unsigned long flags;
> struct bio *bio;
> struct bio_list bios;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> unsigned count = 0;
>
> if (tc->requeue_mode) {
> @@ -1799,7 +1799,7 @@ static void process_thin_deferred_bios(struct thin_c *tc)
>
> spin_unlock_irqrestore(&tc->lock, flags);
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> while ((bio = bio_list_pop(&bios))) {
> /*
> * If we've got no free new_mapping structs, and processing
> @@ -1824,7 +1824,7 @@ static void process_thin_deferred_bios(struct thin_c *tc)
> dm_pool_issue_prefetches(pool->pmd);
> }
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> }
>
> static int cmp_cells(const void *lhs, const void *rhs)
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 717daad..9f24719 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7402,7 +7402,7 @@ void md_do_sync(struct md_thread *thread)
> int skipped = 0;
> struct md_rdev *rdev;
> char *desc, *action = NULL;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> /* just incase thread restarts... */
> if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
> @@ -7572,7 +7572,7 @@ void md_do_sync(struct md_thread *thread)
> md_new_event(mddev);
> update_time = jiffies;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> while (j < max_sectors) {
> sector_t sectors;
>
> @@ -7686,7 +7686,7 @@ void md_do_sync(struct md_thread *thread)
> /*
> * this also signals 'finished resyncing' to md_stop
> */
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
>
> /* tell personality that we are finished */
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index d34e238..2608bc3 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2399,11 +2399,11 @@ static void raid1d(struct md_thread *thread)
> unsigned long flags;
> struct r1conf *conf = mddev->private;
> struct list_head *head = &conf->retry_list;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> md_check_recovery(mddev);
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> for (;;) {
>
> flush_pending_writes(conf);
> @@ -2441,7 +2441,7 @@ static void raid1d(struct md_thread *thread)
> if (mddev->flags & ~(1<<MD_CHANGE_PENDING))
> md_check_recovery(mddev);
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> }
>
> static int init_resync(struct r1conf *conf)
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index a7196c4..7bea5c7 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2791,11 +2791,11 @@ static void raid10d(struct md_thread *thread)
> unsigned long flags;
> struct r10conf *conf = mddev->private;
> struct list_head *head = &conf->retry_list;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> md_check_recovery(mddev);
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> for (;;) {
>
> flush_pending_writes(conf);
> @@ -2835,7 +2835,7 @@ static void raid10d(struct md_thread *thread)
> if (mddev->flags & ~(1<<MD_CHANGE_PENDING))
> md_check_recovery(mddev);
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> }
>
> static int init_resync(struct r10conf *conf)
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index cd2f96b..59e7090 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5259,11 +5259,11 @@ static void raid5_do_work(struct work_struct *work)
> struct r5conf *conf = group->conf;
> int group_id = group - conf->worker_groups;
> int handled;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> pr_debug("+++ raid5worker active\n");
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> handled = 0;
> spin_lock_irq(&conf->device_lock);
> while (1) {
> @@ -5281,7 +5281,7 @@ static void raid5_do_work(struct work_struct *work)
> pr_debug("%d stripes handled\n", handled);
>
> spin_unlock_irq(&conf->device_lock);
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> pr_debug("--- raid5worker inactive\n");
> }
> @@ -5298,13 +5298,13 @@ static void raid5d(struct md_thread *thread)
> struct mddev *mddev = thread->mddev;
> struct r5conf *conf = mddev->private;
> int handled;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> pr_debug("+++ raid5d active\n");
>
> md_check_recovery(mddev);
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> handled = 0;
> spin_lock_irq(&conf->device_lock);
> while (1) {
> @@ -5352,7 +5352,7 @@ static void raid5d(struct md_thread *thread)
> spin_unlock_irq(&conf->device_lock);
>
> async_tx_issue_pending_all();
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> pr_debug("--- raid5d inactive\n");
> }
> diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
> index d4a4b0f..248a6bb 100644
> --- a/drivers/target/target_core_iblock.c
> +++ b/drivers/target/target_core_iblock.c
> @@ -361,13 +361,13 @@ iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num)
>
> static void iblock_submit_bios(struct bio_list *list, int rw)
> {
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> struct bio *bio;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> while ((bio = bio_list_pop(list)))
> submit_bio(rw, bio);
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> }
>
> static void iblock_end_io_flush(struct bio *bio, int err)
> diff --git a/fs/aio.c b/fs/aio.c
> index f8e52a1..b1c3583 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1575,7 +1575,7 @@ long do_io_submit(aio_context_t ctx_id, long nr,
> struct kioctx *ctx;
> long ret = 0;
> int i = 0;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> if (unlikely(nr < 0))
> return -EINVAL;
> @@ -1592,7 +1592,7 @@ long do_io_submit(aio_context_t ctx_id, long nr,
> return -EINVAL;
> }
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
>
> /*
> * AKPM: should this return a partial result if some of the IOs were
> @@ -1616,7 +1616,7 @@ long do_io_submit(aio_context_t ctx_id, long nr,
> if (ret)
> break;
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> percpu_ref_put(&ctx->users);
> return i ? i : ret;
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 975266b..928d3e0 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1598,10 +1598,10 @@ static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
> ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
> {
> struct file *file = iocb->ki_filp;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> ssize_t ret;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> ret = __generic_file_write_iter(iocb, from);
> if (ret > 0) {
> ssize_t err;
> @@ -1609,7 +1609,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
> if (err < 0)
> ret = err;
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> return ret;
> }
> EXPORT_SYMBOL_GPL(blkdev_write_iter);
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index ec57687..768bac3 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -2967,7 +2967,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
> struct btrfs_root *root = fs_info->extent_root;
> struct btrfs_root *csum_root = fs_info->csum_root;
> struct btrfs_extent_item *extent;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> u64 flags;
> int ret;
> int slot;
> @@ -3088,7 +3088,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
> * collect all data csums for the stripe to avoid seeking during
> * the scrub. This might currently (crc32) end up to be about 1MB
> */
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
>
> /*
> * now find all extents for each stripe and scrub them
> @@ -3316,7 +3316,7 @@ out:
> scrub_wr_submit(sctx);
> mutex_unlock(&sctx->wr_ctx.wr_lock);
>
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> btrfs_free_path(path);
> btrfs_free_path(ppath);
> return ret < 0 ? ret : 0;
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 8be4278..b5a8078 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -979,11 +979,11 @@ static int btrfs_write_and_wait_marked_extents(struct btrfs_root *root,
> {
> int ret;
> int ret2;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> ret = btrfs_write_marked_extents(root, dirty_pages, mark);
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> ret2 = btrfs_wait_marked_extents(root, dirty_pages, mark);
>
> if (ret)
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index c5b8ba3..1623924 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -2519,7 +2519,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
> struct btrfs_root *log_root_tree = root->fs_info->log_root_tree;
> int log_transid = 0;
> struct btrfs_log_ctx root_log_ctx;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> mutex_lock(&root->log_mutex);
> log_transid = ctx->log_transid;
> @@ -2571,10 +2571,10 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
> /* we start IO on all the marked extents here, but we don't actually
> * wait for them until later.
> */
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> ret = btrfs_write_marked_extents(log, &log->dirty_log_pages, mark);
> if (ret) {
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> btrfs_abort_transaction(trans, root, ret);
> btrfs_free_logged_extents(log, log_transid);
> btrfs_set_log_full_commit(root->fs_info, trans);
> @@ -2619,7 +2619,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
> if (!list_empty(&root_log_ctx.list))
> list_del_init(&root_log_ctx.list);
>
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> btrfs_set_log_full_commit(root->fs_info, trans);
>
> if (ret != -ENOSPC) {
> @@ -2635,7 +2635,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
> }
>
> if (log_root_tree->log_transid_committed >= root_log_ctx.log_transid) {
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> mutex_unlock(&log_root_tree->log_mutex);
> ret = root_log_ctx.log_ret;
> goto out;
> @@ -2643,7 +2643,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
>
> index2 = root_log_ctx.log_transid % 2;
> if (atomic_read(&log_root_tree->log_commit[index2])) {
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> ret = btrfs_wait_marked_extents(log, &log->dirty_log_pages,
> mark);
> btrfs_wait_logged_extents(trans, log, log_transid);
> @@ -2669,7 +2669,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
> * check the full commit flag again
> */
> if (btrfs_need_log_full_commit(root->fs_info, trans)) {
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> btrfs_wait_marked_extents(log, &log->dirty_log_pages, mark);
> btrfs_free_logged_extents(log, log_transid);
> mutex_unlock(&log_root_tree->log_mutex);
> @@ -2680,7 +2680,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
> ret = btrfs_write_marked_extents(log_root_tree,
> &log_root_tree->dirty_log_pages,
> EXTENT_DIRTY | EXTENT_NEW);
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> if (ret) {
> btrfs_set_log_full_commit(root->fs_info, trans);
> btrfs_abort_transaction(trans, root, ret);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 8222f6f..0e215ff 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -261,7 +261,7 @@ static noinline void run_scheduled_bios(struct btrfs_device *device)
> unsigned long last_waited = 0;
> int force_reg = 0;
> int sync_pending = 0;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> /*
> * this function runs all the bios we've collected for
> @@ -269,7 +269,7 @@ static noinline void run_scheduled_bios(struct btrfs_device *device)
> * another device without first sending all of these down.
> * So, setup a plug here and finish it off before we return
> */
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
>
> bdi = blk_get_backing_dev_info(device->bdev);
> fs_info = device->dev_root->fs_info;
> @@ -358,8 +358,7 @@ loop_lock:
> if (pending_bios == &device->pending_sync_bios) {
> sync_pending = 1;
> } else if (sync_pending) {
> - blk_finish_plug(&plug);
> - blk_start_plug(&plug);
> + blk_flush_plug_list(plug, false);
> sync_pending = 0;
> }
>
> @@ -415,8 +414,7 @@ loop_lock:
> }
> /* unplug every 64 requests just for good measure */
> if (batch_run % 64 == 0) {
> - blk_finish_plug(&plug);
> - blk_start_plug(&plug);
> + blk_flush_plug_list(plug, false);
> sync_pending = 0;
> }
> }
> @@ -431,7 +429,7 @@ loop_lock:
> spin_unlock(&device->io_lock);
>
> done:
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> }
>
> static void pending_bios_fn(struct btrfs_work *work)
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 20805db..727d642 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -717,10 +717,10 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
> struct list_head tmp;
> struct address_space *mapping;
> int err = 0, err2;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> INIT_LIST_HEAD(&tmp);
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
>
> spin_lock(lock);
> while (!list_empty(list)) {
> @@ -758,7 +758,7 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
> }
>
> spin_unlock(lock);
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> spin_lock(lock);
>
> while (!list_empty(&tmp)) {
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index e181b6b..e79c0c6 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -488,6 +488,8 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
> static void dio_await_completion(struct dio *dio)
> {
> struct bio *bio;
> +
> + blk_flush_plug(current);
> do {
> bio = dio_await_one(dio);
> if (bio)
> @@ -1108,7 +1110,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
> struct dio *dio;
> struct dio_submit sdio = { 0, };
> struct buffer_head map_bh = { 0, };
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> unsigned long align = offset | iov_iter_alignment(iter);
>
> if (rw & WRITE)
> @@ -1231,7 +1233,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
>
> sdio.pages_in_io += iov_iter_npages(iter, INT_MAX);
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
>
> retval = do_direct_IO(dio, &sdio, &map_bh);
> if (retval)
> @@ -1262,7 +1264,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
> if (sdio.bio)
> dio_bio_submit(dio, &sdio);
>
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> /*
> * It is possible that, we return short IO due to end of file.
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 33a09da..fd0cf21 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -94,7 +94,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> struct file *file = iocb->ki_filp;
> struct inode *inode = file_inode(iocb->ki_filp);
> struct mutex *aio_mutex = NULL;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> int o_direct = io_is_direct(file);
> int overwrite = 0;
> size_t length = iov_iter_count(from);
> @@ -139,7 +139,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>
> iocb->private = &overwrite;
> if (o_direct) {
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
>
>
> /* check whether we do a DIO overwrite or not */
> @@ -183,7 +183,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> ret = err;
> }
> if (o_direct)
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> errout:
> if (aio_mutex)
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 5cb9a21..d4b645b 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2284,7 +2284,7 @@ static int ext4_writepages(struct address_space *mapping,
> int needed_blocks, rsv_blocks = 0, ret = 0;
> struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
> bool done;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> bool give_up_on_write = false;
>
> trace_ext4_writepages(inode, wbc);
> @@ -2298,11 +2298,9 @@ static int ext4_writepages(struct address_space *mapping,
> goto out_writepages;
>
> if (ext4_should_journal_data(inode)) {
> - struct blk_plug plug;
> -
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> ret = write_cache_pages(mapping, wbc, __writepage, mapping);
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> goto out_writepages;
> }
>
> @@ -2368,7 +2366,7 @@ retry:
> if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> tag_pages_for_writeback(mapping, mpd.first_page, mpd.last_page);
> done = false;
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> while (!done && mpd.first_page <= mpd.last_page) {
> /* For each extent of pages we use new io_end */
> mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
> @@ -2438,7 +2436,7 @@ retry:
> if (ret)
> break;
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> if (!ret && !cycled && wbc->nr_to_write > 0) {
> cycled = 1;
> mpd.last_page = writeback_index - 1;
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 7f794b7..de2b522 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -810,10 +810,10 @@ static int block_operations(struct f2fs_sb_info *sbi)
> .nr_to_write = LONG_MAX,
> .for_reclaim = 0,
> };
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> int err = 0;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
>
> retry_flush_dents:
> f2fs_lock_all(sbi);
> @@ -846,7 +846,7 @@ retry_flush_nodes:
> goto retry_flush_nodes;
> }
> out:
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> return err;
> }
>
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 76adbc3..082e961 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -661,12 +661,12 @@ static void do_garbage_collect(struct f2fs_sb_info *sbi, unsigned int segno,
> {
> struct page *sum_page;
> struct f2fs_summary_block *sum;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> /* read segment summary of victim */
> sum_page = get_sum_page(sbi, segno);
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
>
> sum = page_address(sum_page);
>
> @@ -678,7 +678,7 @@ static void do_garbage_collect(struct f2fs_sb_info *sbi, unsigned int segno,
> gc_data_segment(sbi, sum->entries, gc_list, segno, gc_type);
> break;
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> stat_inc_seg_count(sbi, GET_SUM_TYPE((&sum->footer)));
> stat_inc_call_count(sbi->stat_info);
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 97bd9d3..e1fc81e 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1066,7 +1066,7 @@ repeat:
> struct page *get_node_page_ra(struct page *parent, int start)
> {
> struct f2fs_sb_info *sbi = F2FS_P_SB(parent);
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> struct page *page;
> int err, i, end;
> nid_t nid;
> @@ -1086,7 +1086,7 @@ repeat:
> else if (err == LOCKED_PAGE)
> goto page_hit;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
>
> /* Then, try readahead for siblings of the desired node */
> end = start + MAX_RA_NODE;
> @@ -1098,7 +1098,7 @@ repeat:
> ra_node_page(sbi, nid);
> }
>
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> lock_page(page);
> if (unlikely(page->mapping != NODE_MAPPING(sbi))) {
> diff --git a/fs/jbd/checkpoint.c b/fs/jbd/checkpoint.c
> index 08c0304..29b550d 100644
> --- a/fs/jbd/checkpoint.c
> +++ b/fs/jbd/checkpoint.c
> @@ -258,12 +258,12 @@ static void
> __flush_batch(journal_t *journal, struct buffer_head **bhs, int *batch_count)
> {
> int i;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> for (i = 0; i < *batch_count; i++)
> write_dirty_buffer(bhs[i], WRITE_SYNC);
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> for (i = 0; i < *batch_count; i++) {
> struct buffer_head *bh = bhs[i];
> diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
> index bb217dc..ccb32f0 100644
> --- a/fs/jbd/commit.c
> +++ b/fs/jbd/commit.c
> @@ -311,7 +311,7 @@ void journal_commit_transaction(journal_t *journal)
> int first_tag = 0;
> int tag_flag;
> int i;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> int write_op = WRITE;
>
> /*
> @@ -444,10 +444,10 @@ void journal_commit_transaction(journal_t *journal)
> * Now start flushing things to disk, in the order they appear
> * on the transaction lists. Data blocks go first.
> */
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> err = journal_submit_data_buffers(journal, commit_transaction,
> write_op);
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> /*
> * Wait for all previously submitted IO to complete.
> @@ -503,7 +503,7 @@ void journal_commit_transaction(journal_t *journal)
> err = 0;
> }
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
>
> journal_write_revoke_records(journal, commit_transaction, write_op);
>
> @@ -697,7 +697,7 @@ start_journal_io:
> }
> }
>
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> /* Lo and behold: we have just managed to send a transaction to
> the log. Before we can commit it, wait for the IO so far to
> diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
> index 988b32e..31b5e73 100644
> --- a/fs/jbd2/checkpoint.c
> +++ b/fs/jbd2/checkpoint.c
> @@ -182,12 +182,12 @@ static void
> __flush_batch(journal_t *journal, int *batch_count)
> {
> int i;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> for (i = 0; i < *batch_count; i++)
> write_dirty_buffer(journal->j_chkpt_bhs[i], WRITE_SYNC);
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> for (i = 0; i < *batch_count; i++) {
> struct buffer_head *bh = journal->j_chkpt_bhs[i];
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index b73e021..713d26a 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -390,7 +390,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> int tag_bytes = journal_tag_bytes(journal);
> struct buffer_head *cbh = NULL; /* For transactional checksums */
> __u32 crc32_sum = ~0;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> /* Tail of the journal */
> unsigned long first_block;
> tid_t first_tid;
> @@ -555,7 +555,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> if (err)
> jbd2_journal_abort(journal, err);
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> jbd2_journal_write_revoke_records(journal, commit_transaction,
> &log_bufs, WRITE_SYNC);
>
> @@ -805,7 +805,7 @@ start_journal_io:
> __jbd2_journal_abort_hard(journal);
> }
>
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> /* Lo and behold: we have just managed to send a transaction to
> the log. Before we can commit it, wait for the IO so far to
> diff --git a/fs/mpage.c b/fs/mpage.c
> index 3e79220..926dc42 100644
> --- a/fs/mpage.c
> +++ b/fs/mpage.c
> @@ -676,10 +676,10 @@ int
> mpage_writepages(struct address_space *mapping,
> struct writeback_control *wbc, get_block_t get_block)
> {
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> int ret;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
>
> if (!get_block)
> ret = generic_writepages(mapping, wbc);
> @@ -695,7 +695,7 @@ mpage_writepages(struct address_space *mapping,
> if (mpd.bio)
> mpage_bio_submit(WRITE, mpd.bio);
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> return ret;
> }
> EXPORT_SYMBOL(mpage_writepages);
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 1790b00..776ac5a 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1208,7 +1208,7 @@ STATIC void
> _xfs_buf_ioapply(
> struct xfs_buf *bp)
> {
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> int rw;
> int offset;
> int size;
> @@ -1281,7 +1281,7 @@ _xfs_buf_ioapply(
> */
> offset = bp->b_offset;
> size = BBTOB(bp->b_io_length);
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> for (i = 0; i < bp->b_map_count; i++) {
> xfs_buf_ioapply_map(bp, i, &offset, &size, rw);
> if (bp->b_error)
> @@ -1289,7 +1289,7 @@ _xfs_buf_ioapply(
> if (size <= 0)
> break; /* all done */
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> }
>
> /*
> @@ -1772,7 +1772,7 @@ __xfs_buf_delwri_submit(
> struct list_head *io_list,
> bool wait)
> {
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> struct xfs_buf *bp, *n;
> int pinned = 0;
>
> @@ -1806,7 +1806,7 @@ __xfs_buf_delwri_submit(
>
> list_sort(NULL, io_list, xfs_buf_cmp);
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> list_for_each_entry_safe(bp, n, io_list, b_list) {
> bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC | XBF_WRITE_FAIL);
> bp->b_flags |= XBF_WRITE | XBF_ASYNC;
> @@ -1823,7 +1823,7 @@ __xfs_buf_delwri_submit(
>
> xfs_buf_submit(bp);
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> return pinned;
> }
> diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
> index 098cd78..6074671 100644
> --- a/fs/xfs/xfs_dir2_readdir.c
> +++ b/fs/xfs/xfs_dir2_readdir.c
> @@ -275,7 +275,7 @@ xfs_dir2_leaf_readbuf(
> struct xfs_inode *dp = args->dp;
> struct xfs_buf *bp = *bpp;
> struct xfs_bmbt_irec *map = mip->map;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> int error = 0;
> int length;
> int i;
> @@ -404,7 +404,7 @@ xfs_dir2_leaf_readbuf(
> /*
> * Do we need more readahead?
> */
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> for (mip->ra_index = mip->ra_offset = i = 0;
> mip->ra_want > mip->ra_current && i < mip->map_blocks;
> i += geo->fsbcount) {
> @@ -455,7 +455,7 @@ xfs_dir2_leaf_readbuf(
> }
> }
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> out:
> *bpp = bp;
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index 82e3142..b890aa4 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -179,7 +179,7 @@ xfs_bulkstat_ichunk_ra(
> struct xfs_inobt_rec_incore *irec)
> {
> xfs_agblock_t agbno;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> int blks_per_cluster;
> int inodes_per_cluster;
> int i; /* inode chunk index */
> @@ -188,7 +188,7 @@ xfs_bulkstat_ichunk_ra(
> blks_per_cluster = xfs_icluster_size_fsb(mp);
> inodes_per_cluster = blks_per_cluster << mp->m_sb.sb_inopblog;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> for (i = 0; i < XFS_INODES_PER_CHUNK;
> i += inodes_per_cluster, agbno += blks_per_cluster) {
> if (xfs_inobt_maskn(i, inodes_per_cluster) & ~irec->ir_free) {
> @@ -196,7 +196,7 @@ xfs_bulkstat_ichunk_ra(
> &xfs_inode_buf_ops);
> }
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> }
>
> /*
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 7f9a516..4617fce 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1091,6 +1091,7 @@ static inline void blk_post_runtime_resume(struct request_queue *q, int err) {}
> * schedule() where blk_schedule_flush_plug() is called.
> */
> struct blk_plug {
> + int depth; /* number of nested plugs */
> struct list_head list; /* requests */
> struct list_head mq_list; /* blk-mq requests */
> struct list_head cb_list; /* md requires an unplug callback */
> @@ -1106,7 +1107,7 @@ struct blk_plug_cb {
> };
> extern struct blk_plug_cb *blk_check_plugged(blk_plug_cb_fn unplug,
> void *data, int size);
> -extern void blk_start_plug(struct blk_plug *);
> +extern struct blk_plug *blk_start_plug(struct blk_plug *);
> extern void blk_finish_plug(struct blk_plug *);
> extern void blk_flush_plug_list(struct blk_plug *, bool);
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index d551475..1f7d5ad 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -463,7 +463,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
> int error = -EINVAL;
> int write;
> size_t len;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> #ifdef CONFIG_MEMORY_FAILURE
> if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
> @@ -503,7 +503,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
> if (vma && start > vma->vm_start)
> prev = vma;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> for (;;) {
> /* Still start < end. */
> error = -ENOMEM;
> @@ -539,7 +539,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
> vma = find_vma(current->mm, start);
> }
> out:
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> if (write)
> up_write(&current->mm->mmap_sem);
> else
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 644bcb6..9369d5e 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2011,16 +2011,16 @@ static int __writepage(struct page *page, struct writeback_control *wbc,
> int generic_writepages(struct address_space *mapping,
> struct writeback_control *wbc)
> {
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> int ret;
>
> /* deal with chardevs and other special file */
> if (!mapping->a_ops->writepage)
> return 0;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> ret = write_cache_pages(mapping, wbc, __writepage, mapping);
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> return ret;
> }
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 9356758..c3350b5 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -111,11 +111,11 @@ EXPORT_SYMBOL(read_cache_pages);
> static int read_pages(struct address_space *mapping, struct file *filp,
> struct list_head *pages, unsigned nr_pages)
> {
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> unsigned page_idx;
> int ret;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
>
> if (mapping->a_ops->readpages) {
> ret = mapping->a_ops->readpages(filp, mapping, pages, nr_pages);
> @@ -136,7 +136,7 @@ static int read_pages(struct address_space *mapping, struct file *filp,
> ret = 0;
>
> out:
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> return ret;
> }
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 405923f..3f6b8ec 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -455,7 +455,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> unsigned long offset = entry_offset;
> unsigned long start_offset, end_offset;
> unsigned long mask;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
>
> mask = swapin_nr_pages(offset) - 1;
> if (!mask)
> @@ -467,7 +467,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> if (!start_offset) /* First page is swap header. */
> start_offset++;
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> for (offset = start_offset; offset <= end_offset ; offset++) {
> /* Ok, do the async read-ahead now */
> page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
> @@ -478,7 +478,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> SetPageReadahead(page);
> page_cache_release(page);
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
>
> lru_add_drain(); /* Push any new pages onto the LRU now */
> skip:
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 5e8eadd..fd29974 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2130,7 +2130,7 @@ static void shrink_lruvec(struct lruvec *lruvec, int swappiness,
> enum lru_list lru;
> unsigned long nr_reclaimed = 0;
> unsigned long nr_to_reclaim = sc->nr_to_reclaim;
> - struct blk_plug plug;
> + struct blk_plug *plug, onstack_plug;
> bool scan_adjusted;
>
> get_scan_count(lruvec, swappiness, sc, nr, lru_pages);
> @@ -2152,7 +2152,7 @@ static void shrink_lruvec(struct lruvec *lruvec, int swappiness,
> scan_adjusted = (global_reclaim(sc) && !current_is_kswapd() &&
> sc->priority == DEF_PRIORITY);
>
> - blk_start_plug(&plug);
> + plug = blk_start_plug(&onstack_plug);
> while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
> nr[LRU_INACTIVE_FILE]) {
> unsigned long nr_anon, nr_file, percentage;
> @@ -2222,7 +2222,7 @@ static void shrink_lruvec(struct lruvec *lruvec, int swappiness,
>
> scan_adjusted = true;
> }
> - blk_finish_plug(&plug);
> + blk_finish_plug(plug);
> sc->nr_reclaimed += nr_reclaimed;
>
> /*
> --
> 1.8.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/



--
Ming Lei