[PATCH 2/2] blk-plug: don't flush nested plug lists

From: Jeff Moyer
Date: Mon Apr 06 2015 - 15:15:30 EST


The way the on-stack plugging currently works, each nesting level
flushes its own list of I/Os. This can be less than optimal (read:
awful) for certain workloads. For example, consider an application
that issues asynchronous O_DIRECT I/Os. It can send down a batch of
I/Os in a single io_submit call, only to have each of them dispatched
individually down in the bowels of the direct I/O code. The reason is
that blk_plugs are instantiated both at the upper call site in
do_io_submit and down in do_blockdev_direct_IO. The latter will
submit as few as one I/O at a time (if the I/O size is small enough)
instead of performing the batching that the plugging infrastructure
is supposed to provide.
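
To make the shape of the problem concrete, here is a simplified
sketch of the two nested plugs (illustrative only; the loop and
variable names are not the actual fs/aio.c or fs/direct-io.c code):

	/* upper layer: do_io_submit() plugs once per io_submit call */
	struct blk_plug outer_plug;

	blk_start_plug(&outer_plug);
	for (i = 0; i < nr; i++)
		io_submit_one(...);	/* bottoms out in the dio code */
	blk_finish_plug(&outer_plug);

	/* lower layer: do_blockdev_direct_IO() plugs again, per iocb */
	struct blk_plug inner_plug;

	blk_start_plug(&inner_plug);
	retval = do_direct_IO(dio, &sdio, &map_bh);
	blk_finish_plug(&inner_plug);	/* can flush as few as one I/O */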

Now, when there is an elevator involved, this doesn't matter much:
the elevator will hold on to the I/O long enough for it to be merged.
However, where there is no elevator (as with blk-mq), I/Os are simply
dispatched immediately.

Try this, for example (note that I'm using a virtio-blk device, so it
takes the blk-mq single-queue path, though I've also reproduced this
with a Micron P320h):

fio --rw=read --bs=4k --iodepth=128 --iodepth_batch=16 --iodepth_batch_complete=16 --runtime=10s --direct=1 --filename=/dev/vdd --name=job1 --ioengine=libaio --time_based

If you run that on a current kernel, you will get zero merges. Zero!
After this patch, you will get many merges (the actual number depends
on how fast your storage is, obviously), and much better throughput.
Here are results from my test rig:

Unpatched kernel:
Read B/W: 283,638 KB/s
Read Merges: 0

Patched kernel:
Read B/W: 873,224 KB/s
Read Merges: 2,046K

I considered several approaches to solving the problem:
1) get rid of the innermost plugs
2) handle nesting by using only one on-stack plug
2a) #2, except use a per-cpu blk_plug struct, which may clean up the
    code a bit at the expense of a larger memory footprint

Option 1 would be tricky, or even impossible, since an innermost plug
list is sometimes the only plug list, depending on the call path.
Option 2 is what this patch implements. Option 2a is perhaps a better
idea, but since I had already implemented option 2, I figured I'd
post it for comments and opinions before rewriting it.
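
With option 2, only the outermost blk_start_plug on a task actually
installs a plug; nested calls just bump a depth count and hand back
the already-active plug. Callers therefore use the returned pointer
instead of their on-stack struct, roughly like this:

	struct blk_plug *plug, onstack_plug;

	plug = blk_start_plug(&onstack_plug);	/* outermost: depth = 1;
						 * nested: depth++ and the
						 * existing plug is returned */
	submit_bio(rw, bio);			/* batched on the one plug */
	blk_finish_plug(plug);			/* only flushes once depth
						 * drops back to zero */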

Much of the patch involves modifying callers of blk_start_plug, since
its signature changed. The meat of the patch is actually pretty
simple and confined to block/blk-core.c and include/linux/blkdev.h.
The only tricky bits were the places where a plug was finished and
then immediately restarted just to flush out I/O. There, I exported
blk_flush_plug_list and called that directly.
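
For instance, in fs/btrfs/volumes.c's run_scheduled_bios, the old
finish/restart pair:

	blk_finish_plug(&plug);
	blk_start_plug(&plug);

becomes a direct flush of the (possibly shared) plug:

	blk_flush_plug_list(plug, false);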

Comments would be greatly appreciated.

Signed-off-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
---
block/blk-core.c | 33 +++++++++++++++++++--------------
block/blk-lib.c | 6 +++---
block/blk-throttle.c | 6 +++---
drivers/block/xen-blkback/blkback.c | 6 +++---
drivers/md/dm-bufio.c | 15 +++++++--------
drivers/md/dm-kcopyd.c | 6 +++---
drivers/md/dm-thin.c | 6 +++---
drivers/md/md.c | 6 +++---
drivers/md/raid1.c | 6 +++---
drivers/md/raid10.c | 6 +++---
drivers/md/raid5.c | 12 ++++++------
drivers/target/target_core_iblock.c | 6 +++---
fs/aio.c | 6 +++---
fs/block_dev.c | 6 +++---
fs/btrfs/scrub.c | 6 +++---
fs/btrfs/transaction.c | 6 +++---
fs/btrfs/tree-log.c | 16 ++++++++--------
fs/btrfs/volumes.c | 12 +++++-------
fs/buffer.c | 6 +++---
fs/direct-io.c | 8 +++++---
fs/ext4/file.c | 6 +++---
fs/ext4/inode.c | 12 +++++-------
fs/f2fs/checkpoint.c | 6 +++---
fs/f2fs/gc.c | 6 +++---
fs/f2fs/node.c | 6 +++---
fs/jbd/checkpoint.c | 6 +++---
fs/jbd/commit.c | 10 +++++-----
fs/jbd2/checkpoint.c | 6 +++---
fs/jbd2/commit.c | 6 +++---
fs/mpage.c | 6 +++---
fs/xfs/xfs_buf.c | 12 ++++++------
fs/xfs/xfs_dir2_readdir.c | 6 +++---
fs/xfs/xfs_itable.c | 6 +++---
include/linux/blkdev.h | 3 ++-
mm/madvise.c | 6 +++---
mm/page-writeback.c | 6 +++---
mm/readahead.c | 6 +++---
mm/swap_state.c | 6 +++---
mm/vmscan.c | 6 +++---
39 files changed, 155 insertions(+), 152 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 794c3e7..64f3f2a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3002,7 +3002,7 @@ EXPORT_SYMBOL(kblockd_schedule_delayed_work_on);

/**
* blk_start_plug - initialize blk_plug and track it inside the task_struct
- * @plug: The &struct blk_plug that needs to be initialized
+ * @plug: The on-stack &struct blk_plug that needs to be initialized
*
* Description:
* Tracking blk_plug inside the task_struct will help with auto-flushing the
@@ -3013,26 +3013,29 @@ EXPORT_SYMBOL(kblockd_schedule_delayed_work_on);
* page belonging to that request that is currently residing in our private
* plug. By flushing the pending I/O when the process goes to sleep, we avoid
* this kind of deadlock.
+ *
+ * Returns: a pointer to the active &struct blk_plug
*/
-void blk_start_plug(struct blk_plug *plug)
+struct blk_plug *blk_start_plug(struct blk_plug *plug)
 {
 	struct task_struct *tsk = current;
 
+	if (tsk->plug) {
+		tsk->plug->depth++;
+		return tsk->plug;
+	}
+
+	plug->depth = 1;
 	INIT_LIST_HEAD(&plug->list);
 	INIT_LIST_HEAD(&plug->mq_list);
 	INIT_LIST_HEAD(&plug->cb_list);
 
 	/*
-	 * If this is a nested plug, don't actually assign it. It will be
-	 * flushed on its own.
+	 * Store ordering should not be needed here, since a potential
+	 * preempt will imply a full memory barrier
 	 */
-	if (!tsk->plug) {
-		/*
-		 * Store ordering should not be needed here, since a potential
-		 * preempt will imply a full memory barrier
-		 */
-		tsk->plug = plug;
-	}
+	tsk->plug = plug;
+	return tsk->plug;
 }
EXPORT_SYMBOL(blk_start_plug);

@@ -3176,13 +3179,15 @@ void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)

local_irq_restore(flags);
}
+EXPORT_SYMBOL_GPL(blk_flush_plug_list);

 void blk_finish_plug(struct blk_plug *plug)
 {
-	blk_flush_plug_list(plug, false);
+	if (--plug->depth > 0)
+		return;
 
-	if (plug == current->plug)
-		current->plug = NULL;
+	blk_flush_plug_list(plug, false);
+	current->plug = NULL;
 }
EXPORT_SYMBOL(blk_finish_plug);

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..e2d2448 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -48,7 +48,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
struct bio_batch bb;
struct bio *bio;
int ret = 0;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

if (!q)
return -ENXIO;
@@ -81,7 +81,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
bb.flags = 1 << BIO_UPTODATE;
bb.wait = &wait;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
while (nr_sects) {
unsigned int req_sects;
sector_t end_sect, tmp;
@@ -128,7 +128,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
*/
cond_resched();
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

/* Wait for bios in-flight */
if (!atomic_dec_and_test(&bb.done))
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 5b9c6d5..f57bbd3 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1266,7 +1266,7 @@ static void blk_throtl_dispatch_work_fn(struct work_struct *work)
struct request_queue *q = td->queue;
struct bio_list bio_list_on_stack;
struct bio *bio;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
int rw;

bio_list_init(&bio_list_on_stack);
@@ -1278,10 +1278,10 @@ static void blk_throtl_dispatch_work_fn(struct work_struct *work)
spin_unlock_irq(q->queue_lock);

if (!bio_list_empty(&bio_list_on_stack)) {
- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
while((bio = bio_list_pop(&bio_list_on_stack)))
generic_make_request(bio);
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
}
}

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 2a04d34..f075182 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -1207,7 +1207,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
struct bio **biolist = pending_req->biolist;
int i, nbio = 0;
int operation;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
bool drain = false;
struct grant_page **pages = pending_req->segments;
unsigned short req_operation;
@@ -1368,13 +1368,13 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
}

atomic_set(&pending_req->pendcnt, nbio);
- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);

for (i = 0; i < nbio; i++)
submit_bio(operation, biolist[i]);

/* Let the I/Os go.. */
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

if (operation == READ)
blkif->st_rd_sect += preq.nr_sects;
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 86dbbc7..a6cbc6f 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -706,8 +706,8 @@ static void __write_dirty_buffer(struct dm_buffer *b,

static void __flush_write_list(struct list_head *write_list)
{
- struct blk_plug plug;
- blk_start_plug(&plug);
+ struct blk_plug *plug, onstack_plug;
+ plug = blk_start_plug(&onstack_plug);
while (!list_empty(write_list)) {
struct dm_buffer *b =
list_entry(write_list->next, struct dm_buffer, write_list);
@@ -715,7 +715,7 @@ static void __flush_write_list(struct list_head *write_list)
submit_io(b, WRITE, b->block, write_endio);
dm_bufio_cond_resched();
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
}

/*
@@ -1110,13 +1110,13 @@ EXPORT_SYMBOL_GPL(dm_bufio_new);
void dm_bufio_prefetch(struct dm_bufio_client *c,
sector_t block, unsigned n_blocks)
{
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

LIST_HEAD(write_list);

BUG_ON(dm_bufio_in_request());

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
dm_bufio_lock(c);

for (; n_blocks--; block++) {
@@ -1126,9 +1126,8 @@ void dm_bufio_prefetch(struct dm_bufio_client *c,
&write_list);
if (unlikely(!list_empty(&write_list))) {
dm_bufio_unlock(c);
- blk_finish_plug(&plug);
__flush_write_list(&write_list);
- blk_start_plug(&plug);
+ blk_flush_plug_list(plug, false);
dm_bufio_lock(c);
}
if (unlikely(b != NULL)) {
@@ -1149,7 +1148,7 @@ void dm_bufio_prefetch(struct dm_bufio_client *c,
dm_bufio_unlock(c);

flush_plug:
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
}
EXPORT_SYMBOL_GPL(dm_bufio_prefetch);

diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 3a7cade..fed371f 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -580,7 +580,7 @@ static void do_work(struct work_struct *work)
{
struct dm_kcopyd_client *kc = container_of(work,
struct dm_kcopyd_client, kcopyd_work);
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

/*
* The order that these are called is *very* important.
@@ -589,11 +589,11 @@ static void do_work(struct work_struct *work)
* list. io jobs call wake when they complete and it all
* starts again.
*/
- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
process_jobs(&kc->complete_jobs, kc, run_complete_job);
process_jobs(&kc->pages_jobs, kc, run_pages_job);
process_jobs(&kc->io_jobs, kc, run_io_job);
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
}

/*
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 921aafd..6a7459a 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1775,7 +1775,7 @@ static void process_thin_deferred_bios(struct thin_c *tc)
unsigned long flags;
struct bio *bio;
struct bio_list bios;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
unsigned count = 0;

if (tc->requeue_mode) {
@@ -1799,7 +1799,7 @@ static void process_thin_deferred_bios(struct thin_c *tc)

spin_unlock_irqrestore(&tc->lock, flags);

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
while ((bio = bio_list_pop(&bios))) {
/*
* If we've got no free new_mapping structs, and processing
@@ -1824,7 +1824,7 @@ static void process_thin_deferred_bios(struct thin_c *tc)
dm_pool_issue_prefetches(pool->pmd);
}
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
}

static int cmp_cells(const void *lhs, const void *rhs)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 717daad..9f24719 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7402,7 +7402,7 @@ void md_do_sync(struct md_thread *thread)
int skipped = 0;
struct md_rdev *rdev;
char *desc, *action = NULL;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

/* just incase thread restarts... */
if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
@@ -7572,7 +7572,7 @@ void md_do_sync(struct md_thread *thread)
md_new_event(mddev);
update_time = jiffies;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
while (j < max_sectors) {
sector_t sectors;

@@ -7686,7 +7686,7 @@ void md_do_sync(struct md_thread *thread)
/*
* this also signals 'finished resyncing' to md_stop
*/
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));

/* tell personality that we are finished */
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d34e238..2608bc3 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2399,11 +2399,11 @@ static void raid1d(struct md_thread *thread)
unsigned long flags;
struct r1conf *conf = mddev->private;
struct list_head *head = &conf->retry_list;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

md_check_recovery(mddev);

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
for (;;) {

flush_pending_writes(conf);
@@ -2441,7 +2441,7 @@ static void raid1d(struct md_thread *thread)
if (mddev->flags & ~(1<<MD_CHANGE_PENDING))
md_check_recovery(mddev);
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
}

static int init_resync(struct r1conf *conf)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a7196c4..7bea5c7 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2791,11 +2791,11 @@ static void raid10d(struct md_thread *thread)
unsigned long flags;
struct r10conf *conf = mddev->private;
struct list_head *head = &conf->retry_list;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

md_check_recovery(mddev);

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
for (;;) {

flush_pending_writes(conf);
@@ -2835,7 +2835,7 @@ static void raid10d(struct md_thread *thread)
if (mddev->flags & ~(1<<MD_CHANGE_PENDING))
md_check_recovery(mddev);
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
}

static int init_resync(struct r10conf *conf)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index cd2f96b..59e7090 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5259,11 +5259,11 @@ static void raid5_do_work(struct work_struct *work)
struct r5conf *conf = group->conf;
int group_id = group - conf->worker_groups;
int handled;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

pr_debug("+++ raid5worker active\n");

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
handled = 0;
spin_lock_irq(&conf->device_lock);
while (1) {
@@ -5281,7 +5281,7 @@ static void raid5_do_work(struct work_struct *work)
pr_debug("%d stripes handled\n", handled);

spin_unlock_irq(&conf->device_lock);
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

pr_debug("--- raid5worker inactive\n");
}
@@ -5298,13 +5298,13 @@ static void raid5d(struct md_thread *thread)
struct mddev *mddev = thread->mddev;
struct r5conf *conf = mddev->private;
int handled;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

pr_debug("+++ raid5d active\n");

md_check_recovery(mddev);

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
handled = 0;
spin_lock_irq(&conf->device_lock);
while (1) {
@@ -5352,7 +5352,7 @@ static void raid5d(struct md_thread *thread)
spin_unlock_irq(&conf->device_lock);

async_tx_issue_pending_all();
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

pr_debug("--- raid5d inactive\n");
}
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index d4a4b0f..248a6bb 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -361,13 +361,13 @@ iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num)

static void iblock_submit_bios(struct bio_list *list, int rw)
{
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
struct bio *bio;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
while ((bio = bio_list_pop(list)))
submit_bio(rw, bio);
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
}

static void iblock_end_io_flush(struct bio *bio, int err)
diff --git a/fs/aio.c b/fs/aio.c
index f8e52a1..b1c3583 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1575,7 +1575,7 @@ long do_io_submit(aio_context_t ctx_id, long nr,
struct kioctx *ctx;
long ret = 0;
int i = 0;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

if (unlikely(nr < 0))
return -EINVAL;
@@ -1592,7 +1592,7 @@ long do_io_submit(aio_context_t ctx_id, long nr,
return -EINVAL;
}

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);

/*
* AKPM: should this return a partial result if some of the IOs were
@@ -1616,7 +1616,7 @@ long do_io_submit(aio_context_t ctx_id, long nr,
if (ret)
break;
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

percpu_ref_put(&ctx->users);
return i ? i : ret;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 975266b..928d3e0 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1598,10 +1598,10 @@ static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
ssize_t ret;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
ret = __generic_file_write_iter(iocb, from);
if (ret > 0) {
ssize_t err;
@@ -1609,7 +1609,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (err < 0)
ret = err;
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
return ret;
}
EXPORT_SYMBOL_GPL(blkdev_write_iter);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ec57687..768bac3 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2967,7 +2967,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
struct btrfs_root *root = fs_info->extent_root;
struct btrfs_root *csum_root = fs_info->csum_root;
struct btrfs_extent_item *extent;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
u64 flags;
int ret;
int slot;
@@ -3088,7 +3088,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
* collect all data csums for the stripe to avoid seeking during
* the scrub. This might currently (crc32) end up to be about 1MB
*/
- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);

/*
* now find all extents for each stripe and scrub them
@@ -3316,7 +3316,7 @@ out:
scrub_wr_submit(sctx);
mutex_unlock(&sctx->wr_ctx.wr_lock);

- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
btrfs_free_path(path);
btrfs_free_path(ppath);
return ret < 0 ? ret : 0;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8be4278..b5a8078 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -979,11 +979,11 @@ static int btrfs_write_and_wait_marked_extents(struct btrfs_root *root,
{
int ret;
int ret2;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
ret = btrfs_write_marked_extents(root, dirty_pages, mark);
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
ret2 = btrfs_wait_marked_extents(root, dirty_pages, mark);

if (ret)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c5b8ba3..1623924 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2519,7 +2519,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
struct btrfs_root *log_root_tree = root->fs_info->log_root_tree;
int log_transid = 0;
struct btrfs_log_ctx root_log_ctx;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

mutex_lock(&root->log_mutex);
log_transid = ctx->log_transid;
@@ -2571,10 +2571,10 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
/* we start IO on all the marked extents here, but we don't actually
* wait for them until later.
*/
- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
ret = btrfs_write_marked_extents(log, &log->dirty_log_pages, mark);
if (ret) {
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
btrfs_abort_transaction(trans, root, ret);
btrfs_free_logged_extents(log, log_transid);
btrfs_set_log_full_commit(root->fs_info, trans);
@@ -2619,7 +2619,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
if (!list_empty(&root_log_ctx.list))
list_del_init(&root_log_ctx.list);

- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
btrfs_set_log_full_commit(root->fs_info, trans);

if (ret != -ENOSPC) {
@@ -2635,7 +2635,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
}

if (log_root_tree->log_transid_committed >= root_log_ctx.log_transid) {
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
mutex_unlock(&log_root_tree->log_mutex);
ret = root_log_ctx.log_ret;
goto out;
@@ -2643,7 +2643,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,

index2 = root_log_ctx.log_transid % 2;
if (atomic_read(&log_root_tree->log_commit[index2])) {
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
ret = btrfs_wait_marked_extents(log, &log->dirty_log_pages,
mark);
btrfs_wait_logged_extents(trans, log, log_transid);
@@ -2669,7 +2669,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
* check the full commit flag again
*/
if (btrfs_need_log_full_commit(root->fs_info, trans)) {
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
btrfs_wait_marked_extents(log, &log->dirty_log_pages, mark);
btrfs_free_logged_extents(log, log_transid);
mutex_unlock(&log_root_tree->log_mutex);
@@ -2680,7 +2680,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
ret = btrfs_write_marked_extents(log_root_tree,
&log_root_tree->dirty_log_pages,
EXTENT_DIRTY | EXTENT_NEW);
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
if (ret) {
btrfs_set_log_full_commit(root->fs_info, trans);
btrfs_abort_transaction(trans, root, ret);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8222f6f..0e215ff 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -261,7 +261,7 @@ static noinline void run_scheduled_bios(struct btrfs_device *device)
unsigned long last_waited = 0;
int force_reg = 0;
int sync_pending = 0;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

/*
* this function runs all the bios we've collected for
@@ -269,7 +269,7 @@ static noinline void run_scheduled_bios(struct btrfs_device *device)
* another device without first sending all of these down.
* So, setup a plug here and finish it off before we return
*/
- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);

bdi = blk_get_backing_dev_info(device->bdev);
fs_info = device->dev_root->fs_info;
@@ -358,8 +358,7 @@ loop_lock:
if (pending_bios == &device->pending_sync_bios) {
sync_pending = 1;
} else if (sync_pending) {
- blk_finish_plug(&plug);
- blk_start_plug(&plug);
+ blk_flush_plug_list(plug, false);
sync_pending = 0;
}

@@ -415,8 +414,7 @@ loop_lock:
}
/* unplug every 64 requests just for good measure */
if (batch_run % 64 == 0) {
- blk_finish_plug(&plug);
- blk_start_plug(&plug);
+ blk_flush_plug_list(plug, false);
sync_pending = 0;
}
}
@@ -431,7 +429,7 @@ loop_lock:
spin_unlock(&device->io_lock);

done:
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
}

static void pending_bios_fn(struct btrfs_work *work)
diff --git a/fs/buffer.c b/fs/buffer.c
index 20805db..727d642 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -717,10 +717,10 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
struct list_head tmp;
struct address_space *mapping;
int err = 0, err2;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

INIT_LIST_HEAD(&tmp);
- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);

spin_lock(lock);
while (!list_empty(list)) {
@@ -758,7 +758,7 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
}

spin_unlock(lock);
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
spin_lock(lock);

while (!list_empty(&tmp)) {
diff --git a/fs/direct-io.c b/fs/direct-io.c
index e181b6b..e79c0c6 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -488,6 +488,8 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
static void dio_await_completion(struct dio *dio)
{
struct bio *bio;
+
+ blk_flush_plug(current);
do {
bio = dio_await_one(dio);
if (bio)
@@ -1108,7 +1110,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
struct dio *dio;
struct dio_submit sdio = { 0, };
struct buffer_head map_bh = { 0, };
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
unsigned long align = offset | iov_iter_alignment(iter);

if (rw & WRITE)
@@ -1231,7 +1233,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,

sdio.pages_in_io += iov_iter_npages(iter, INT_MAX);

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);

retval = do_direct_IO(dio, &sdio, &map_bh);
if (retval)
@@ -1262,7 +1264,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
if (sdio.bio)
dio_bio_submit(dio, &sdio);

- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

/*
* It is possible that, we return short IO due to end of file.
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 33a09da..fd0cf21 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -94,7 +94,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct file *file = iocb->ki_filp;
struct inode *inode = file_inode(iocb->ki_filp);
struct mutex *aio_mutex = NULL;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
int o_direct = io_is_direct(file);
int overwrite = 0;
size_t length = iov_iter_count(from);
@@ -139,7 +139,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)

iocb->private = &overwrite;
if (o_direct) {
- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);


/* check whether we do a DIO overwrite or not */
@@ -183,7 +183,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
ret = err;
}
if (o_direct)
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

errout:
if (aio_mutex)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5cb9a21..d4b645b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2284,7 +2284,7 @@ static int ext4_writepages(struct address_space *mapping,
int needed_blocks, rsv_blocks = 0, ret = 0;
struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
bool done;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
bool give_up_on_write = false;

trace_ext4_writepages(inode, wbc);
@@ -2298,11 +2298,9 @@ static int ext4_writepages(struct address_space *mapping,
goto out_writepages;

if (ext4_should_journal_data(inode)) {
- struct blk_plug plug;
-
- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
ret = write_cache_pages(mapping, wbc, __writepage, mapping);
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
goto out_writepages;
}

@@ -2368,7 +2366,7 @@ retry:
if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
tag_pages_for_writeback(mapping, mpd.first_page, mpd.last_page);
done = false;
- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
while (!done && mpd.first_page <= mpd.last_page) {
/* For each extent of pages we use new io_end */
mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
@@ -2438,7 +2436,7 @@ retry:
if (ret)
break;
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
if (!ret && !cycled && wbc->nr_to_write > 0) {
cycled = 1;
mpd.last_page = writeback_index - 1;
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 7f794b7..de2b522 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -810,10 +810,10 @@ static int block_operations(struct f2fs_sb_info *sbi)
.nr_to_write = LONG_MAX,
.for_reclaim = 0,
};
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
int err = 0;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);

retry_flush_dents:
f2fs_lock_all(sbi);
@@ -846,7 +846,7 @@ retry_flush_nodes:
goto retry_flush_nodes;
}
out:
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
return err;
}

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 76adbc3..082e961 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -661,12 +661,12 @@ static void do_garbage_collect(struct f2fs_sb_info *sbi, unsigned int segno,
{
struct page *sum_page;
struct f2fs_summary_block *sum;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

/* read segment summary of victim */
sum_page = get_sum_page(sbi, segno);

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);

sum = page_address(sum_page);

@@ -678,7 +678,7 @@ static void do_garbage_collect(struct f2fs_sb_info *sbi, unsigned int segno,
gc_data_segment(sbi, sum->entries, gc_list, segno, gc_type);
break;
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

stat_inc_seg_count(sbi, GET_SUM_TYPE((&sum->footer)));
stat_inc_call_count(sbi->stat_info);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 97bd9d3..e1fc81e 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1066,7 +1066,7 @@ repeat:
struct page *get_node_page_ra(struct page *parent, int start)
{
struct f2fs_sb_info *sbi = F2FS_P_SB(parent);
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
struct page *page;
int err, i, end;
nid_t nid;
@@ -1086,7 +1086,7 @@ repeat:
else if (err == LOCKED_PAGE)
goto page_hit;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);

/* Then, try readahead for siblings of the desired node */
end = start + MAX_RA_NODE;
@@ -1098,7 +1098,7 @@ repeat:
ra_node_page(sbi, nid);
}

- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

lock_page(page);
if (unlikely(page->mapping != NODE_MAPPING(sbi))) {
diff --git a/fs/jbd/checkpoint.c b/fs/jbd/checkpoint.c
index 08c0304..29b550d 100644
--- a/fs/jbd/checkpoint.c
+++ b/fs/jbd/checkpoint.c
@@ -258,12 +258,12 @@ static void
__flush_batch(journal_t *journal, struct buffer_head **bhs, int *batch_count)
{
int i;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
for (i = 0; i < *batch_count; i++)
write_dirty_buffer(bhs[i], WRITE_SYNC);
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

for (i = 0; i < *batch_count; i++) {
struct buffer_head *bh = bhs[i];
diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
index bb217dc..ccb32f0 100644
--- a/fs/jbd/commit.c
+++ b/fs/jbd/commit.c
@@ -311,7 +311,7 @@ void journal_commit_transaction(journal_t *journal)
int first_tag = 0;
int tag_flag;
int i;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
int write_op = WRITE;

/*
@@ -444,10 +444,10 @@ void journal_commit_transaction(journal_t *journal)
* Now start flushing things to disk, in the order they appear
* on the transaction lists. Data blocks go first.
*/
- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
err = journal_submit_data_buffers(journal, commit_transaction,
write_op);
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

/*
* Wait for all previously submitted IO to complete.
@@ -503,7 +503,7 @@ void journal_commit_transaction(journal_t *journal)
err = 0;
}

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);

journal_write_revoke_records(journal, commit_transaction, write_op);

@@ -697,7 +697,7 @@ start_journal_io:
}
}

- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

/* Lo and behold: we have just managed to send a transaction to
the log. Before we can commit it, wait for the IO so far to
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 988b32e..31b5e73 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -182,12 +182,12 @@ static void
__flush_batch(journal_t *journal, int *batch_count)
{
int i;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
for (i = 0; i < *batch_count; i++)
write_dirty_buffer(journal->j_chkpt_bhs[i], WRITE_SYNC);
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

for (i = 0; i < *batch_count; i++) {
struct buffer_head *bh = journal->j_chkpt_bhs[i];
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index b73e021..713d26a 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -390,7 +390,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
int tag_bytes = journal_tag_bytes(journal);
struct buffer_head *cbh = NULL; /* For transactional checksums */
__u32 crc32_sum = ~0;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
/* Tail of the journal */
unsigned long first_block;
tid_t first_tid;
@@ -555,7 +555,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
if (err)
jbd2_journal_abort(journal, err);

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
jbd2_journal_write_revoke_records(journal, commit_transaction,
&log_bufs, WRITE_SYNC);

@@ -805,7 +805,7 @@ start_journal_io:
__jbd2_journal_abort_hard(journal);
}

- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

/* Lo and behold: we have just managed to send a transaction to
the log. Before we can commit it, wait for the IO so far to
diff --git a/fs/mpage.c b/fs/mpage.c
index 3e79220..926dc42 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -676,10 +676,10 @@ int
mpage_writepages(struct address_space *mapping,
struct writeback_control *wbc, get_block_t get_block)
{
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
int ret;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);

if (!get_block)
ret = generic_writepages(mapping, wbc);
@@ -695,7 +695,7 @@ mpage_writepages(struct address_space *mapping,
if (mpd.bio)
mpage_bio_submit(WRITE, mpd.bio);
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
return ret;
}
EXPORT_SYMBOL(mpage_writepages);
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 1790b00..776ac5a 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1208,7 +1208,7 @@ STATIC void
_xfs_buf_ioapply(
struct xfs_buf *bp)
{
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
int rw;
int offset;
int size;
@@ -1281,7 +1281,7 @@ _xfs_buf_ioapply(
*/
offset = bp->b_offset;
size = BBTOB(bp->b_io_length);
- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
for (i = 0; i < bp->b_map_count; i++) {
xfs_buf_ioapply_map(bp, i, &offset, &size, rw);
if (bp->b_error)
@@ -1289,7 +1289,7 @@ _xfs_buf_ioapply(
if (size <= 0)
break; /* all done */
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
}

/*
@@ -1772,7 +1772,7 @@ __xfs_buf_delwri_submit(
struct list_head *io_list,
bool wait)
{
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
struct xfs_buf *bp, *n;
int pinned = 0;

@@ -1806,7 +1806,7 @@ __xfs_buf_delwri_submit(

list_sort(NULL, io_list, xfs_buf_cmp);

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
list_for_each_entry_safe(bp, n, io_list, b_list) {
bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC | XBF_WRITE_FAIL);
bp->b_flags |= XBF_WRITE | XBF_ASYNC;
@@ -1823,7 +1823,7 @@ __xfs_buf_delwri_submit(

xfs_buf_submit(bp);
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

return pinned;
}
diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
index 098cd78..6074671 100644
--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -275,7 +275,7 @@ xfs_dir2_leaf_readbuf(
struct xfs_inode *dp = args->dp;
struct xfs_buf *bp = *bpp;
struct xfs_bmbt_irec *map = mip->map;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
int error = 0;
int length;
int i;
@@ -404,7 +404,7 @@ xfs_dir2_leaf_readbuf(
/*
* Do we need more readahead?
*/
- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
for (mip->ra_index = mip->ra_offset = i = 0;
mip->ra_want > mip->ra_current && i < mip->map_blocks;
i += geo->fsbcount) {
@@ -455,7 +455,7 @@ xfs_dir2_leaf_readbuf(
}
}
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

out:
*bpp = bp;
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 82e3142..b890aa4 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -179,7 +179,7 @@ xfs_bulkstat_ichunk_ra(
struct xfs_inobt_rec_incore *irec)
{
xfs_agblock_t agbno;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
int blks_per_cluster;
int inodes_per_cluster;
int i; /* inode chunk index */
@@ -188,7 +188,7 @@ xfs_bulkstat_ichunk_ra(
blks_per_cluster = xfs_icluster_size_fsb(mp);
inodes_per_cluster = blks_per_cluster << mp->m_sb.sb_inopblog;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
for (i = 0; i < XFS_INODES_PER_CHUNK;
i += inodes_per_cluster, agbno += blks_per_cluster) {
if (xfs_inobt_maskn(i, inodes_per_cluster) & ~irec->ir_free) {
@@ -196,7 +196,7 @@ xfs_bulkstat_ichunk_ra(
&xfs_inode_buf_ops);
}
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
}

/*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7f9a516..4617fce 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1091,6 +1091,7 @@ static inline void blk_post_runtime_resume(struct request_queue *q, int err) {}
* schedule() where blk_schedule_flush_plug() is called.
*/
 struct blk_plug {
+	int depth; /* number of nested plugs */
 	struct list_head list; /* requests */
 	struct list_head mq_list; /* blk-mq requests */
 	struct list_head cb_list; /* md requires an unplug callback */
@@ -1106,7 +1107,7 @@ struct blk_plug_cb {
};
extern struct blk_plug_cb *blk_check_plugged(blk_plug_cb_fn unplug,
void *data, int size);
-extern void blk_start_plug(struct blk_plug *);
+extern struct blk_plug *blk_start_plug(struct blk_plug *);
extern void blk_finish_plug(struct blk_plug *);
extern void blk_flush_plug_list(struct blk_plug *, bool);

diff --git a/mm/madvise.c b/mm/madvise.c
index d551475..1f7d5ad 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -463,7 +463,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
int error = -EINVAL;
int write;
size_t len;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

#ifdef CONFIG_MEMORY_FAILURE
if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
@@ -503,7 +503,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
if (vma && start > vma->vm_start)
prev = vma;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
for (;;) {
/* Still start < end. */
error = -ENOMEM;
@@ -539,7 +539,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
vma = find_vma(current->mm, start);
}
out:
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
if (write)
up_write(&current->mm->mmap_sem);
else
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 644bcb6..9369d5e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2011,16 +2011,16 @@ static int __writepage(struct page *page, struct writeback_control *wbc,
int generic_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
int ret;

/* deal with chardevs and other special file */
if (!mapping->a_ops->writepage)
return 0;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
ret = write_cache_pages(mapping, wbc, __writepage, mapping);
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
return ret;
}

diff --git a/mm/readahead.c b/mm/readahead.c
index 9356758..c3350b5 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -111,11 +111,11 @@ EXPORT_SYMBOL(read_cache_pages);
static int read_pages(struct address_space *mapping, struct file *filp,
struct list_head *pages, unsigned nr_pages)
{
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
unsigned page_idx;
int ret;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);

if (mapping->a_ops->readpages) {
ret = mapping->a_ops->readpages(filp, mapping, pages, nr_pages);
@@ -136,7 +136,7 @@ static int read_pages(struct address_space *mapping, struct file *filp,
ret = 0;

out:
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

return ret;
}
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 405923f..3f6b8ec 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -455,7 +455,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
unsigned long offset = entry_offset;
unsigned long start_offset, end_offset;
unsigned long mask;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;

mask = swapin_nr_pages(offset) - 1;
if (!mask)
@@ -467,7 +467,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
if (!start_offset) /* First page is swap header. */
start_offset++;

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
for (offset = start_offset; offset <= end_offset ; offset++) {
/* Ok, do the async read-ahead now */
page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
@@ -478,7 +478,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
SetPageReadahead(page);
page_cache_release(page);
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);

lru_add_drain(); /* Push any new pages onto the LRU now */
skip:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5e8eadd..fd29974 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2130,7 +2130,7 @@ static void shrink_lruvec(struct lruvec *lruvec, int swappiness,
enum lru_list lru;
unsigned long nr_reclaimed = 0;
unsigned long nr_to_reclaim = sc->nr_to_reclaim;
- struct blk_plug plug;
+ struct blk_plug *plug, onstack_plug;
bool scan_adjusted;

get_scan_count(lruvec, swappiness, sc, nr, lru_pages);
@@ -2152,7 +2152,7 @@ static void shrink_lruvec(struct lruvec *lruvec, int swappiness,
scan_adjusted = (global_reclaim(sc) && !current_is_kswapd() &&
sc->priority == DEF_PRIORITY);

- blk_start_plug(&plug);
+ plug = blk_start_plug(&onstack_plug);
while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
nr[LRU_INACTIVE_FILE]) {
unsigned long nr_anon, nr_file, percentage;
@@ -2222,7 +2222,7 @@ static void shrink_lruvec(struct lruvec *lruvec, int swappiness,

scan_adjusted = true;
}
- blk_finish_plug(&plug);
+ blk_finish_plug(plug);
sc->nr_reclaimed += nr_reclaimed;

/*
--
1.8.3.1
