BCACHE stability patches

From: Denis Bychkov
Date: Tue Dec 22 2015 - 04:13:57 EST


Hi,

There is a set of bcache stability patches elevating bcache stability
to production level. As far as I know, there is no single reported and
peer confirmed bug that is not solved by this set. Unfortunately, for
some reason, Kent does not have enough time and/or energy to review
them and send them upstream. Let's come up with a solution that would
allow to review all these patches (some of them written by Ken
himself, some of them produced by the community), review them and hand
them to the maintainer who is willing to apply them upstream. Without
that, bcache is just another half-assed unstable and buggy cache
layer.
These patches will allow people to start use bcache in production systems.
Please find the patch set attached. (The patches apply cleanly to 4.3
and 4.4 kernel series).

---
From: Zheng Liu <wenqing.lz@xxxxxxxxxx>

In bcache_init() function it forgot to unregister reboot notifier if
bcache fails to unregister a block device. This commit fixes this.

Signed-off-by: Zheng Liu <wenqing.lz@xxxxxxxxxx>
Tested-by: Joshua Schmid <jschmid@xxxxxxxx>
---
drivers/md/bcache/super.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2066,8 +2066,10 @@ static int __init bcache_init(void)
closure_debug_init();

bcache_major = register_blkdev(0, "bcache");
- if (bcache_major < 0)
+ if (bcache_major < 0) {
+ unregister_reboot_notifier(&reboot);
return bcache_major;
+ }

if (!(bcache_wq = create_workqueue("bcache")) ||
!(bcache_kobj = kobject_create_and_add("bcache", fs_kobj)) ||
From: Zheng Liu <gnehzuil.liu@xxxxxxxxx>
To: linux-bcache@xxxxxxxxxxxxxxx
Cc: Zheng Liu <wenqing.lz@xxxxxxxxxx>, Joshua Schmid
<jschmid@xxxxxxxx>, Zhu Yanhai <zhu.yanhai@xxxxxxxxx>, Kent Overstreet
<kmo@xxxxxxxxxxxxx>
Subject: [PATCH v2] bcache: fix a livelock in btree lock
Date: Wed, 25 Feb 2015 20:32:09 +0800 (02/25/2015 04:32:09 AM)
From: Zheng Liu <wenqing.lz@xxxxxxxxxx>

This commit tries to fix a livelock in bcache. This livelock might
happen when we causes a huge number of cache misses simultaneously.

When we get a cache miss, bcache will execute the following path.

->cached_dev_make_request()
->cached_dev_read()
->cached_lookup()
->bch->btree_map_keys()
->btree_root() <------------------------
->bch_btree_map_keys_recurse() |
->cache_lookup_fn() |
->cached_dev_cache_miss() |
->bch_btree_insert_check_key() -|
[If btree->seq is not equal to seq + 1, we should return
EINTR and traverse btree again.]

In bch_btree_insert_check_key() function we first need to check upgrade
flag (op->lock == -1), and when this flag is true we need to release
read btree->lock and try to take write btree->lock. During taking and
releasing this write lock, btree->seq will be monotone increased in
order to prevent other threads modify this in cache miss (see btree.h:74).
But if there are some cache misses caused by some requested, we could
meet a livelock because btree->seq is always changed by others. Thus no
one can make progress.

This commit will try to take write btree->lock if it encounters a race
when we traverse btree. Although it sacrifice the scalability but we
can ensure that only one can modify the btree.

Signed-off-by: Zheng Liu <wenqing.lz@xxxxxxxxxx>
Tested-by: Joshua Schmid <jschmid@xxxxxxxx>
Cc: Joshua Schmid <jschmid@xxxxxxxx>
Cc: Zhu Yanhai <zhu.yanhai@xxxxxxxxx>
Cc: Kent Overstreet <kmo@xxxxxxxxxxxxx>
---
changelog:
v2: fix a bug that stops all concurrency writes unconditionally.

drivers/md/bcache/btree.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -2162,8 +2162,10 @@ int bch_btree_insert_check_key(struct bt
rw_lock(true, b, b->level);

if (b->key.ptr[0] != btree_ptr ||
- b->seq != seq + 1)
+ b->seq != seq + 1) {
+ op->lock = b->level;
goto out;
+ }
}

SET_KEY_PTRS(check_key, 1);

From: Joshua Schmid <jschmid@xxxxxxxx>
Subject: [PATCH] fix a leak in bch_cached_dev_run()
Newsgroups: gmane.linux.kernel.bcache.devel
Date: 2015-02-03 11:24:06 GMT (3 weeks, 2 days, 11 hours and 43 minutes ago)

From: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Tested-by: Joshua Schmid <jschmid@xxxxxxxx>
---
drivers/md/bcache/super.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -847,8 +847,11 @@ void bch_cached_dev_run(struct cached_de
buf[SB_LABEL_SIZE] = '\0';
env[2] = kasprintf(GFP_KERNEL, "CACHED_LABEL=%s", buf);

- if (atomic_xchg(&dc->running, 1))
+ if (atomic_xchg(&dc->running, 1)) {
+ kfree(env[1]);
+ kfree(env[2]);
return;
+ }

if (!d->c &&
BDEV_STATE(&dc->sb) != BDEV_STATE_NONE) {
From: Denis Bychkov <manover@xxxxxxxxx>

Allows to use register, not register_quiet in udev to avoid "device_busy" error.
The initial patch proposed at https://lkml.org/lkml/2013/8/26/549 by
Gabriel de Perthuis
<g2p.code@xxxxxxxxx> does not unlock the mutex and hangs the kernel.

See http://thread.gmane.org/gmane.linux.kernel.bcache.devel/2594 for
the discussion.
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1936,6 +1936,8 @@ static ssize_t register_bcache(struct ko
else
err = "device busy";
mutex_unlock(&bch_register_lock);
+ if (attr == &ksysfs_register_quiet)
+ goto out;
}
goto err;
}
@@ -1974,8 +1976,7 @@ out:
err_close:
blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
err:
- if (attr != &ksysfs_register_quiet)
- pr_info("error opening %s: %s", path, err);
+ pr_info("error opening %s: %s", path, err);
ret = -EINVAL;
goto out;
}