[RFC][PATCH 1/2] fs: use RCU for free_super() vs. __sb_start_write()

From: Dave Hansen
Date: Fri Jun 19 2015 - 18:32:38 EST



Currently, __sb_start_write() and freeze_super() can race with
each other. __sb_start_write() uses a smp_mb() to ensure that
freeze_super() can see its write to sb->s_writers.counter and
that it can see freeze_super()'s update to sb->s_writers.frozen.
This all seems to work fine.

But, this smp_mb() makes __sb_start_write() the single hottest
function in the kernel if I sit in a loop and do tiny write()s to
tmpfs over and over. This is on a very small 2-core system, so
it will only get worse on larger systems.

This _seems_ like an ideal case for RCU. __sb_start_write() is
the RCU read-side and is in a very fast, performance-sensitive
path. freeze_super() is the RCU writer and is in an extremely
rare non-performance-sensitive path.

Instead of doing and smp_wmb() in __sb_start_write(), we do
rcu_read_lock(). This ensures that a CPU doing freeze_super()
can not proceed past its synchronize_rcu() until the grace
period has ended and the 's_writers.frozen = SB_FREEZE_WRITE'
is visible to __sb_start_write().

One question here: Does the work that __sb_start_write() does in
a previous grace period becomes visible to freeze_super() after
its call to synchronize_rcu()? It _seems_ like it should, but it
seems backwards to me since __sb_start_write() is the "reader" in
this case.

This patch increases the number of writes/second that I can do
by 10.4%.

Does anybody see any holes with this?

Cc: Jan Kara <jack@xxxxxxx>
Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: linux-fsdevel@xxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Cc: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>

---

b/fs/super.c | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)

diff -puN fs/super.c~rcu-__sb_start_write fs/super.c
--- a/fs/super.c~rcu-__sb_start_write 2015-06-19 14:50:53.081869092 -0700
+++ b/fs/super.c 2015-06-19 15:19:03.000473047 -0700
@@ -1190,27 +1190,25 @@ static void acquire_freeze_lock(struct s
*/
int __sb_start_write(struct super_block *sb, int level, bool wait)
{
-retry:
- if (unlikely(sb->s_writers.frozen >= level)) {
+ /*
+ * RCU keeps freeze_super() from proceeding
+ * while we are in here.
+ */
+ rcu_read_lock();
+ while (unlikely(sb->s_writers.frozen >= level)) {
+ rcu_read_unlock();
if (!wait)
- return 0;
+ return 0;
wait_event(sb->s_writers.wait_unfrozen,
sb->s_writers.frozen < level);
+ rcu_read_lock();
}

#ifdef CONFIG_LOCKDEP
acquire_freeze_lock(sb, level, !wait, _RET_IP_);
#endif
percpu_counter_inc(&sb->s_writers.counter[level-1]);
- /*
- * Make sure counter is updated before we check for frozen.
- * freeze_super() first sets frozen and then checks the counter.
- */
- smp_mb();
- if (unlikely(sb->s_writers.frozen >= level)) {
- __sb_end_write(sb, level);
- goto retry;
- }
+ rcu_read_unlock();
return 1;
}
EXPORT_SYMBOL(__sb_start_write);
@@ -1312,7 +1310,13 @@ int freeze_super(struct super_block *sb)

/* From now on, no new normal writers can start */
sb->s_writers.frozen = SB_FREEZE_WRITE;
- smp_wmb();
+ /*
+ * After we synchronize_rcu(), we have ensured that everyone
+ * who reads sb->s_writers.frozen under rcu_read_lock() can
+ * now see our update. This pretty much means that
+ * __sb_start_write() will not allow any new writers.
+ */
+ synchronize_rcu();

/* Release s_umount to preserve sb_start_write -> s_umount ordering */
up_write(&sb->s_umount);
@@ -1322,7 +1326,7 @@ int freeze_super(struct super_block *sb)
/* Now we go and block page faults... */
down_write(&sb->s_umount);
sb->s_writers.frozen = SB_FREEZE_PAGEFAULT;
- smp_wmb();
+ synchronize_rcu();

sb_wait_write(sb, SB_FREEZE_PAGEFAULT);

@@ -1331,7 +1335,7 @@ int freeze_super(struct super_block *sb)

/* Now wait for internal filesystem counter */
sb->s_writers.frozen = SB_FREEZE_FS;
- smp_wmb();
+ synchronize_rcu();
sb_wait_write(sb, SB_FREEZE_FS);

if (sb->s_op->freeze_fs) {
@@ -1340,7 +1344,7 @@ int freeze_super(struct super_block *sb)
printk(KERN_ERR
"VFS:Filesystem freeze failed\n");
sb->s_writers.frozen = SB_UNFROZEN;
- smp_wmb();
+ synchronize_rcu();
wake_up(&sb->s_writers.wait_unfrozen);
deactivate_locked_super(sb);
return ret;
@@ -1387,7 +1391,7 @@ int thaw_super(struct super_block *sb)

out:
sb->s_writers.frozen = SB_UNFROZEN;
- smp_wmb();
+ synchronize_rcu();
wake_up(&sb->s_writers.wait_unfrozen);
deactivate_locked_super(sb);

_
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at http://www.tux.org/lkml/