Re: [PATCH] f2fs: move f2fs to use reader-unfair rwsems
From: Tim Murray
Date: Tue Jan 11 2022 - 01:53:31 EST
On Mon, Jan 10, 2022 at 8:15 PM Waiman Long <longman@xxxxxxxxxx> wrote:
> That is not how rwsem works. A reader which fails to get the lock
> because it is write-locked will remove its reader count before going to
> sleep. So the reader count will be zero eventually. Of course, there is
> a short period of time where the reader count will be non-zero until the
> reader removes its own reader count. So if a new writer comes in at that
> time, it will fail its initial trylock and probably go to optimistic
> spinning mode. If the writer that owns the lock release it at the right
> moment, the reader may acquire the read lock.
Thanks for the correction, that makes sense. I haven't spent too much
time on rwsem internals and I'm not confident about when flags are set
and cleared in sem->count; is there a case where sem->count after
up_write() could be nonzero?
An example from one trace:
1. Low-priority userspace thread 4764 is blocked in f2fs_unlink,
probably at f2fs_lock_op, which is a wrapper around
down_read(cp_rwsem).
2. f2fs-ckpt runs at t=0ms and wakes thread 4764, making it runnable.
3. At t=1ms, f2fs-ckpt enters uninterruptible sleep and blocks at
rwsem_down_write_slowpath per sched_blocked_reason.
4. At t=26ms, thread 4764 runs for the first time since being made
runnable. Within 40us, thread 4764 unblocks f2fs-ckpt and makes it
runnable.
Since thread 4764 is awakened by f2fs-ckpt but never runs before it
unblocks f2fs-ckpt in down_write_slowpath(), the only idea I had is
that cp_rwsem->count is nonzero after f2fs-ckpt's up_write() in step 2
(maybe because of rwsem_mark_wake()?).
> I do have a question about the number of readers in such a case compared
> with the number of writers. Are there a large number of low priority
> hanging around? What is an average read lock hold time?
>
> Blocking for 9.7s for a write lock is quite excessive and we need to
> figure out how this happen.,
Just to be 100% clear, it's not a single 9.7s stall, it's many smaller
stalls of 10-500+ms in f2fs-ckpt that add up to 9.7s over that range.
f2fs is not my area of expertise, but my understanding is that
cp_rwsem in f2fs has many (potentially unbounded) readers and a single
writer. Arbitrary userspace work (fsync, creating/deleting/truncating
files, atomic writes) may grab the read lock, but assuming the
merge_checkpoint option is enabled, only f2fs-ckpt will ever grab the
write lock during normal operation. However, in this particular
example, it looks like there may have been 5-10 threads blocked on
f2fs-ckpt that were awakened alongside thread 4764 in step 2.
I'll defer to the f2fs experts on the average duration that the read
lock is held.