On Mon, Jan 10, 2022 at 8:15 PM Waiman Long <longman@xxxxxxxxxx> wrote:
> That is not how rwsem works. A reader which fails to get the lock
> because it is write-locked will remove its reader count before going to
> sleep. So the reader count will be zero eventually. Of course, there is
> a short period of time where the reader count will be non-zero until the
> reader removes its own reader count. So if a new writer comes in at that
> time, it will fail its initial trylock and probably go to optimistic
> spinning mode. If the writer that owns the lock release it at the right
> moment, the reader may acquire the read lock.

Thanks for the correction, that makes sense. I haven't spent too much
time on rwsem internals and I'm not confident about when flags are set
and cleared in sem->count; is there a case where sem->count after
up_write() could be nonzero?
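
To check that I'm reading the description above correctly, here is a toy
userspace model of that window. It is only a sketch: the toy_* names and
the bit values are my own choices for illustration, not taken from
kernel/locking/rwsem.c, and the real fast and slow paths do more than this.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy bit layout chosen for illustration; an assumption, not the kernel's. */
#define TOY_WRITER_LOCKED 0x1L
#define TOY_READER_BIAS   0x100L

static atomic_long toy_count;   /* stand-in for sem->count */

/* Writer fast path: succeeds only if the count is exactly zero. */
static bool toy_write_trylock(void)
{
    long expected = 0;
    return atomic_compare_exchange_strong(&toy_count, &expected,
                                          TOY_WRITER_LOCKED);
}

static void toy_up_write(void)
{
    long old = atomic_fetch_sub(&toy_count, TOY_WRITER_LOCKED);
    assert(old & TOY_WRITER_LOCKED);
}

/* Reader fast path: add a reader count first, then check for a writer. */
static bool toy_read_trylock(void)
{
    long old = atomic_fetch_add(&toy_count, TOY_READER_BIAS);
    return !(old & TOY_WRITER_LOCKED);
}

/* Failed reader: remove its speculative count before going to sleep. */
static void toy_read_backout(void)
{
    long old = atomic_fetch_sub(&toy_count, TOY_READER_BIAS);
    assert(old >= TOY_READER_BIAS);
}

/*
 * Replays the window: returns true if a second writer's trylock fails
 * while the failed reader's count is still present and succeeds once
 * the reader has backed out.
 */
static bool toy_window_demo(void)
{
    atomic_store(&toy_count, 0);
    bool w1 = toy_write_trylock();        /* writer 1 owns the lock */
    bool r = toy_read_trylock();          /* reader fails, count is bumped */
    toy_up_write();                       /* writer 1 releases in the window */
    bool w2_early = toy_write_trylock();  /* reader count present: fails */
    toy_read_backout();                   /* reader removes its count */
    bool w2_late = toy_write_trylock();   /* count is zero again: succeeds */
    return w1 && !r && !w2_early && w2_late;
}
```

In this model the count is nonzero right after toy_up_write() only
because the failed reader has not backed out yet. It does not model the
case where that reader goes on to acquire the read lock.
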
An example from one trace:
1. Low-priority userspace thread 4764 is blocked in f2fs_unlink,
probably at f2fs_lock_op, which is a wrapper around
down_read(cp_rwsem).
2. f2fs-ckpt runs at t=0ms and wakes thread 4764, making it runnable.
3. At t=1ms, f2fs-ckpt enters uninterruptible sleep and blocks at
rwsem_down_write_slowpath per sched_blocked_reason.
4. At t=26ms, thread 4764 runs for the first time since being made
runnable. Within 40us, thread 4764 unblocks f2fs-ckpt and makes it
runnable.
Since thread 4764 is awakened by f2fs-ckpt but never runs before it
unblocks f2fs-ckpt in down_write_slowpath(), the only idea I had is
that cp_rwsem->count is nonzero after f2fs-ckpt's up_write() in step 2
(maybe because of rwsem_mark_wake()?).
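
To make that guess concrete, here is a toy userspace replay of steps 2-4.
Everything in it is hypothetical: the toy_* names, the bit values, and
above all the assumption that the waking path adds the reader count on
behalf of a sleeping reader before that reader is scheduled. Whether
rwsem_mark_wake() really behaves like this is exactly what I'm asking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy bit layout chosen for illustration; an assumption, not the kernel's. */
#define TOY_WRITER_LOCKED 0x1L
#define TOY_READER_BIAS   0x100L

struct toy_rwsem {
    atomic_long count;       /* stand-in for sem->count */
    int sleeping_readers;    /* readers parked on the wait list */
};

/* Writer fast path: succeeds only if the count is exactly zero. */
static bool toy_write_trylock(struct toy_rwsem *sem)
{
    long expected = 0;
    return atomic_compare_exchange_strong(&sem->count, &expected,
                                          TOY_WRITER_LOCKED);
}

/*
 * ASSUMPTION under test: on release, the writer grants the read lock to
 * every sleeping reader by adding their counts itself, so the count is
 * already nonzero before any of those readers gets CPU time.
 */
static void toy_up_write_and_wake(struct toy_rwsem *sem)
{
    long grant = sem->sleeping_readers * TOY_READER_BIAS;
    atomic_fetch_add(&sem->count, grant - TOY_WRITER_LOCKED);
    sem->sleeping_readers = 0;
}

static void toy_up_read(struct toy_rwsem *sem)
{
    long old = atomic_fetch_sub(&sem->count, TOY_READER_BIAS);
    assert(old >= TOY_READER_BIAS);
}

/* Replays the trace timestamps; returns how long the writer is blocked, in ms. */
static long toy_replay_trace_ms(void)
{
    struct toy_rwsem sem = { .sleeping_readers = 1 };   /* thread 4764 */
    long blocked_at = -1, unblocked_at = -1;

    atomic_store(&sem.count, TOY_WRITER_LOCKED);  /* f2fs-ckpt holds the lock */
    toy_up_write_and_wake(&sem);                  /* t=0ms: release, wake 4764 */
    if (!toy_write_trylock(&sem))                 /* t=1ms: f2fs-ckpt is back */
        blocked_at = 1;
    toy_up_read(&sem);                            /* t=26ms: 4764 runs, releases */
    if (blocked_at >= 0 && toy_write_trylock(&sem))
        unblocked_at = 26;
    return unblocked_at - blocked_at;
}
```

Under that assumption the writer is stuck from t=1ms until the low-priority
reader is finally scheduled, which is the pattern in the trace: the stall
is the reader's scheduling delay, not its lock hold time.
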
> I do have a question about the number of readers in such a case compared
> with the number of writers. Are there a large number of low priority
> hanging around? What is an average read lock hold time?
>
> Blocking for 9.7s for a write lock is quite excessive and we need to
> figure out how this happen.

Just to be 100% clear, it's not a single 9.7s stall, it's many smaller
stalls of 10-500+ms in f2fs-ckpt that add up to 9.7s over that range.
f2fs is not my area of expertise, but my understanding is that
cp_rwsem in f2fs has many (potentially unbounded) readers and a single
writer. Arbitrary userspace work (fsync, creating/deleting/truncating
files, atomic writes) may grab the read lock, but assuming the
merge_checkpoint option is enabled, only f2fs-ckpt will ever grab the
write lock during normal operation. However, in this particular
example, it looks like there may have been 5-10 threads blocked on
f2fs-ckpt that were awakened alongside thread 4764 in step 2.
I'll defer to the f2fs experts on the average duration that the read
lock is held.