On Mon, Feb 13, 2023 at 12:14:59PM -0500, Waiman Long wrote:
How ?!? since all the stealing and spinning is out, the wait-list reignsI am once again confused...Yes, that is true.
*WHY* are you changing the writer wake-up path? The comments added here
don't clarify anything.
If we set handoff, we terminate/disallow the spinning/stealing. The
direct consequence is that the slowpath/wait-list becomes the only way
forward.
Since we don't take wait_lock on up, we fundamentally have a raceThere is an advantage in doing the handover earlier, if possible. A reader
condition. But *WHY* do you insist on handling that in rwsem_wake()?
Delaying all that until rwsem_try_write_lock()? Doing so would render
pretty much all of the above pointless, no?
that comes in between can spoils the takeover of the rwsem in
supreme. A new reader will queue.
Are you worried about the spurious elevation of ->count due to that
uncondition increment on down_read() ? This is always going to be the
case.
rwsem_try_write_lock() and cause it to sleep again. Since we will have toThere is a complexity cost -- and so far I've not seen a single line of
take the wait lock anyway in rwsem_wake(), there isn't much additional cost
to do some additional check.
justification for the added complexity.
Note that the kernel test robot had detected a 19.3% improvement ofSeen that; but who's saying a simpler implementation will not also have
will-it-scale.per_thread_ops [1] due to this commit. That indicates this
commit is good to have. I am planning to update the commit log to include
that information as well as additional reasoning as discussed here.
those gains. And if not, then we have a clear separation into
functionality and optimization and justifications for it.
But now, we have neither. I'm not saying the patch is wrong -- I'm
saying it is needlessly complicated without justification.
All this stuff is hard enough -- we should really try to keep is as
simple as possible, having two distinct ways to wake up a writer is not
'as simple as possible'.
But what if it is rwsem_down_read_slowpath() that drops ->count to 0 andAfter all, rwsem_mark_wake() already wakes the writer if it is first,As I said before, waking up the writer does not mean it can always get the
no? Why invent yet another special way to wake up the writer.
rwsem on the first rwsem_try_write_lock(). Doing early handoff in
rwsem_wake() can remove that ambiguity.
does rwsem_mark_wake() and misses your fancy new path?
My bad, I sometimes miss stating assumptions and rationale that can properly justify the patch. I usually need more back and fro like what we are doing now to fully write out all the missing pieces. So I really appreciate your time in reviewing the patch and your critiques :-)
That is, I'm thinking there's an argument to be had that rwsem_wake() is
fundamentally the wrong place to do anything.
OTOH, you are right in that rwsem_mark_wake() is in a special position;
it *KNOWS* the reader count has hit 0 and can ignore future spurious
increments because by knowing it was 0 it knows they *WILL* fail and
queue (provided WAITERS||HANDOFF) -- but I've not seen you
articulate this point.
(note how rwsem_cond_wake_waiter() relies on exactly this to select
WAKE_ANY)
You are also right in that having these different means of waking
readers and writers is 'confusing'.
But you failed to take things to their natural conclusion -- change the
way we do writer wakeups, all writer wakeups.
So change rwsem_mark_wake()'s writer wakeup and
rwsem_down_write_slowpath() to always so that waiter->task thing that
the readers already do.
It means putting all the logic for waking writers into
rwsem_mark_wake(), but then we can make full use of this 'we hit zero
future readers are temporary noise'. Althought I suspect you might need
to pass count into it, so we can observe the flags at the time 0 was
observed.
Is all that making sense? -- it has been a long day :-)
Hmm, clearly I missed something. This is rwsem_try_write_lock() settingAlso; and I asked this last time around; why do we care about theHANDOFF can happen for both readers and writers. Handoff to writer is
handoff to writer *at*all* ? It is the readers that set HANDOFF.
actually more important than to readers.
it, right?