RE: [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning

From: David Laight
Date: Fri Nov 20 2020 - 12:37:36 EST


From: Waiman Long
> Sent: 20 November 2020 17:04
>
> On 11/20/20 8:11 AM, David Laight wrote:
> > From: Waiman Long
> >> Sent: 19 November 2020 18:40
> > ...
> >> My own testing also show not too much performance difference when
> >> removing reader spinning except in the lightly loaded cases.
> > I'm confused.
> >
> > I got massive performance improvements from changing a driver
> > we have to use mutex instead of the old semaphores (the driver
> > was written a long time ago).
> >
> > While these weren't 'rw' the same issue will apply.
> >
> > The problem was that the semaphore/mutex was typically only held over
> > a few instructions (eg to add an item to a list).
> > But with semaphore if you got contention the process always slept.
> > OTOH mutex spin 'for a while' before sleeping so the code rarely slept.
> >
> > So I really expect that readers need to spin (for a while) if
> > a rwsem (etc) is locked for writing.
> >
> > Clearly you really need a CBU (Crystal Ball Unit) to work out
> > whether to spin or not.
>
> That is the hard part. For short critical section and not many readers
> around, making the readers spin will likely improve performance. On the
> other hand, if the critical section is long with many readers, make
> readers sleep and then wake them all up at once can have better
> performance. There is no one-size-fit-all solution here.

Do the readers actually all wake up at the same time?
rwsem might be special, but if I cv_broadcast a userspace cv
then only one thread is woken.
Once it runs the next one is woken.
This is horrid if you actually want them all to run:
- It takes ages for the target cpu to come out of a low-power state.
- RT processes don't get scheduled if the cpu they last ran on is
'busy' in the kernel.

I can't see why the number of readers is relevant.
They are more likely to start in 'lockstep' if they spin.
(Which I think is what you say is best).

You may want a per-rwsem option of how long to spin.
Although there are probably only 2 useful values - 0 and lots.
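
To illustrate the 'per-lock spin time' idea in userspace terms, here is
a minimal sketch. It assumes POSIX threads; struct spin_mutex and the
spin_mutex_*() helpers are hypothetical names made up for this example,
not an existing API, and the kernel rwsem code works quite differently.

```c
#include <pthread.h>

/* Hypothetical wrapper: a mutex with a per-lock spin budget. */
struct spin_mutex {
    pthread_mutex_t lock;
    unsigned int spin_limit;    /* 0 = never spin, large = spin 'lots' */
};

static int spin_mutex_init(struct spin_mutex *m, unsigned int spin_limit)
{
    m->spin_limit = spin_limit;
    return pthread_mutex_init(&m->lock, NULL);
}

/*
 * Try the lock up to spin_limit times before falling back to a
 * blocking acquire. Returns the number of failed attempts, so a
 * caller can see how often the spin budget was used up.
 */
static unsigned int spin_mutex_lock(struct spin_mutex *m)
{
    unsigned int spins;

    for (spins = 0; spins < m->spin_limit; spins++) {
        if (pthread_mutex_trylock(&m->lock) == 0)
            return spins;       /* acquired without sleeping */
    }

    /* Budget exhausted (or spin_limit == 0): sleep until available. */
    pthread_mutex_lock(&m->lock);
    return spins;
}

static void spin_mutex_unlock(struct spin_mutex *m)
{
    pthread_mutex_unlock(&m->lock);
}
```

With spin_limit set to 0 this behaves like a plain sleeping lock; with
a large value a waiter only sleeps when the holder keeps the lock for
a long time.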

Are there rw spinlocks?
They can be much better if the critical sections are short.
Especially if they really are short and RT kernels don't
break everything by making them sleep.

I was fixing some userspace code that does a lot of channels of
audio processing.
You can't afford to take a mutex because an interrupt might
come in while you hold it - stopping all the other threads
obtaining the same mutex.
One thread stopping is fine, but having all the threads stop
leads to processing overrun.
Since you can't disable interrupts in userspace (for a spinlock)
I had to replace locked linked lists with arrays indexed by
atomic counters.
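
A minimal sketch of the 'array indexed by an atomic counter' idea,
assuming C11 atomics and a fixed capacity. The names (slot_array,
slot_array_add, SLOT_COUNT) are hypothetical and this is not the actual
audio code, just an illustration of adding items without a lock.

```c
#include <stdatomic.h>
#include <stddef.h>

#define SLOT_COUNT 64           /* hypothetical fixed capacity */

struct slot_array {
    atomic_size_t next;                 /* next index to hand out */
    _Atomic(void *) slots[SLOT_COUNT];  /* NULL = not yet published */
};

static void slot_array_init(struct slot_array *a)
{
    size_t i;

    atomic_init(&a->next, 0);
    for (i = 0; i < SLOT_COUNT; i++)
        atomic_init(&a->slots[i], NULL);
}

/*
 * Claim a unique index with a single atomic add, then publish the
 * item. No thread ever waits for another, so a writer being
 * descheduled cannot stall the others.
 * Returns the index, or -1 if the array is full.
 */
static long slot_array_add(struct slot_array *a, void *item)
{
    size_t idx = atomic_fetch_add(&a->next, 1);

    if (idx >= SLOT_COUNT)
        return -1;
    atomic_store_explicit(&a->slots[idx], item, memory_order_release);
    return (long)idx;
}

/*
 * Read a slot. A reader may see NULL for an index that has been
 * claimed but not yet published, and must treat that as 'empty'.
 */
static void *slot_array_get(struct slot_array *a, size_t idx)
{
    if (idx >= SLOT_COUNT)
        return NULL;
    return atomic_load_explicit(&a->slots[idx], memory_order_acquire);
}
```

The cost is that entries cannot be unlinked the way list nodes can;
this sketch only ever appends.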

David
