Re: Sum of weights idea for CFS PI

From: Qais Yousef
Date: Wed Oct 05 2022 - 05:31:41 EST

On 10/04/22 15:48, Joel Fernandes wrote:
> On 10/4/2022 12:30 PM, Qais Yousef wrote:
> > On 10/03/22 12:27, Joel Fernandes wrote:
> >> There's a lot to unwind so I will reply in pieces after spending some time
> >> thinking about it, but just for this part:
> >>
> >> On 10/3/2022 12:14 PM, Qais Yousef wrote:
> >>>> In this case, there is no lock involved yet you have a dependency. But I don't
> >>>> mean to sound depressing, and just because there are cases like this does not
> >>>> mean we should not solve the lock-based ones. When I looked at Android, I saw
> >>>> that it uses futex directly from Android Runtime code instead of using pthread.
> >>>> So perhaps this can be trivially converted to FUTEX_LOCK_PI and then what we do
> >>>> in the kernel will JustWork(Tm) ?
> >>> I guess it will depend on individual libc implementation, but I thought all of
> >>> them use FUTEX under the hood for pthreads mutexes.
> >>>
> >>> Maybe we can add a bootparam to force all futexes to be FUTEX_LOCK_PI?
> >>>
> >>
> >> In the case of FUTEX_LOCK_PI, you have to store the TID of the 'lock owner' in
> >> the futex word to signify that lock is held.
> >
> > Right. So userspace has to opt-in.
> >
> >> That wont work for the case above, Producer/Consumer signalling each other on a
> >> bounded-buffer, right? That's not locking even though it is acquiring and
> >> release of a limited resource.
> >
> > Yes but as I tried to point out I don't think proxy-execution handles this case
> > where you don't hold a lock explicitly. But I could be wrong.
> I don't disagree. Proxy execution is an implementation detail, without more
> information from userspace, any implementation cannot help. I was just
> responding to your point about converting all futexes which you cannot do
> without knowing what the futex is used for.


I don't think I have read much literature on priority inversion caused by
waiting on signals. I need to research that.

I think it is considered a voluntary sleep, and sane system design should
ensure that the priorities of these two tasks don't lead to starvation given
the expected producer/consumer rates.

It doesn't seem to be a problem for PREEMPT_RT since nobody has done anything
about it AFAICT?

It could be the fact that in CFS priority is expressed as weights (or
bandwidth), and that introduces this new class of problems. I think we should
still ask whether the priority assignment is wrong when this happens. If
there's a clear relationship between producer/consumer, should they have the
same priority if they do an equal amount of work?
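
To illustrate the weights point, here is a rough sketch of how nice values
turn into CPU shares for two CPU-bound tasks competing on one rq. It uses the
approximation that nice 0 maps to a weight of 1024 and each nice step scales
the weight by about 1.25; the kernel uses a precomputed table, so the numbers
are only close. The task names and the helpers approx_weight() and
cpu_shares() are made up for this example.

```python
def approx_weight(nice):
    # Approximation only: nice 0 -> 1024, each nice level scales by ~1.25.
    # The kernel looks the value up in a precomputed table instead.
    return 1024 / (1.25 ** nice)


def cpu_shares(nice_by_task):
    # Share of CPU each always-runnable task would get on a single rq.
    weights = {task: approx_weight(n) for task, n in nice_by_task.items()}
    total = sum(weights.values())
    return {task: w / total for task, w in weights.items()}


# Hypothetical producer/consumer pairs, used only for illustration.
equal = cpu_shares({"producer": 0, "consumer": 0})
skewed = cpu_shares({"producer": 5, "consumer": 0})

print(equal)   # both tasks get the same share
print(skewed)  # the nice 5 producer gets roughly a quarter of the CPU
```

If the two tasks do an equal amount of work per item, the skewed assignment
leaves the consumer waiting on the producer most of the time, which is the
case where I'd question the priority assignment itself.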

> But I am thinking of messing around with rt_mutex_setprio() and some userspace
> tests to see if I can make the sum of weights thing work for the *userspace
> locking* usecases (FUTEX_LOCK_PI). Then run some tests and collect some traces.
> Perhaps you can do that on the Android side as well.

I'd be happy to help, yes :)

In my view, the trickiest part would be how to account for the stolen time.
If C gets 3/5 share and runs for 2/5 before releasing the lock, then when
A wakes up, it should perceive that it ran for 1/5 (time C stole from A while
holding the lock) and run only for 1/5 before getting preempted, to preserve
its 2/5 share. That is IF we want to be very accurate.
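
The arithmetic above can be written down as a small sketch. It assumes
weights of A=2, B=2, C=1 (picked only because they reproduce the 3/5 and 2/5
fractions; B is a hypothetical third task), a single accounting period, and
that A stays blocked for the whole time C holds the lock. The helper names
are made up and this is not how the scheduler does its accounting.

```python
from fractions import Fraction


def shares(weights):
    # Fraction of the period each task is entitled to, by weight.
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


def stolen_time(ran, own_share):
    # Time the lock holder ran beyond its own entitlement.
    return max(ran - own_share, Fraction(0))


# Assumed weights, chosen only to match the fractions in the text.
base = shares({"A": 2, "B": 2, "C": 1})    # A: 2/5, B: 2/5, C: 1/5

# Sum of weights: while A is blocked on the lock, C runs with A's weight too.
boosted_c = base["C"] + base["A"]           # 3/5

c_ran = Fraction(2, 5)                      # C ran this long, then unlocked
stolen = stolen_time(c_ran, base["C"])      # 1/5 charged to A
a_remaining = base["A"] - stolen            # 1/5 left for A in this period

print(boosted_c, stolen, a_remaining)
```

So A is charged for the 1/5 it donated and only gets the remaining 1/5, which
keeps the A + C total at 3/5 and leaves B's 2/5 untouched.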

If the 2 tasks are not on the same rq, I don't think that will change how
things are done much.

> > IIUC Sebastian's
> > understanding is similar to mine. Only 'locks' (FUTEX_LOCK_PI which ends up
> > using rt-mutex) do PI inheritance.
> >
> > So this signaling scenario is a new class of problems that wasn't handled
> > before; to my understanding.
> Most certainly, agreed.

Sorry, I am thinking out loud for most/all of my reply :)


Qais Yousef