Re: [PATCH 5/5 v0.6] sched/umcg: add Documentation/userspace-api/umcg.txt

From: Thierry Delisle
Date: Tue Oct 12 2021 - 14:54:40 EST


>> Just to be clear, sys_umcg_wait supports an operation that, when called
>> from a worker, puts the worker to sleep without triggering block detection
>> or context-switching back to the server?
>
> Potentially, yes - when a worker wants to yield (e.g. as part of a
> custom UMCG-aware mutex/condvar code), and calls into the userspace
> scheduler, it may be faster to skip the server wakeup (e.g. reassign
> the server to another sleeping worker and wake this worker). This is
> not a supported operation right now, but I see how it could be used to
> optimize some things in the future.
>
> Do you have any concerns here?

To be honest, I did not realize this was a possibility until your previous
email. I'm not sure I buy your example; it just sounds like worker-to-worker
context-switching. But I could imagine "stop the world" cases or some "race
to idle" policy using this feature.

It seems to me the corresponding wake needs to know if it needs to enqueue
the worker into the idle workers list or if it should just schedule the worker
as it would a server.

How does the wake know which to do?
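
To make the question concrete, here is a rough sketch of the decision the
wake path would face. Every name in it (wake_mode, sleeping_worker,
wake_worker) is made up for illustration and is not part of the patch or
of any UMCG API; it only shows that something has to supply the choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model for illustration only; none of this is in the patch. */
enum wstate {
    W_SLEEPING,      /* IDLE and asleep, nobody has been told about it */
    W_ON_IDLE_LIST,  /* IDLE, visible to the userspace scheduler */
    W_RUNNING,       /* resumed directly */
};

enum wake_mode {
    WAKE_ENQUEUE_IDLE,  /* hand the worker to the userspace scheduler */
    WAKE_RUN_DIRECTLY,  /* resume it right away, as one would a server */
};

struct sleeping_worker {
    enum wstate state;
};

/*
 * The waker has to pass the mode in: nothing in the sleeping worker's
 * own state says which of the two outcomes is wanted.
 */
static void wake_worker(struct sleeping_worker *w, enum wake_mode mode)
{
    assert(w->state == W_SLEEPING);

    if (mode == WAKE_ENQUEUE_IDLE)
        w->state = W_ON_IDLE_LIST;
    else
        w->state = W_RUNNING;
}
```

In this sketch the discriminator is an explicit argument. It could just as
well be a flag recorded when the worker went to sleep; which of the two (if
either) the patch intends is exactly what I am asking.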



> I don't see a big difference here, sorry. We are mixing levels of
> abstraction here again, I think. For example, the higher level
> userspace scheduling code will have more nuanced treatment of IDLE
> workers; but down at the kernel they are all the same: IDLE worker is
> a worker that the userspace can "schedule" by marking it RUNNING,
> regardless of whether the worker is "parked", or "woke from a blocking
> op", or whatever other semantically different state the worker can be.
> For the kernel, they are all the same, idle, not runnable, waiting for
> the userspace to explicitly "schedule" them.
>
> Similarly, I don't see a need to semantically distinguish "yield" from
> "park" at the kernel level of things; this distinction seems to be a
> higher-level abstraction that can be properly expressed in the
> userspace, and does not need to be explicitly addressed in the kernel
> (to make the code faster and simpler, for example).

From the kernel's perspective, I can see two distinct operations:

1 - Mark the worker as IDLE and put it to sleep.
2 - Mark the worker as IDLE, put it to sleep *and* immediately add it
    to the idle workers list.

The wait in operation 1 expects an outside wakeup call to match it and resume
the worker, while operation 2 is its own wakeup. To me that is the distinction
between wait/park and yield, respectively.
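
The two operations above can be sketched as a small userspace model. This is
only an illustration under assumed names (worker_park, worker_yield,
server_schedule_next and the list helpers are hypothetical, not the UMCG
ABI), and it models "asleep" as a flag rather than actually blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not the UMCG ABI: all names are illustrative. */
enum worker_state { WORKER_RUNNING, WORKER_IDLE };

struct worker {
    enum worker_state state;
    bool asleep;
    bool on_idle_list;
    struct worker *next_idle;   /* link in the idle workers list */
};

struct idle_list { struct worker *head; };

static void idle_list_push(struct idle_list *l, struct worker *w)
{
    assert(!w->on_idle_list);
    w->next_idle = l->head;
    l->head = w;
    w->on_idle_list = true;
}

static struct worker *idle_list_pop(struct idle_list *l)
{
    struct worker *w = l->head;

    if (w) {
        l->head = w->next_idle;
        w->next_idle = NULL;
        w->on_idle_list = false;
    }
    return w;
}

/* Operation 1 (wait/park): IDLE + sleep; needs a matching outside wakeup. */
static void worker_park(struct worker *w)
{
    assert(w->state == WORKER_RUNNING);
    w->state = WORKER_IDLE;
    w->asleep = true;
}

/* Operation 2 (yield): IDLE + sleep + enqueue; it is its own wakeup. */
static void worker_yield(struct worker *w, struct idle_list *l)
{
    assert(w->state == WORKER_RUNNING);
    w->state = WORKER_IDLE;
    w->asleep = true;
    idle_list_push(l, w);
}

/* Server side: take the next idle worker, if any, and mark it RUNNING. */
static struct worker *server_schedule_next(struct idle_list *l)
{
    struct worker *w = idle_list_pop(l);

    if (w) {
        w->state = WORKER_RUNNING;
        w->asleep = false;
    }
    return w;
}
```

In this model a parked worker never shows up in server_schedule_next() until
someone else enqueues or wakes it, while a yielded worker is picked up with no
further action from anyone.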

Is operation 2 supported?

I'm not sure this distinction can be handled in userspace in all cases: a
worker that is already asleep generally cannot wake itself.