Re: [RFC 00/60] Coscheduling for Linux

From: Subhra Mazumdar
Date: Mon Oct 29 2018 - 18:52:50 EST

On 10/26/18 4:44 PM, Jan H. Schönherr wrote:
On 19/10/2018 02.26, Subhra Mazumdar wrote:
Hi Jan,
Hi. Sorry for the delay.

On 9/7/18 2:39 PM, Jan H. Schönherr wrote:
The collective context switch from one coscheduled set of tasks to another
-- while fast -- is not atomic. If a use-case needs the absolute guarantee
that all tasks of the previous set have stopped executing before any task
of the next set starts executing, an additional hand-shake/barrier needs to
be added.

Do you know how long the delay is? i.e., what is the overlap time when a
thread of the new group starts executing on one HT while a thread of
another group is still running on the other HT?
The delay is roughly equivalent to the IPI latency, if we're just talking
about coscheduling at SMT level: one sibling decides to schedule another
group, sends an IPI to the other sibling(s), and may already start
executing a task of that other group, before the IPI is received on the
other end.
Can you point to where the leader is sending the IPI to other siblings?

I ran an experiment and the delay seems to be sub-microsecond. I ran 2
threads that just loop in one cosched group, affinitized to the 2 HTs of
a core. Another thread in a different cosched group then starts running,
affinitized to the first HT of the same core. I took timestamps just before
context_switch() in __schedule() for the threads switching from one group
to another and from one group to idle. Following is what I get on cpus 1
and 45, which are siblings; cpu 1 is where the other thread preempts:

[ 403.216625] cpu:45 sub1->idle:403216624579
[ 403.238623] cpu:1 sub1->sub2:403238621585
[ 403.238624] cpu:45 sub1->idle:403238621787
[ 403.260619] cpu:1 sub1->sub2:403260619182
[ 403.260620] cpu:45 sub1->idle:403260619413
[ 403.282617] cpu:1 sub1->sub2:403282617157
[ 403.282618] cpu:45 sub1->idle:403282617317

Not sure why the first switch to idle on cpu 45 happened. But from then on
the difference in timestamps is less than a microsecond. This is just a
crude way to get a sense of the delay and may not be exact.
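
For reference, the per-switch gaps can be computed from the log above with a
small script. This is only a rough sketch: it assumes the trailing numbers
are nanosecond timestamps (they line up with the printk times), and the
helper name switch_deltas() and its parameters are made up for illustration.
It pairs each switch on cpu 1 with the next switch on cpu 45 and skips the
unexplained first cpu 45 entry.

```python
import re

# Log lines copied from the measurement above (timestamps assumed to be ns).
LOG = """\
[ 403.216625] cpu:45 sub1->idle:403216624579
[ 403.238623] cpu:1 sub1->sub2:403238621585
[ 403.238624] cpu:45 sub1->idle:403238621787
[ 403.260619] cpu:1 sub1->sub2:403260619182
[ 403.260620] cpu:45 sub1->idle:403260619413
[ 403.282617] cpu:1 sub1->sub2:403282617157
[ 403.282618] cpu:45 sub1->idle:403282617317
"""

# Captures the cpu number and the trailing timestamp of each entry.
LINE_RE = re.compile(r"cpu:(\d+)\s+\S+->\S+:(\d+)")


def switch_deltas(log, leader_cpu=1, sibling_cpu=45):
    """Return the gap (ns) between each leader switch and the sibling switch
    that follows it. Sibling entries with no preceding leader entry are
    ignored."""
    deltas = []
    pending = None  # timestamp of the last unmatched leader switch
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        cpu, ts = int(m.group(1)), int(m.group(2))
        if cpu == leader_cpu:
            pending = ts
        elif cpu == sibling_cpu and pending is not None:
            deltas.append(ts - pending)
            pending = None
    return deltas


deltas = switch_deltas(LOG)
print(deltas)  # [202, 231, 160]
```

So the three paired switches are 202 ns, 231 ns and 160 ns apart, all well
under a microsecond.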


Now, there are some things that may delay processing an IPI, but in those
cases the target CPU isn't executing user code.

I've yet to produce some current numbers for SMT-only coscheduling. An
older ballpark number I have is about 2 microseconds for the collective
context switch of one hierarchy level, but take that with a grain of salt.