Re: [PATCH 3/4 v0.4] sched/umcg: add Documentation/userspace-api/umcg.rst
From: Peter Oskolkov
Date: Fri Aug 06 2021 - 13:25:18 EST
On Fri, Aug 6, 2021 at 9:52 AM Thierry Delisle <tdelisle@xxxxxxxxxxxx> wrote:
>
> > All _umcg_ state changes here happen in the userspace before
> > sys_umcg_wait() is called. So:
> >
> > W1: cmpxchg W1:RUNNING => W1:IDLE
> > - if OK, call sys_umcg_wait()
> > - if failed, do something else (notes below)
> >
> > W2: cmpxchg W1:IDLE => W1:RUNNING
> > - if OK, lock itself, set W2:next_tid to W1, call sys_umcg_wait()
> > (will not block nor spin), restore next_tid and state/unlock upon
> > syscall return
> > - if failed, do something else
>
> I am talking about the case where both cmpxchg() succeed and W2 sets
> *both* UMCG_WAIT_WAKE_ONLY and UMCG_WAIT_WF_CURRENT_CPU. More
> specifically, if things are ordered like so: (ideally use monospace font)
>
>          - w1 -             |          - w2 -
>
> w1:RUNNING => w1:IDLE|L     |
> S:IDLE => S:RUNNING         |
> sys_umcg_wait():            |
>   set ~UMCG_TF_LOCKED       |
>                             | w1:IDLE => w1:RUNNING|L
>                             | w2:RUNNING => w2:IDLE|L
>                             | w2:next_tid := W1.tid
>                             | w1:RUNNING|L => w1:RUNNING
>                             | sys_umcg_wait():
>                             |   set ~UMCG_TF_LOCKED
>                             |   do_context_switch()
> idle_loop()                 |
>
> What does ttwu() do with w1 if it has not reached idle_loop yet?
If both cmpxchg() succeeded, but W1 was never put to sleep, ttwu()
will do nothing and W1 will continue running on its initial CPU, while
W2 will continue running on its own CPU. WF_CURRENT_CPU is an advisory
flag, and in this situation it will not do anything.
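
To make that sequence concrete, here is a minimal user-space sketch of
the two cmpxchg transitions and a wake that finds nothing to wake. It is
an illustration only: the demo_* names, the state values and the
"asleep" flag are hypothetical stand-ins, not the UMCG API or the
kernel's ttwu() logic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state values for illustration; not the values from the patch. */
enum { DEMO_RUNNING = 1, DEMO_IDLE = 2 };

struct demo_task {
	_Atomic uint32_t state;
	bool asleep;	/* stands in for "the kernel actually put the task to sleep" */
};

/* W1 side: RUNNING => IDLE. Only on success would W1 call sys_umcg_wait(). */
static bool demo_self_idle(struct demo_task *t)
{
	uint32_t expected = DEMO_RUNNING;

	return atomic_compare_exchange_strong(&t->state, &expected, DEMO_IDLE);
}

/* W2 side: IDLE => RUNNING on the wakee. Only on success would W2 go on to wake it. */
static bool demo_claim_wake(struct demo_task *t)
{
	uint32_t expected = DEMO_IDLE;

	return atomic_compare_exchange_strong(&t->state, &expected, DEMO_RUNNING);
}

/* Stand-in for the wake itself: returns true only if a sleeping task was woken. */
static bool demo_wake(struct demo_task *t)
{
	/* The waker must already have moved the wakee to RUNNING. */
	assert(atomic_load(&t->state) == DEMO_RUNNING);
	if (!t->asleep)
		return false;	/* task never went to sleep: nothing to do */
	t->asleep = false;
	return true;
}
```

If demo_self_idle() and demo_claim_wake() both succeed before W1 has
gone to sleep, demo_wake() returns false and W1 is left in the RUNNING
state, which mirrors the "ttwu() will do nothing" case above.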
>
> I am not familiar with the details of kernel context-switching, but in
> user-level threading, more specifically Cforall, this would be a problem.
> Between the call to do_context_switch() and idle_loop(), there is a
> window where 2 CPUs run the same thread at the same time. One thread is
> performing the front side of the context switch and the other thread
> wakes up on the backside of the context switch. This behaviour invariably
> corrupts the program stack of that thread. Again, I do not know if that
> applies here. I am not exactly sure how the program stack is handled when
> inside a system call.
This is a wake, not a context switch, right? I'm not sure why you are
concerned with context switching here. And even if it were a context
switch, the kernel manages thread stacks properly, so there is nothing
to worry about.
Am I missing something?