Re: [PATCH AUTOSEL 5.11 43/44] signal: don't allow STOP on PF_IO_WORKER threads

From: Jens Axboe
Date: Thu Mar 25 2021 - 11:11:02 EST


On 3/25/21 8:02 AM, Jens Axboe wrote:
> On 3/25/21 7:56 AM, Stefan Metzmacher wrote:
>> Am 25.03.21 um 14:38 schrieb Jens Axboe:
>>> On 3/25/21 6:11 AM, Stefan Metzmacher wrote:
>>>>
>>>> Am 25.03.21 um 13:04 schrieb Eric W. Biederman:
>>>>> Stefan Metzmacher <metze@xxxxxxxxx> writes:
>>>>>
>>>>>> Am 25.03.21 um 12:24 schrieb Sasha Levin:
>>>>>>> From: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>>>>>>>
>>>>>>> [ Upstream commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597 ]
>>>>>>>
>>>>>>> Just like we don't allow normal signals to IO threads, don't deliver a
>>>>>>> STOP to a task that has PF_IO_WORKER set. The IO threads don't take
>>>>>>> signals in general, and have no means of flushing out a stop either.
>>>>>>>
>>>>>>> Longer term, we may want to look into allowing stop of these threads,
>>>>>>> as it relates to eg process freezing. For now, this prevents a spin
>>>>>>> issue if a SIGSTOP is delivered to the parent task.
>>>>>>>
>>>>>>> Reported-by: Stefan Metzmacher <metze@xxxxxxxxx>
>>>>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>>>>>>> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
>>>>>>> ---
>>>>>>> kernel/signal.c | 3 ++-
>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>>>>>> index 55526b941011..00a3840f6037 100644
>>>>>>> --- a/kernel/signal.c
>>>>>>> +++ b/kernel/signal.c
>>>>>>> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
>>>>>>> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
>>>>>>> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
>>>>>>>
>>>>>>> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
>>>>>>> + if (unlikely(fatal_signal_pending(task) ||
>>>>>>> + (task->flags & (PF_EXITING | PF_IO_WORKER))))
>>>>>>> return false;
>>>>>>>
>>>>>>> if (mask & JOBCTL_STOP_SIGMASK)
>>>>>>>
>>>>>>
>>>>>> Again, why is this proposed for 5.11 and 5.10 already?
>>>>>
>>>>> Has the bit about the io worker kthreads been backported?
>>>>> If so this isn't horrible. If not this is nonsense.
>>>
>>> No not yet - my plan is to do that, but not until we're 100% satisfied
>>> with it.
>>
>> Do you understand why the patches where autoselected for 5.11 and 5.10?
>
> As far as I know, selections like these (AUTOSEL) are done by some bot
> that uses whatever criteria to see if they should be applied for earlier
> revisions. I'm sure Sasha can expand on that :-)
>
> Hence it's reasonable to expect that sometimes it'll pick patches that
> should not go into stable, at least not just yet. It's important to
> understand that this message is just a notice that it's queued up for
> stable -rc, not that it's _in_ stable just yet. There's time to object.
>
>>>> I don't know, I hope not...
>>>>
>>>> But I just tested v5.12-rc4 and attaching to
>>>> an application with iothreads with gdb is still not possible,
>>>> it still loops forever trying to attach to the iothreads.
>>>
>>> I do see the looping, gdb apparently doesn't give up when it gets
>>> -EPERM trying to attach to the threads. Which isn't really a kernel
>>> thing, but:
>>
>> Maybe we need to remove the iothreads from /proc/pid/tasks/
>
> Is that how it finds them? It's arguably a bug in gdb that it just
> keeps retrying, but it would be nice if we can ensure that it just
> ignores them. Because if gdb triggers something like that, probably
> others too...
>
>>>> And I tested 'kill -9 $pidofiothread', and it feezed the whole
>>>> machine...
>>>
>>> that sounds very strange, I haven't seen anything like that running
>>> the exact same scenario.
>>>
>>>> So there's still work to do in order to get 5.12 stable.
>>>>
>>>> I'm short on time currently, but I hope to send more details soon.
>>>
>>> Thanks! I'll play with it this morning and see if I can provoke
>>> something odd related to STOP/attach.
>>
>> Thanks!
>>
>> Somehow I have the impression that your same_thread_group_account patch
>> may fix a lot of things...
>
> Maybe? I'll look closer.

It needs a bit more love than that. If you have threads already in your
app, then we just want to skip over the PF_IO_WORKER threads. We can't
just terminate the loop.

Something like the below works for me.


diff --git a/fs/proc/base.c b/fs/proc/base.c
index 3851bfcdba56..abff2fe10bfa 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3723,7 +3723,7 @@ static struct task_struct *first_tid(struct pid *pid, int tid, loff_t f_pos,
*/
pos = task = task->group_leader;
do {
- if (!nr--)
+ if (same_thread_group(task, pos) && !nr--)
goto found;
} while_each_thread(task, pos);
fail:
@@ -3744,16 +3744,22 @@ static struct task_struct *first_tid(struct pid *pid, int tid, loff_t f_pos,
*/
static struct task_struct *next_tid(struct task_struct *start)
{
- struct task_struct *pos = NULL;
+ struct task_struct *tmp, *pos = NULL;
+
rcu_read_lock();
- if (pid_alive(start)) {
- pos = next_thread(start);
- if (thread_group_leader(pos))
- pos = NULL;
- else
- get_task_struct(pos);
+ if (!pid_alive(start))
+ goto no_thread;
+ list_for_each_entry_rcu(tmp, &start->thread_group, thread_group) {
+ if (!thread_group_leader(tmp) && same_thread_group(start, tmp)) {
+ get_task_struct(tmp);
+ pos = tmp;
+ break;
+ }
}
+no_thread:
rcu_read_unlock();
+ if (!pos)
+ return NULL;
put_task_struct(start);
return pos;
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3f6a0fcaa10c..4f621e386abf 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -668,11 +668,18 @@ static inline bool thread_group_leader(struct task_struct *p)
}

static inline
-bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
+bool same_thread_group_account(struct task_struct *p1, struct task_struct *p2)
{
return p1->signal == p2->signal;
}

+static inline
+bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
+{
+ return same_thread_group_account(p1, p2) &&
+ !((p1->flags | p2->flags) & PF_IO_WORKER);
+}
+
static inline struct task_struct *next_thread(const struct task_struct *p)
{
return list_entry_rcu(p->thread_group.next,
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5f611658eeab..625110cacc2a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -307,7 +307,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
* those pending times and rely only on values updated on tick or
* other scheduler action.
*/
- if (same_thread_group(current, tsk))
+ if (same_thread_group_account(current, tsk))
(void) task_sched_runtime(current);

rcu_read_lock();

--
Jens Axboe