Re: possible deadlock in start_this_handle (2)

From: Michal Hocko
Date: Fri Feb 12 2021 - 10:44:20 EST


On Fri 12-02-21 21:58:15, Tetsuo Handa wrote:
> On 2021/02/12 21:30, Michal Hocko wrote:
> > On Fri 12-02-21 12:22:07, Matthew Wilcox wrote:
> >> On Fri, Feb 12, 2021 at 08:18:11PM +0900, Tetsuo Handa wrote:
> >>> On 2021/02/12 1:41, Michal Hocko wrote:
> >>>> But I suspect we have drifted away from the original issue. I thought
> >>>> that a simple check would help us narrow down this particular case and
> >>>> somebody messing up from the IRQ context didn't sound completely
> >>>> off.
> >>>>
> >>>
> >>> From my experience at https://lkml.kernel.org/r/201409192053.IHJ35462.JLOMOSOFFVtQFH@xxxxxxxxxxxxxxxxxxx ,
> >>> I think we can replace direct PF_* manipulation with macros that do not take a "struct task_struct *" argument.
> >>> Since TASK_PFA_TEST()/TASK_PFA_SET()/TASK_PFA_CLEAR() are for manipulating PFA_* flags on a remote thread, we can
> >>> define similar ones for manipulating PF_* flags on current thread. Then, auditing dangerous users becomes easier.
> >>
> >> No, nobody is manipulating another task's GFP flags.
> >
> > Agreed. And nobody should be manipulating PF flags on remote tasks
> > either.
> >
>
> No. You are misunderstanding. The bug report above is an example of
> manipulating PF flags on remote tasks.

The bug report you are referring to is ancient, and the cpuset code
hasn't touched task->flags for a long time. I haven't checked exactly, but
unless I misremember it has been years since the regular and atomic flags
were separated.

> You say "nobody should", but the reality is "there indeed was". There
> might be unnoticed others. The point of this proposal is to make it
> possible to "find such unnoticed users who are manipulating PF flags
> on remote tasks".

I am really confused about what you are proposing here, TBH, and referring
to an ancient bug doesn't really help. task->flags are _explicitly_
documented to be used only for _current_. Is it possible that somebody
writes buggy code? Sure. Should we build a whole infrastructure around
that to catch such broken code? I am not really sure. One bug 6 years ago
doesn't sound like a good reason for that.

--
Michal Hocko
SUSE Labs