Re: [RFC PATCH] sched/debug: Use terse backtrace for idly sleeping threads.

From: Tetsuo Handa
Date: Sat Jul 21 2018 - 07:31:58 EST


On 2018/07/20 23:04, David Laight wrote:
> From: Tetsuo Handa
>> Sent: 20 July 2018 14:27
>>
>> On 2018/07/19 22:46, Peter Zijlstra wrote:
>>> On Thu, Jul 19, 2018 at 10:37:23PM +0900, Tetsuo Handa wrote:
>>>> This patch can be applied before proposing abovementioned changes.
>>>> Since there are many kernel threads whose backtrace is boring due to idly
>>>> waiting for an event inside the main loop, this patch introduces a kernel
>>>> config option (which allows SysRq-t to use one-liner backtrace for threads
>>>> idly waiting for an event) and simple helpers (which allow current thread
>>>> to declare that current thread is about to start/end idly waiting).
>
> A kernel config option isn't the right place to select this.
> Distros will build kernels with the 'wrong' value.

What do you mean? Distros can build their kernels with that config option disabled.
Are you suggesting a runtime switch via /proc/sys/, sysfs, or debugfs?

I'm using a syzbot-specific kernel config option for testing under syzbot
(e.g. https://lore.kernel.org/lkml/9b9fcdda-c347-53ee-fdbb-8a7d11cf430e@xxxxxxxxxxxxxxxxxxx/T/#u ).
But I don't think that "using one-liner backtrace for threads idly waiting for
an event" has to be syzbot-specific.

>
> In any case it is usually easier to read /proc/nnn/stack of the process
> you are interested in rather than write all of them to the kernel message
> buffer and find that it is far too small.

Reading /proc/$pid/stack is not an option for automated testing by syzbot.

syzbot currently has 65 hung task reports. Triggering SysRq-l when khungtaskd
fires is still insufficient, and analyzing a vmcore is still not possible.
For syzbot, triggering SysRq-t when khungtaskd fires would be helpful.

>
>>>> diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
>>>> index f776807..6b8c8bd 100644
>>>> --- a/drivers/base/devtmpfs.c
>>>> +++ b/drivers/base/devtmpfs.c
>>>> @@ -406,7 +406,9 @@ static int devtmpfsd(void *p)
>>>>  		}
>>>>  		__set_current_state(TASK_INTERRUPTIBLE);
>>>>  		spin_unlock(&req_lock);
>>>> +		start_idle_sleeping();
>>>>  		schedule();
>>>> +		end_idle_sleeping();
>>>>  	}
>>>>  	return 0;
>>>>  out:
>>>
>>> So I _really_ hate the idea of sprinkling that all around the kernel like
>>> this.
>>>
>>
>> Does that comment mean the idea of "using one-liner backtrace for threads
>> idly waiting for an event" itself is OK?
>
> Aren't such stack traces likely to be short ones anyway?
> Either that or you actually want to know where it is really waiting.

Even if each stack trace is short, the size of the console log has to be
limited, so I want to save lines wherever possible.
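
To make the saving concrete, here is a rough userspace model (not kernel code,
and not the patch itself). All model_* names, the flag field, the output format
and the sample frames are made up for illustration; only the idea of a
start/end annotation around schedule() comes from the patch. The dump function
returns the number of lines it wrote, so the difference is easy to count.

```c
/* Userspace model only. Nothing here is an existing kernel interface. */
#include <assert.h>
#include <stdio.h>

#define MAX_FRAMES 8

struct model_task {
	const char *comm;               /* thread name */
	int pid;
	int idle_sleeping;              /* hypothetical "I am idly waiting" flag */
	int nr_frames;
	const char *frames[MAX_FRAMES]; /* pre-resolved backtrace, innermost first */
};

/* Hypothetical counterparts of start_idle_sleeping()/end_idle_sleeping(). */
static void model_start_idle_sleeping(struct model_task *t)
{
	t->idle_sleeping = 1;
}

static void model_end_idle_sleeping(struct model_task *t)
{
	t->idle_sleeping = 0;
}

/*
 * Dump one task to @out and return the number of lines written.
 * With @terse set, a task that declared itself idly sleeping gets
 * a single line instead of a header plus one line per frame.
 */
static int model_show_task(FILE *out, const struct model_task *t, int terse)
{
	int lines = 1;
	int i;

	assert(t->nr_frames >= 0 && t->nr_frames <= MAX_FRAMES);

	if (terse && t->idle_sleeping) {
		fprintf(out, "%s pid=%d (idly sleeping)\n", t->comm, t->pid);
		return lines;
	}

	fprintf(out, "%s pid=%d Call Trace:\n", t->comm, t->pid);
	for (i = 0; i < t->nr_frames; i++, lines++)
		fprintf(out, " %s\n", t->frames[i]);
	return lines;
}
```

In this model a thread with a four-frame backtrace costs five lines in the
normal dump and one line in the terse dump, and only while it sits between the
two annotations. A thread that blocks anywhere else still gets the full trace.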

>
>> Since there already is schedule_idle() function, introducing idly_schedule()
>> etc. is very confusing. What I'm trying to do is to tell debug function that
>> "I'm currently in neutral situation and hence dumping my backtrace will not
>> give you interesting result". Since such section needs to be carefully
>> annotated with comments, I think that lockdep-like annotation fits better
>> than introducing wrapped functions.
>
> Or use extra bits of current->state set by set_current_state().

I don't see how we could use it. I worry that the extra bits could be
overwritten unexpectedly, because I don't think that the statement which
follows set_current_state() is always schedule*()/wait_event*() etc.
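
A small userspace model of what I am worried about. This is only a sketch
under my own assumptions: MODEL_TASK_IDLE_HINT is a hypothetical extra bit,
and the model_* helpers merely imitate a plain store to the state word. They
are not the real set_current_state().

```c
/* Userspace model only. MODEL_TASK_IDLE_HINT does not exist in the kernel. */
#include <assert.h>
#include <stdio.h>

#define MODEL_TASK_RUNNING       0x0000L
#define MODEL_TASK_INTERRUPTIBLE 0x0001L
#define MODEL_TASK_IDLE_HINT     0x1000L  /* hypothetical "boring backtrace" bit */

struct state_model {
	long state;
};

/* Imitates a plain store of the whole state word. */
static void model_set_state(struct state_model *t, long state)
{
	t->state = state;
}

static int model_is_idle_hinted(const struct state_model *t)
{
	return (t->state & MODEL_TASK_IDLE_HINT) != 0;
}

/* The simple case: the state is set and the thread sleeps right away. */
static int model_sleep_directly(struct state_model *t)
{
	model_set_state(t, MODEL_TASK_INTERRUPTIBLE | MODEL_TASK_IDLE_HINT);
	/* schedule() would run here; the dumper still sees the hint. */
	return model_is_idle_hinted(t);
}

/*
 * The case I worry about: something between the store and the sleep
 * stores to the state word again, so the hint is silently dropped.
 */
static int model_sleep_after_other_work(struct state_model *t)
{
	model_set_state(t, MODEL_TASK_INTERRUPTIBLE | MODEL_TASK_IDLE_HINT);
	/* e.g. a helper that blocks and resets the state on its way out */
	model_set_state(t, MODEL_TASK_RUNNING);
	/* the loop re-arms with a plain state before really sleeping */
	model_set_state(t, MODEL_TASK_INTERRUPTIBLE);
	return model_is_idle_hinted(t);
}
```

In the second function the hint is gone by the time the thread sleeps, and
nothing warns about it. An explicit start/end pair kept outside the state word
would not be affected by such stores.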