Re: [PATCH v3] scheduler: enhancement to show_state_filter and SysRq

From: Yafang Shao
Date: Wed Aug 09 2017 - 22:44:52 EST


2017-08-10 0:42 GMT+08:00 Peter Zijlstra <peterz@xxxxxxxxxxxxx>:
> On Wed, Aug 09, 2017 at 05:26:14PM +0800, Yafang Shao wrote:
>> 2017-08-09 17:09 GMT+08:00 Peter Zijlstra <peterz@xxxxxxxxxxxxx>:
>> > On Wed, Aug 09, 2017 at 04:01:49PM +0800, Yafang Shao wrote:
>> >> 2017-08-09 15:43 GMT+08:00 Peter Zijlstra <peterz@xxxxxxxxxxxxx>:
>> >> > On Wed, Aug 09, 2017 at 06:31:28PM +0800, Yafang Shao wrote:
>> >> >> Sometimes we want to get tasks in TASK_RUNNING specifically,
>> >> >> instead of dumping all tasks.
>> >> >>
>> >> >> For example, when the loadavg is high, we want to dump
>> >> >> tasks in TASK_RUNNING and TASK_UNINTERRUPTIBLE, which contribute
>> >> >> to system load. But mostly there are lots of tasks in Sleep state,
>> >> >> which occupy almost all of the kernel log buffer, or even overflow
>> >> >> it, causing useful messages to get lost. Although we can
>> >> >> enlarge the kernel log buffer, that's not a good idea.
>> >> >
>> >> > That's what you have serial consoles for...
>> >> >
>> >> Mostly we don't even have a console, because we always log in to the
>> >> servers via ssh. And managing the servers over a console is not so
>> >> convenient.
>> >
>> > I find IPMI SOL very useful. Serial console (esp. earlyprintk) keeps on
>> > working long after most other things have died.
>> >
>> > In any case, you can easily dump the printk output into your ssh session
>> > if you want, use something like:
>> >
>> > cat /dev/kmsg | tee logfile & echo t > /proc/sysrq-trigger
>>
>> That's what I'm currently doing :)
>> Then I thought hard about it: why not do it more smartly?
>> Introducing a new key (here I just modified the key 'w') that only
>> dumps tasks in the running and blocked states would be smarter.
>
> Since you're strictly ssh based, you could maybe do a sysctl that allows
> changing the 'default' filter of sysrq-t, dunno if that makes sense
> though.
>
> Also, since you're not actually debugging a dead machine, maybe you can
> do a custom kernel module / systemtap / ebpf thing that collects
> precisely the information you want.
>
> sysrq is typically a last ditch debug mostly dead machine thing, which
> you're very much not having.
>

As I understand it, SysRq is a very old thing.
I agree with you that SysRq was implemented to debug dead machines. In
the old days, once a machine was dead, we could press the SysRq key
on the keyboard to collect information, then analyze and resolve the
problem. That's great.
But things have changed.
Nowadays, tens of thousands of servers run in an IDC without a keyboard
or screen, but I find this old thing is still the easiest way to
troubleshoot some kernel issues triggered by applications. For
example, when there are sudden/random CPU %sys utilization spikes, or
sudden/random system loadavg spikes, we can use /proc/sysrq-trigger
to conveniently collect information from the kernel and analyze what
the issue is.
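
For reference, the userland filtering discussed in this thread could look
roughly like the Python sketch below, which keeps only R and D tasks from
a saved task dump. It is only a sketch under assumptions: the header
layout matched by the regex is a guess at what the 't' dump prints and
differs between kernel versions, and the names TASK_HEADER and
filter_tasks are hypothetical, not part of the patch.

```python
import re

# ASSUMPTION: each task starts with a header line shaped like
#   "<comm> <state letter> [running task] <number> ..."
# followed by its call trace lines. Real layouts vary by kernel version,
# and a comm containing spaces would not match this pattern.
TASK_HEADER = re.compile(
    r"^(?P<comm>\S+)\s+(?P<state>[A-Z])\s+(?:running task\s+)?\d+"
)


def filter_tasks(lines, keep_states=("R", "D")):
    """Return only the lines belonging to tasks whose state is kept.

    A task "owns" its header line and every following line up to the
    next header. Lines before the first header are dropped.
    """
    kept = []
    keeping = False
    for line in lines:
        match = TASK_HEADER.match(line)
        if match:
            # A new task begins: decide whether to keep its whole block.
            keeping = match.group("state") in keep_states
        if keeping:
            kept.append(line)
    return kept
```

It would be fed the message text of the dump (for example the lines of
the logfile mentioned earlier, with any log prefix already stripped).
Note that this only trims the output after the fact; it does nothing
about the kernel log buffer overflowing while the dump is printed.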

Old things, new issues.

>> >
>> > I really see no problem here. Then you can run a bit of awk or whatever
>> > your favourite tool is to filter out the stuff you don't want.
>> >
>> Another question: if we can filter with scripts in userland, why
>> did we introduce the key 'w' to dump only blocked tasks,
>> when we already have the key 't' to dump all tasks?
>
> No idea that is long before my time, I expect because 'w' (blocked) is
> typically a small number of tasks.

If the machine is dead, there should not be many running tasks either.

Thanks
Yafang