Re: [PATCH v2] locking/hung_task: Show all hung tasks before panic
From: Dmitry Vyukov
Date: Mon Apr 09 2018 - 07:50:28 EST
On Mon, Apr 9, 2018 at 1:13 PM, Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>> > It would be nice if syzbot testing were done with kdump configured, and
>> > the results of automated scripting on the vmcore (such as
>> > "foreach bt -s -l") were available.
>>
>> kdump has popped up several times already
>> (https://github.com/google/syzkaller/issues/491). But this will
>> require a non-trivial amount of work to pipe it through the whole
>> system (starting from investigation/testing and the second kernel,
>> to storing the dumps and exposing them).
>>
>
> We can use different kernels for testing and for kdump, can't we? Then
> I think it is not difficult to load the kdump kernel from local disk.
> And kdump (kexec-tools) already supports dumping via ssh. So is there
> still a non-trivial amount of work? Just a remote server for temporarily
> holding the kernel for testing and running scripted analysis commands?
It's just that fully automating something is usually a much larger
amount of work than doing it manually as a one-off thing. I also need
to figure out how much time and space it takes to reboot into the
kdump kernel and extract the dump. I don't think it's feasible to
persistently store all kdumps, because we are getting ~1 crash/sec.

Then, the web server that serves the syzbot UI and sends emails is an
App Engine web app which does not have direct access to test machines
and/or git, but it seems that only it can decide when we need to store
dumps persistently. In the current architecture test machines are
disposable and are long gone by the time a crash is uploaded to the
dashboard. So machines need to be preserved until the dashboard says
whether we need the dump or not. Or maybe we always extract dumps and
store them locally, temporarily, until we know whether we need to
persist them or not. I don't know yet what will work better.

This also needs to be handled carefully in the crash reproduction
process, which has different logic from the main testing loop. And in
the end, interfaces between multiple systems need to be extended, the
database format needs to be extended, lots of testing needs to be
done, we need to figure out a good config for the kdump kernel, the
image build process needs to be extended to package the kdump kernel,
the configs of multiple systems need to be extended, and probably a
bunch of other small things here and there.
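For reference, the per-crash flow that would need automating looks
roughly like the sketch below (paths, hosts and kernel command line
are made up; it assumes the test image reserves crash memory via a
crashkernel= boot parameter and ships kexec-tools, which is exactly
the kind of image/config work mentioned above):

# Rough sketch of the per-crash kdump flow (hypothetical paths/hosts).
import subprocess

def load_kdump_kernel():
    # Runs in the test VM at boot: load a separate, smaller kernel
    # that the machine reboots into on panic.
    subprocess.check_call([
        "kexec", "-p", "/boot/kdump-bzImage",
        "--initrd=/boot/kdump-initrd.img",
        "--append=irqpoll nr_cpus=1 reset_devices",
    ])

def save_vmcore(dump_server, crash_id):
    # Runs in the kdump kernel after a panic: the old kernel's memory
    # is exposed as /proc/vmcore; copy it off the disposable test VM
    # before the VM is recycled.
    subprocess.check_call(
        ["scp", "/proc/vmcore",
         "%s:/dumps/vmcore-%s" % (dump_server, crash_id)])

def analyze_vmcore(vmlinux, vmcore):
    # Runs on the dump server: scripted analysis along the lines Tetsuo
    # suggests, feeding a command file ("foreach bt -s -l" etc.) to the
    # crash utility non-interactively.
    return subprocess.check_output(
        ["crash", "-i", "crash-commands.txt", vmlinux, vmcore],
        text=True)

Each of those steps is trivial by hand; the work is in wiring them
into the pipeline described above.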
Then we also need vmlinux to make dumps actionable, right? And
vmlinux is nice in itself because it allows us to do objdump -d. So it
probably makes sense to separate vmlinux uploading and persistence
from dumps, because vmlinuxes are probably better uploaded once per
kernel build (which is roughly once per day). So those will be
separate paths through the system.
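For example (made-up address and function name), this is roughly the
kind of thing a per-build vmlinux enables even without a dump:

# Minimal sketch; the address and function name below are made up.
import subprocess

VMLINUX = "vmlinux"            # uploaded once per kernel build
RIP = "0xffffffff81234567"     # hypothetical address from a crash report

# Map the address back to function/file/line, following inlined frames.
print(subprocess.check_output(
    ["addr2line", "-f", "-i", "-e", VMLINUX, RIP], text=True))

# Or disassemble and inspect the code around the faulting function.
subprocess.run("objdump -d %s | grep -A40 '<hypothetical_func>:'" % VMLINUX,
               shell=True, check=True)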
It also probably makes sense to consider whether
https://github.com/google/syzkaller/issues/466 can be bundled with
this work (at least the data paths; what exactly is captured can of
course be extended later).
We also need to figure out whether at least part of all this can be
unit-tested, and write the tests.
So, yes, nothing extraordinary. But I feel this is not doable within
a day and would ideally require several uninterrupted days with
nothing else urgent, and I have been having trouble finding such days
lately...