Re: INFO: task hung in filemap_fault

From: Dmitry Vyukov
Date: Mon Jan 15 2018 - 04:40:44 EST


On Mon, Jan 8, 2018 at 11:48 AM, Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> Dmitry Vyukov wrote:
>> >> Hi Tetsuo,
>> >>
>> >> syzbot always re-runs the same workload on a new machine. If it
>> >> manages to reproduce the problem, it provides a reproducer. In this
>> >> case it didn't.
>> >
>> > Even if it did not manage to reproduce the problem, showing raw.log in
>> > C format is helpful for me. For example,
>> >
>> > ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1)
>> >
>> > is confusing. 0x4c00 is not LOOP_CHANGE_FD but LOOP_SET_FD.
>> > If the message were
>> >
>> > ioctl(r3, 0x4c00, r1)
>> >
>> > more people will be able to read what the program tried to do.
>> > There are many operations done on loop devices, but are too hard
>> > for me to pick up only loop related actions.
>>
>>
>> Hi Tetsuo,
>>
>> The main purpose of this format is different, this is a complete
>> representation of programs that allows replaying them using syzkaller
>> tools.
>
> What is ioctl$LOOP_CHANGE_FD(r3, 0x4c00, r1) ?
> 0x4c00 is LOOP_SET_FD. Why LOOP_CHANGE_FD is there?


In short, it specifies the exact discrimination (variant) of the
syscall, which determines how the rest of the arguments are parsed.
For some syscalls (ioctl/setsockopt/sendmsg) the kernel has hundreds
of different discriminations with radically different arguments.
Now if you are asking why the discrimination is LOOP_CHANGE_FD but
the actual command is LOOP_SET_FD, that's because this is a fuzzer;
its sole purpose is to mess things up in unexpected ways.
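
To make the mismatch concrete, here is a minimal sketch in C (not
syzkaller code) that compares the part after the $ sign in a call name
with the command number that was actually passed. The command numbers
are the ones from <linux/loop.h>; the helper names loop_cmd_name,
syz_variant and variant_matches_cmd are hypothetical and exist only
for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Loop ioctl command numbers, copied from <linux/loop.h>. */
struct loop_cmd {
    unsigned long nr;
    const char *name;
};

static const struct loop_cmd loop_cmds[] = {
    { 0x4C00, "LOOP_SET_FD" },
    { 0x4C01, "LOOP_CLR_FD" },
    { 0x4C02, "LOOP_SET_STATUS" },
    { 0x4C03, "LOOP_GET_STATUS" },
    { 0x4C04, "LOOP_SET_STATUS64" },
    { 0x4C05, "LOOP_GET_STATUS64" },
    { 0x4C06, "LOOP_CHANGE_FD" },
};

/* Hypothetical helper: name of a loop ioctl command, or NULL if unknown. */
static const char *loop_cmd_name(unsigned long nr)
{
    size_t i;

    for (i = 0; i < sizeof(loop_cmds) / sizeof(loop_cmds[0]); i++)
        if (loop_cmds[i].nr == nr)
            return loop_cmds[i].name;
    return NULL;
}

/* Hypothetical helper: the text after '$' in a call name, or NULL. */
static const char *syz_variant(const char *call)
{
    const char *p = strchr(call, '$');

    return p ? p + 1 : NULL;
}

/* Returns 1 if the name after '$' agrees with the command number, else 0. */
static int variant_matches_cmd(const char *call, unsigned long nr)
{
    const char *variant = syz_variant(call);
    const char *actual = loop_cmd_name(nr);

    return variant && actual && strcmp(variant, actual) == 0;
}
```

For the call from the report, variant_matches_cmd("ioctl$LOOP_CHANGE_FD",
0x4c00) returns 0, because 0x4c00 is LOOP_SET_FD: the name after the $
sign only says which description was used to lay out the arguments,
not which command the kernel will see.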


>> We can't simply drop info from there. Do you propose to add
>> another attached file that contains the same info in a different
>> format? What is the exact format you are proposing?
>
> Plain C program which can be compiled without installing additional
> program/library packages (except those needed for building kernels).
>
>> Is it just
>> dropping the syscall name part after $ sign? Note that it's still not
>> C, more complex syscall generally look as follows:
>>
>> perf_event_open(&(0x7f0000b5a000)={0x4000000002, 0x78, 0x1e2, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xffff, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> @perf_bp={&(0x7f0000000000)=0x0, 0x0}, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0, 0x0}, 0x0, 0x0, 0xffffffffffffffff, 0x0)
>> recvmmsg(0xffffffffffffffff, &(0x7f0000003000)=[{{0x0, 0x0,
>> &(0x7f0000002000)=[{&(0x7f000000a000)=""/193, 0xc1},
>> {&(0x7f0000007000-0x3d)=""/61, 0x3d}], 0x2,
>> &(0x7f0000005000-0x67)=""/103, 0x67, 0x0}, 0x0}], 0x1, 0x0,
>> &(0x7f0000003000-0x10)={0x77359400, 0x0})
>> bpf$PROG_LOAD(0x5, &(0x7f0000000000)={0x1, 0x5,
>> &(0x7f0000002000)=@framed={{0x18, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0}, [@jmp={0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}], {0x95, 0x0, 0x0,
>> 0x0}}, &(0x7f0000004000-0xa)='syzkaller\x00', 0x3, 0xc3,
>> &(0x7f0000386000)=""/195, 0x0, 0x0, [0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0], 0x0}, 0x48)
>>
>> Note: you can convert any syzkaller program to equivalent C code using
>> syz-prog2c utility that comes with syzkaller.
>
> I won't install go language into my environment for analyzing/reproducing your
> reports. If syz-prog2c is provided as a CGI service (e.g. receive URL containing
> raw.log and print the converted C program), I might try it.


raw.log is not a _program_; it's hundreds of separate programs that
were executed before the crash. It's also a very compressed
representation compared to the equivalent C programs. For example, for
this program:

mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32,
0xffffffffffffffff, 0x0)
r0 = socket$nl_generic(0x10, 0x3, 0x10)
sendmsg$nl_generic(r0,
&(0x7f0000b3e000-0x38)={&(0x7f0000d4a000-0xc)={0x10, 0x0, 0x0, 0x0},
0xc, &(0x7f0000007000)={&(0x7f0000f7c000-0x15c)={0x24, 0x1c, 0x109,
0xffffffffffffffff, 0xffffffffffffffff, {0x4, 0x0, 0x0},
[@nested={0x10, 0x9, [@typed={0xc, 0x0, @u32=0x0}]}]}, 0x24}, 0x1,
0x0, 0x0, 0x0}, 0x0)

you can get up to this amount of C code:
https://gist.githubusercontent.com/dvyukov/eeaeb4e4ac45c3a251f72098c9295bf9/raw/700cd583507eca90711ba11b42e406f317553371/gistfile1.txt

that is, 700 lines of C source for a 3-line program. So instead of a
1MB file, that would be 100MB, and then it probably should be a gzip
archive with hundreds of separate C files. There are people on this
list complaining even about 200K of attachments. I don't think this
would be better or well accepted.