Re: INFO: rcu detected stall in sys_sendfile64 (2)

From: Eric Biggers
Date: Wed Mar 13 2019 - 19:40:49 EST


On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> > Also, humans can sometimes derive simpler C reproducers from the syzbot-provided
> > reproducers. It would be nice if syzbot could accept and use a user-defined C
> > reproducer for testing.
>
> It would be more useful to accept patches from these people that make
> syzkaller create better reproducers. Manual work is not scalable. We
> would need 10 reproducers per day for a dozen OSes (including some
> private kernels/branches). Anybody is free to run syzkaller manually
> and do full manual (perfect) reporting. But for us it became clear
> very early that it won't work. Then, as noted above, while that human is
> sleeping/on the weekend/on vacation, syzbot will already have bisected with
> its own reproducer. Adding a manual reproducer later won't help in any way.
> syzkaller already does lots of smart work for reproducers. Let's not
> give up on the last mile and switch back to all manual work.
>

Well, it's very tough: not many people are familiar with the syzkaller
codebase, let alone have the time to contribute. But having simplified a lot of
syzkaller reproducers manually, I can say the main things I do are:

- Replace bare system calls with proper C library calls. For example:

#include <sys/syscall.h>
#include <unistd.h>

syscall(__NR_socket, 0xa, 6, 0);

becomes:

#include <sys/socket.h>

socket(AF_INET6, SOCK_DCCP, 0);

- Do the same for structs. Use the appropriate C header rather than filling in
each struct manually. For example:

*(uint16_t*)0x20000000 = 0xa;
*(uint16_t*)0x20000002 = htobe16(0x4e20);
*(uint32_t*)0x20000004 = 0;
*(uint8_t*)0x20000008 = 0;
*(uint8_t*)0x20000009 = 0;
*(uint8_t*)0x2000000a = 0;
*(uint8_t*)0x2000000b = 0;
*(uint8_t*)0x2000000c = 0;
*(uint8_t*)0x2000000d = 0;
*(uint8_t*)0x2000000e = 0;
*(uint8_t*)0x2000000f = 0;
*(uint8_t*)0x20000010 = 0;
*(uint8_t*)0x20000011 = 0;
*(uint8_t*)0x20000012 = 0;
*(uint8_t*)0x20000013 = 0;
*(uint8_t*)0x20000014 = 0;
*(uint8_t*)0x20000015 = 0;
*(uint8_t*)0x20000016 = 0;
*(uint8_t*)0x20000017 = 0;
*(uint32_t*)0x20000018 = 0;

becomes:

#include <endian.h>
#include <netinet/in.h>

struct sockaddr_in6 addr = {
	.sin6_family = AF_INET6,
	.sin6_port = htobe16(0x4e20),
};

- Put arguments on the stack rather than in the mmap'd region at a fixed address
(0x20000000 in the above example), if possible.

- Simplify any calls to the helper functions that syzkaller emits, e.g.
syz_open_dev(), syz_kvm_setup_vcpu(), or the networking setup code. Usually
the reproducer needs only a small subset of their functionality to work.

- For multithreaded reproducers, try to incrementally simplify the threading
strategy. For example, reduce the number of threads by combining operations.
Running the operations in loops can also help, and using fork() often results
in a simpler reproducer than pthreads.

- Instead of using the 'r[]' array to hold all integer return values, give them
appropriate names.

- Remove duplicate #includes.

- Based on the actual kernel code and the bug, find a simpler or more reliable
way to trigger the same bug, if possible. If the problem is obvious, it may be
possible to jump straight to this step.
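Putting several of the steps above together, here is a rough sketch of what the two example snippets might look like once simplified: the libc socket() call with symbolic constants (0xa is AF_INET6 and 6 is SOCK_DCCP on Linux), the address built from <netinet/in.h> on the stack, and a named variable instead of r[0]. The bind() call is a hypothetical placeholder for whatever the original reproducer actually did with the address, and the helper names repro_addr() and run_repro() are made up for illustration.

```c
#define _GNU_SOURCE
#include <endian.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Replaces the 20 raw stores to 0x20000000..0x2000001b: only the family
 * and port are non-zero; the address (::), flowinfo and scope ID are 0.
 */
static struct sockaddr_in6 repro_addr(void)
{
	struct sockaddr_in6 addr = {
		.sin6_family = AF_INET6,
		.sin6_port = htobe16(0x4e20),	/* port 20000 */
	};
	return addr;
}

/*
 * Hypothetical simplified body: libc socket() with named constants instead
 * of syscall(__NR_socket, 0xa, 6, 0), a named fd instead of r[0], and the
 * address on the stack instead of in the mmap'd region.  bind() is only a
 * placeholder for whatever the original reproducer did with the address.
 */
static int run_repro(void)
{
	struct sockaddr_in6 addr = repro_addr();
	int sock = socket(AF_INET6, SOCK_DCCP, 0);

	if (sock < 0)
		return -1;	/* e.g. no DCCP or IPv6 support */
	if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(sock);
		return -1;
	}
	close(sock);
	return 0;
}
```

The reproducer's main() then reduces to calling run_repro(), possibly in a loop if the bug is racy.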

Some gotchas:

- fault-nth injections are fragile, since the number of memory allocations in a
particular system call varies by kernel config and kernel version. Trying each
n in turn, starting from 1, is more reliable than hardcoding n.

- Some of the perf_event_open() reproducers are fragile because they hardcode a
trace event ID, which can change from one kernel version to the next. Reading
the trace event ID from /sys/kernel/debug/tracing/events/ is more reliable.

- Reproducers using the KVM API sometimes only work on certain processors (e.g.
Intel but not AMD) or even depend on the host kernel.

- Reproducers that access the local filesystem sometimes assume that it's ext4.
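For the perf_event_open() gotcha, a minimal sketch of looking the trace event ID up at run time instead of hardcoding it. It assumes debugfs is mounted so that /sys/kernel/debug/tracing/events/<subsystem>/<event>/id exists and holds a decimal number (on newer kernels tracefs may also or instead be at /sys/kernel/tracing), and that the event is opened as a tracepoint with the ID in attr.config. The helper names are hypothetical, and the subsystem/event pair is whatever the original reproducer targeted.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Parse the decimal ID stored in a tracing "id" file; -1 on error. */
static long read_id_file(const char *path)
{
	FILE *f = fopen(path, "r");
	long id = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &id) != 1)
		id = -1;
	fclose(f);
	return id;
}

/* Look up the ID of <subsys>/<event> under the debugfs tracing directory. */
static long trace_event_id(const char *subsys, const char *event)
{
	char path[256];

	snprintf(path, sizeof(path),
		 "/sys/kernel/debug/tracing/events/%s/%s/id", subsys, event);
	return read_id_file(path);
}

/* Open a tracepoint perf event for the calling task on any CPU. */
static int open_tracepoint(long id)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_TRACEPOINT;
	attr.config = id;	/* the ID read above, not a hardcoded constant */
	/* glibc has no perf_event_open() wrapper, so use syscall(). */
	return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}
```

For example, `long id = trace_event_id("sched", "sched_switch");` followed by `open_tracepoint(id)` when id >= 0; the event name here is only an illustration.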