Re: bpf: BPF_PROG_TEST_RUN leads to unkillable process

From: Stanislav Fomichev
Date: Mon Feb 04 2019 - 12:49:03 EST


On 02/01, Dmitry Vyukov wrote:
> Hello,
>
> The following program leads to an unkillable process that eats CPU in
> an infinite loop in BPF_PROG_TEST_RUN syscall. But kernel does not
> self-detect cpu/rcu/task stalls either. The program contains max
> number of repetitions, but as far as I see the intention is that it
> should be killable. I see that bpf_test_run() checks for
> signal_pending(current), but it does so only if need_resched() is also
> set. Can need_resched() be not set for prolonged periods of time?
> /proc/pid/stack is empty, not sure what other info I can provide.

There are a bunch of places in the kernel where we do the same nested check:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/broadcom/tg3.c#n12059
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/char/hw_random/s390-trng.c#n80
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/char/random.c#n1049
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/s390/crypto/prng.c#n470

So it's not something unusual. OTOH, do_check() in kernel/bpf/verifier.c
checks signal_pending() and need_resched() sequentially rather than
nested. In theory, it should not hurt to do the same in bpf_test_run().
Any thoughts about the patch below? I think we also need to properly
return -ERESTARTSYS when we bail out on signal_pending().
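
To make the difference concrete, here is a rough userspace sketch of the
two patterns. This is not the patch itself and not kernel code: the
*_stub() helpers, the fake_* flags and the run_*() functions are made-up
names standing in for signal_pending(current), need_resched() and
cond_resched(), and ERESTARTSYS is defined locally because the value is
kernel-internal and not exported through the userspace errno.h.

```c
#include <stdbool.h>

/* Assumption: kernel-internal value, defined here for illustration only. */
#define ERESTARTSYS 512

/* Hypothetical stand-ins for the kernel state the real helpers inspect. */
static bool fake_signal_pending;
static bool fake_need_resched;

static bool signal_pending_stub(void) { return fake_signal_pending; }
static bool need_resched_stub(void)   { return fake_need_resched; }
static void cond_resched_stub(void)   { /* would yield the CPU in the kernel */ }

/*
 * Nested check: a pending signal is only noticed when a reschedule has
 * also been requested. Stores the number of completed iterations in *done.
 */
static int run_nested(unsigned int repeat, unsigned int *done)
{
	unsigned int i;

	for (i = 0; i < repeat; i++) {
		/* the program under test would run here */
		if (need_resched_stub()) {
			if (signal_pending_stub())
				break;
			cond_resched_stub();
		}
	}
	*done = i;
	return 0;
}

/*
 * Sequential check: the signal is tested on every iteration, independent
 * of need_resched, and the caller gets -ERESTARTSYS when it fires.
 */
static int run_sequential(unsigned int repeat, unsigned int *done)
{
	unsigned int i;
	int ret = 0;

	for (i = 0; i < repeat; i++) {
		/* the program under test would run here */
		if (signal_pending_stub()) {
			ret = -ERESTARTSYS;
			break;
		}
		if (need_resched_stub())
			cond_resched_stub();
	}
	*done = i;
	return ret;
}
```

With fake_signal_pending set and fake_need_resched clear, run_nested()
goes through all the repetitions, while run_sequential() stops before the
first one and reports -ERESTARTSYS.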

--