commit f5f99309 (sock: do not set sk_err in sock_dequeue_err_skb) has broken ping

From: Cyril Hrubis
Date: Thu Jun 01 2017 - 10:01:06 EST


Hi!
I started wondering why ping was eating 100% CPU shortly after I upgraded
my machine to 4.10, and here is what I found:

The ping main_loop() sleeps in poll() on its socket. The poll() usually times
out, at least that's what strace suggests, which causes ping to sleep for ~1s
in the kernel.
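
To illustrate the normal case, here is a minimal sketch of a poll() call that
sleeps for its full timeout because nothing is pending on the socket. It is not
the iputils code: the Unix socketpair, the 200 ms timeout and the timed_poll()
helper are assumptions made only for the illustration.

```python
import select
import socket
import time


def timed_poll(sock, timeout_ms):
    """Poll sock for readability; return (events, elapsed_seconds)."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    start = time.monotonic()
    events = poller.poll(timeout_ms)  # blocks in the kernel until timeout
    return events, time.monotonic() - start


# Stand-in for ping's socket: an idle Unix socketpair with no data queued.
a, b = socket.socketpair()
events, elapsed = timed_poll(a, 200)

# With nothing pending, poll() reports no events and the call takes
# roughly the whole timeout, so the process spends that time asleep.
print("events:", events, "elapsed: %.2fs" % elapsed)

a.close()
b.close()
```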

See ping source at:

https://github.com/iputils/iputils/blob/master/ping_common.c#L587

The poll() seems to start returning POLLERR immediately after it is called on
the socket when the connection has dropped for a short while. It seems to be
easily reproducible with:

* Starting ping with some IP address, e.g. ping 4.2.2.2
* Letting it ping for a minute or so
* Disconnecting the WAN cable from your AP
* After a minute or so ping ends up busy looping on
poll() that returns with POLLERR immediately
* After plugging the cable back in the problem only gets
worse, since we now spend 99% of the time busy looping
on the poll() syscall
* And my CPU fan starts to scream loudly
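
The busy loop itself can be sketched without a raw ICMP socket. poll() is
level-triggered, so as long as a condition stays pending on the descriptor
every call returns at once, regardless of the timeout. The sketch below uses a
hung-up Unix socketpair as a stand-in for the pending POLLERR on ping's socket.
That substitution, the count_wakeups() helper and the timings are assumptions
for the illustration, and this does not reproduce the kernel regression itself.

```python
import select
import socket
import time


def count_wakeups(sock, timeout_ms, duration_s):
    """Call poll() in a loop for duration_s; count calls that return events."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    wakeups = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        # The loop never clears the pending condition, much like a caller
        # that sees POLLERR but finds nothing to consume.
        if poller.poll(timeout_ms):
            wakeups += 1
    return wakeups


# Idle socket: every poll() sleeps for the full timeout and reports nothing.
idle_a, idle_b = socket.socketpair()
idle = count_wakeups(idle_a, 100, 0.3)

# Peer closed: the hang-up condition stays pending, so every poll()
# returns immediately and the loop spins.
hung_a, hung_b = socket.socketpair()
hung_b.close()
busy = count_wakeups(hung_a, 100, 0.3)

print("idle wakeups:", idle, "busy wakeups:", busy)

idle_a.close()
idle_b.close()
hung_a.close()
```

The second count is bounded only by how fast the loop can enter and leave the
syscall, which matches the 100% CPU usage described above.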

I've bisected the problem to this commit:

commit f5f99309fa7481f59a500f0d08f3379cd6424c1f (HEAD, refs/bisect/bad)
Author: Soheil Hassas Yeganeh <soheil@xxxxxxxxxx>
Date: Thu Nov 3 18:24:27 2016 -0400

sock: do not set sk_err in sock_dequeue_err_skb

--
Cyril Hrubis
chrubis@xxxxxxx