Re: [PATCH 4.1 125/159] net: call rcu_read_lock early in process_backlog
From: Andre Tomt (LKML)
Date: Mon Sep 28 2015 - 22:12:50 EST
On 26. sep. 2015 22:56, Greg Kroah-Hartman wrote:
> 4.1-stable review patch. If anyone has any objections, please let me know.
>
> From: Julian Anastasov <ja@xxxxxx>
>
> [ Upstream commit 2c17d27c36dcce2b6bf689f41a46b9e909877c21 ]
>
> Incoming packet should be either in backlog queue or
> in RCU read-side section. Otherwise, the final sequence of
> flush_backlog() and synchronize_net() may miss packets
> that can run without device reference:
Several of our systems running 4.1.9-rc1 are experiencing hangs that
require a hardware or sysrq reset with this patch applied. Reverting it
fixes the hangs completely.
4.2 includes this patch as well but I have no such problems there.
4.2.2-rc1 works fine as well.
For now I think this patch should be reverted in 4.1.9.
The hangs have occurred so far on Xen PV and KVM x86_64 virtual machines.
They hang completely within minutes or hours, depending on the type
of workload. The workloads are all fairly light: one running low-traffic
email/antispam, another running monitoring and metrics of ~5 hosts, and
one running a single terminal IRC client. All but the IRC one will hang
within a few minutes of booting.
When they lock up they respond only to sysrq: ttyS0/hvc0 does not echo
back anything typed, and they are completely dead on the network.
One system managed to report RCU stalls but no backtraces (I'll look
over the debug config, if there is any interest).
My bare-metal desktop has not hit it yet, but that might be entirely
down to a different type of workload.
Something missing in 4.1?