Re: workqueue: race in mod_delayed_work_on?

From: Konstantin Khlebnikov
Date: Tue May 10 2016 - 13:20:31 EST


On 10.05.2016 19:36, Tejun Heo wrote:
> Hello,
>
> On Tue, May 10, 2016 at 07:28:08PM +0300, Konstantin Khlebnikov wrote:
>> On 10.05.2016 11:21, Konstantin Khlebnikov wrote:
>>> I've got plenty warnings, bugs and oops around trivial use of mod_delayed_work in drivers/infiniband/core/addr.c
>>
>> Looks like problem in mod_delayed_work_on was hidden because add_timer is equal to mod_timer
>
> The timer usages are gated behind PENDING bit, so whether add_timer()
> is equal to mod_timer() shouldn't matter.

Hmm... this looks a little bit more complicated than one bit.

>> but Sasha accidentally backported 874bbfe600a660cba9c776b3957b1ce393151b76
>> (workqueue: make sure delayed work run in local cpu) into 3.18.25
>
> I don't see reason why that commit could break delayed work,
> most likely it highlighted some other problem.
>
> What are you running? Can you reproduce the issue on upstream kernel?


This is a slightly patched 3.18.y. It looks like this started when we upgraded the kernel to
3.18.25 and somebody loaded the ib_addr module (IP in InfiniBand or something), which is actually
unused because these machines have no InfiniBand at all. But this code is sometimes poked from
Ethernet ARP, so it crashes somewhere from time to time. I'll try to stress-test this piece.
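
For reference, here is a rough user-space sketch of the PENDING-bit gating discussed above, as I understand the argument: only the caller that flips the bit from clear to set may arm the timer, so with this gate alone a second arm attempt never reaches the timer code. This is a toy model, not the kernel implementation. struct toy_dwork, toy_init(), toy_queue_delayed(), toy_work_starts() and toy_demo() are hypothetical names invented for illustration, and clearing the bit when the work starts running is a simplifying assumption of the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a delayed work item; NOT the kernel's struct. */
struct toy_dwork {
	atomic_bool pending;	/* models the PENDING bit */
	int arm_count;		/* how many times the "timer" was armed */
	long expires;		/* delay recorded by the caller that armed it */
};

void toy_init(struct toy_dwork *dw)
{
	atomic_init(&dw->pending, false);
	dw->arm_count = 0;
	dw->expires = 0;
}

/*
 * Try to queue the work. atomic_exchange() returns the previous value,
 * so exactly one caller sees "false" and wins the right to arm the timer.
 * Returns true if this call armed the timer.
 */
bool toy_queue_delayed(struct toy_dwork *dw, long delay)
{
	if (atomic_exchange(&dw->pending, true))
		return false;	/* already pending: leave the timer alone */

	dw->expires = delay;
	dw->arm_count++;	/* stands in for arming the timer */
	return true;
}

/* Assumption of this model: the bit is released when the work starts. */
void toy_work_starts(struct toy_dwork *dw)
{
	atomic_store(&dw->pending, false);
}

/* Small walk-through of the gate; call it from a test harness. */
void toy_demo(void)
{
	struct toy_dwork dw;

	toy_init(&dw);
	assert(toy_queue_delayed(&dw, 10));	/* first caller arms */
	assert(!toy_queue_delayed(&dw, 20));	/* second caller is gated out */
	assert(dw.arm_count == 1);
	toy_work_starts(&dw);			/* bit released */
	assert(toy_queue_delayed(&dw, 30));	/* can be armed again */
}
```

The sketch only covers the plain queueing path. The modify path has to take over an already pending item and deal with its timer, which is the part that looks more complicated than one bit to me.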

--
Konstantin