I have sort of an inverse question to what Paolo asked :)
AFAIU we need the double-cancel because checking the flag and
scheduling are not atomic. But if we do that why the memory
barriers? They make it seem like we're doing something clever
with memory ordering, while really we're just depending on normal
properties of the tasklet/timer/work APIs.
FTR disable_work_sync() would work nicely here but it'd be
a PITA for backports.
Also - is this based on some report or syzbot? I'm a bit tempted
to put this in net-next given how unlikely the race is vs how
commonly used the driver is.