Re: [PATCH] batman-adv: Broken sync while rescheduling delayed work

From: Vlad Efanov
Date: Fri May 26 2023 - 13:35:32 EST


Sven,


cancel_delayed_work_sync() and queue_delayed_work() use
WORK_STRUCT_PENDING_BIT in work->data to synchronize.
INIT_DELAYED_WORK() clears this bit.

The situation is: __cancel_work_timer() sets WORK_STRUCT_PENDING_BIT,
but INIT_DELAYED_WORK() in batadv_dat_start_timer() clears it,
and queue_delayed_work() then schedules new work.
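
To make the sequence concrete, here is a rough user-space model of it. This is
only a sketch and not kernel code: every model_* name and MODEL_PENDING_BIT
are made up for illustration, a single bit stands in for
WORK_STRUCT_PENDING_BIT, and the concurrent interleaving is replayed as plain
sequential steps.

```c
#include <stdbool.h>

/* Hypothetical stand-in for WORK_STRUCT_PENDING_BIT in work->data. */
#define MODEL_PENDING_BIT 0x1UL

struct model_work {
	unsigned long data;	/* stands in for work->data */
	int times_queued;	/* how often queueing succeeded */
};

/* Models INIT_DELAYED_WORK(): resets data and thereby clears the bit. */
static void model_init_work(struct model_work *w)
{
	w->data = 0;
}

/* Models queue_delayed_work(): queues only if the bit was not yet set. */
static bool model_queue_work(struct model_work *w)
{
	if (w->data & MODEL_PENDING_BIT)
		return false;
	w->data |= MODEL_PENDING_BIT;
	w->times_queued++;
	return true;
}

/* Models the step of __cancel_work_timer() that sets the pending bit. */
static void model_cancel_begin(struct model_work *w)
{
	w->data |= MODEL_PENDING_BIT;
}

/*
 * Replays the interleaving. Returns true if the work was queued again
 * after the cancel had already set the pending bit, i.e. the bug.
 */
static bool model_race(bool reinit_before_queue)
{
	struct model_work w = { 0, 0 };

	model_init_work(&w);
	model_queue_work(&w);		/* initial scheduling */
	w.data &= ~MODEL_PENDING_BIT;	/* worker starts, bit is cleared */
	model_cancel_begin(&w);		/* teardown sets the bit */
	if (reinit_before_queue)
		model_init_work(&w);	/* worker re-inits, bit is lost */
	return model_queue_work(&w);	/* worker tries to reschedule */
}
```

In this model, model_race(true) returns true (the work is queued although the
cancel is in progress), while model_race(false) returns false. That suggests
initializing the work once and only queueing it from the worker, under the
assumptions stated above.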


Best regards,

Vlad.

On 26.05.2023 19:49, Sven Eckelmann wrote:
> On Friday, 26 May 2023 18:16:32 CEST Vladislav Efanov wrote:
>> The reason for these issues is the lack of synchronization. Delayed
>> work (batadv_dat_purge) schedules new timer/work while the device
>> is being deleted. As a result, new timer/delayed work is set after
>> cancel_delayed_work_sync() was called. So after the device is freed
>> the timer list contains a pointer to already freed memory.
>
> You are most likely right but could you please point out what in the worker is
> checked by the workqueue code that prevents it from being scheduled again?
> (and which seems to be overwritten as your patch seems to suggest)
>
> I think __cancel_work_timer marked the work as canceling but
> batadv_dat_start_timer reinitialized the work (thus removing this important
> state). Would be nice if you could either correct me or confirm what I think I
> remember.
>
> Kind regards,
> Sven