Re: [RFC][PATCH v2 00/31] timers: Use del_timer_shutdown() before freeing timers

From: Guenter Roeck
Date: Fri Oct 28 2022 - 11:31:18 EST


On 10/27/22 15:58, Steven Rostedt wrote:
On Thu, 27 Oct 2022 15:24:04 -0700
Guenter Roeck <linux@xxxxxxxxxxxx> wrote:

On Thu, Oct 27, 2022 at 11:05:25AM -0400, Steven Rostedt wrote:

Back in April, I posted an RFC patch set to help mitigate a common issue
where a timer gets armed just before it is freed, and when the timer
goes off, it crashes in the timer code without any evidence of who the
culprit was. I got side tracked and never finished up on that patch set.
Since this type of crash is still our #1 crash we are seeing in the field,
it has become a priority again to finish it.

This is v2 of that patch set. Thomas Gleixner posted an untested version
that makes timer->function NULL as the flag that it is shutdown. I took that
code, tested it (fixed it up), added more comments, and changed the
name to del_timer_shutdown() as Linus had asked. I also converted it to use
WARN_ON_ONCE() instead of just WARN_ON() as Linus asked for that too.

Here are various warnings and crashes. Complete logs are at

https://kerneltests.org/builders

in the "testing" column of the qemu test results.

This is with the published patch set plus the fixups in
timer_fixup_init() and timer_fixup_free().

Guenter

---
[ ... ]


WARNING: CPU: 0 PID: 280 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
ODEBUG: init active (active state 0) object type: timer_list hint: tulip_timer+0x0/0x38



The problem is that the tulip code calls timer_setup() repeatedly (and
unnecessarily). Apparently either the new timer code and/or the associated
ODEBUG code doesn't like that. The patch below fixes the problem.

I think there needs to be a means to handle that situation gracefully.
The parport code has the same problem (see second patch below), and
I am sure there are others.

Thanks,
Guenter

---
tulip:

diff --git a/drivers/net/ethernet/dec/tulip/tulip_core.c b/drivers/net/ethernet/dec/tulip/tulip_core.c
index ecfad43df45a..0c86066929d3 100644
--- a/drivers/net/ethernet/dec/tulip/tulip_core.c
+++ b/drivers/net/ethernet/dec/tulip/tulip_core.c
@@ -770,8 +770,6 @@ static void tulip_down (struct net_device *dev)

spin_unlock_irqrestore (&tp->lock, flags);

- timer_setup(&tp->timer, tulip_tbl[tp->chip_id].media_timer, 0);
-
dev->if_port = tp->saved_if_port;

/* Leave the driver in snooze, not sleep, mode. */
@@ -1869,10 +1867,14 @@ static int __maybe_unused tulip_resume(struct device *dev_d)
static void tulip_remove_one(struct pci_dev *pdev)
{
struct net_device *dev = pci_get_drvdata (pdev);
+ struct tulip_private *tp;

if (!dev)
return;

+ tp = netdev_priv(dev);
+ del_timer_shutdown(&tp->timer);
+
unregister_netdev(dev);
}


---
parport:

diff --git a/drivers/parport/ieee1284.c b/drivers/parport/ieee1284.c
index 4547ac44c8d4..50dbd2ea23fc 100644
--- a/drivers/parport/ieee1284.c
+++ b/drivers/parport/ieee1284.c
@@ -73,7 +73,7 @@ int parport_wait_event (struct parport *port, signed long timeout)
timer_setup(&port->timer, timeout_waiting_on_port, 0);
mod_timer(&port->timer, jiffies + timeout);
ret = down_interruptible (&port->physport->ieee1284.irq);
- if (!del_timer_sync(&port->timer) && !ret)
+ if (!del_timer_shutdown(&port->timer) && !ret)
/* Timed out. */
ret = 1;