Re: [RFC][PATCH v2 00/31] timers: Use del_timer_shutdown() before freeing timers

From: Jason A. Donenfeld
Date: Thu Oct 27 2022 - 11:52:37 EST


On Thu, Oct 27, 2022 at 11:05:25AM -0400, Steven Rostedt wrote:
> We are hitting a common bug where a timer is being triggered after it is
> freed. This causes corruption in the timer linked list and crashes the
> kernel. Unfortunately it is not easy to know which timer it was that was
> freed. Looking at the code, it appears that there are several cases where
> del_timer() is used when del_timer_sync() should have been.
>
> Add a del_timer_free() that not only does a del_timer_sync() but also
> marks the timer as freed, so that if it gets rearmed, a WARN_ON is
> triggered. Developers about to free a timer are more likely to reach for
> del_timer_free() than for del_timer_sync(), as the latter's name does not
> make it obvious that it is needed before freeing. Having the word "free"
> in the name of the function will hopefully help developers know that it
> needs to be called before freeing.
>
> The added bonus is that marking the timer as freed will trigger a
> warning if it gets rearmed. That way, if the system crashes on a freed
> timer, we may at least see which timer it was that was freed.
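
If I understand the proposal, the semantics would be roughly like the
sketch below. This is just an illustration of the idea, not the actual
patch; the poisoning via timer->function = NULL is an assumption on my
part:

/*
 * Sketch only: deactivate the timer, wait out a running callback, then
 * poison the timer so that a later rearm can be caught. The enqueue
 * paths would need a corresponding WARN_ON_ONCE(!timer->function)
 * check before queueing.
 */
static inline int del_timer_free(struct timer_list *timer)
{
	int ret = del_timer_sync(timer);

	timer->function = NULL;	/* assumed "freed" marker */
	return ret;
}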

FYI, there's a related issue with add_timer_on(), currently without a
straightforward solution, in case you're curious; it came up in a
discussion with Sebastian and Sultan a few weeks ago. Pasting from that
thread, the issue is:

1	while (conditions) {
2		if (!timer_pending(&stack.timer))
3			add_timer_on(&stack.timer, some_next_cpu);
4	}
5	del_timer_sync(&stack.timer);

a) add_timer_on() on line 3 is called from CPU 1 and pends the timer on
CPU 2.

b) Just before the timer callback runs on CPU 2 (not after it completes),
timer_pending() is made false, so the condition on line 2 holds true again.

c) add_timer_on() on line 3 is called from CPU 1 and pends the timer on
CPU 3.

d) The conditions on line 1 are made false, and the loop breaks.

e) del_timer_sync() on line 5 is called, and its `base->running_timer !=
timer` check is done against CPU 3's base, because of step (c), so it
never sees the callback from step (b) running on CPU 2 and returns
without waiting (see the try_to_del_timer_sync() excerpt below).

f) `stack.timer` gets freed / goes out of scope.

g) The callback scheduled from step (b) runs, and we have a UaF.
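
For reference, the check in question is in try_to_del_timer_sync(),
lightly abridged here from kernel/time/timer.c (modulo kernel version;
the comments are mine):

static int try_to_del_timer_sync(struct timer_list *timer)
{
	struct timer_base *base;
	unsigned long flags;
	int ret = -1;

	/*
	 * Follows timer->flags to the timer's *current* base; after
	 * step (c) that is CPU 3's base, not CPU 2's, where the
	 * callback is running.
	 */
	base = lock_timer_base(timer, &flags);

	if (base->running_timer != timer)
		ret = detach_if_pending(timer, base, true);

	raw_spin_unlock_irqrestore(&base->lock, flags);

	return ret;
}

del_timer_sync() only retries while this returns -1, and CPU 3's base
never has this timer as its running_timer, so it returns right away.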

Here's a reproducer of this flow, which prints out:

[ 4.157610] wireguard: Stack on cpu 1 is corrupt

diff --git a/drivers/net/wireguard/main.c b/drivers/net/wireguard/main.c
index ee4da9ab8013..5c61f49918f2 100644
--- a/drivers/net/wireguard/main.c
+++ b/drivers/net/wireguard/main.c
@@ -17,10 +17,40 @@
 #include <linux/genetlink.h>
 #include <net/rtnetlink.h>
 
+struct state {
+	struct timer_list timer;
+	char valid[8];
+};
+
+static void fire(struct timer_list *timer)
+{
+	struct state *stack = container_of(timer, struct state, timer);
+	mdelay(1000);
+	pr_err("Stack on cpu %d is %s\n", raw_smp_processor_id(), stack->valid);
+}
+
+static void do_the_thing(struct work_struct *work)
+{
+	struct state stack = { .valid = "valid" };
+	timer_setup_on_stack(&stack.timer, fire, 0);
+	stack.timer.expires = jiffies;
+	add_timer_on(&stack.timer, 1);
+	while (timer_pending(&stack.timer))
+		cpu_relax();
+	stack.timer.expires = jiffies;
+	add_timer_on(&stack.timer, 2);
+	del_timer_sync(&stack.timer);
+	memcpy(&stack.valid, "corrupt", 8);
+}
+
+static DECLARE_DELAYED_WORK(reproducer, do_the_thing);
+
 static int __init wg_mod_init(void)
 {
 	int ret;
 
+	schedule_delayed_work_on(0, &reproducer, HZ * 3);
+
 	ret = wg_allowedips_slab_init();
 	if (ret < 0)
 		goto err_allowedips;

It would be interesting if your patch fixed this case too. But maybe the
above is unfixable (and rather niche anyway).

Jason