Re: [PATCH] posix-timer: fix deletion race

From: Jeremy Katz
Date: Tue Jul 17 2007 - 19:14:05 EST


On Tue, 17 Jul 2007, Thomas Gleixner wrote:

On Tue, 2007-07-17 at 11:39 -0700, Jeremy Katz wrote:
I tried the patch with my test case, but still see the issue.
Here's my explanation of the double free race:
CPU 0 CPU 1
sys_timer_delete():
lock_timer();
...
unlock_timer(); itimer_delete()
release_posix_timer(): spin_lock_irqsave(timer...)
... ...
sigqueue_free() unlock_timer()
... sigqueue_free()
BUG_ON(!(q->flags...

With 2.6.14 or with current mainline ?

I haven't been keeping notes quite as studiously as I should have been, but this just occurred with 2.6.22.1 + the hrt6 patch + your proposed fix:

------------[ cut here ]------------
Kernel BUG at c0125987 [verbose debug info unavailable]
invalid opcode: 0000 [#1]
SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c0125987>] Not tainted VLI
EFLAGS: 00010246 (2.6.22-hrt1-WR1.4aq_cgl #1)
EIP is at sigqueue_free+0x23/0x72
eax: 00000000 ebx: f713a7d8 ecx: f79e25e0 edx: 00000202
esi: f6f4fa1c edi: 00000283 ebp: f7207e54 esp: f7207e4c
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process hrtm_test (pid: 19369, ti=f7207000 task=f729ea50 task.ti=f7207000)
Stack: 00000202 f6f4fa1c f7207e64 c012c7ca f6f4fa1c f6f4fa24 f7207e78 c012d110
f77463c0 f77463fc 00000001 f7207e88 c012d14b f77463c0 f729ea50 f7207eb4
c011e57c f729eea4 00000006 f729ea50 f7207ea4 c01244e8 00000006 f77463c0
Call Trace:
[<c0103504>] show_trace_log_lvl+0x1a/0x30
[<c01035bb>] show_stack_log_lvl+0x8d/0xaa
[<c01037f5>] show_registers+0x1cd/0x2cb
[<c0103a4a>] die+0x113/0x207
[<c0395cf5>] do_trap+0x8f/0xc6
[<c0103d35>] do_invalid_op+0x88/0x92
[<c0395ac2>] error_code+0x72/0x78
[<c012c7ca>] release_posix_timer+0x1b/0x7a
[<c012d110>] itimer_delete+0x9b/0xc0
[<c012d14b>] exit_itimers+0x16/0x21
[<c011e57c>] do_exit+0x300/0x3e8
[<c011e6b2>] do_group_exit+0x29/0x6f
[<c012643a>] get_signal_to_deliver+0x210/0x298
[<c010256d>] do_signal+0x5c/0x158
[<c01026a5>] do_notify_resume+0x3c/0x3f
[<c0102866>] work_notifysig+0x13/0x19

I doubt that this can happen, as itimer_delete() is only called, when
there are no more references to the shared sig struct. At this point no
other related thread can run sys_timer_delete().

I came to the above explanation after failing to find another path that clears the SIGQUEUE_PREALLOC flag. Some form of memory corruption could be to blame.

the timer fires during deletion:
CPU 0 CPU 1
sys_timer_delete():
lock_timer();
...
unlock_timer(); posix_timer_fn()
release_posix_timer(): spin_lock_irqsave(timer...)
... ...
sigqueue_free() posix_timer_event()
... ...
send_sigqueue()
BUG_ON(!(q->flags


I do not buy that either. In sys_timer_delete() the active timer is
canceled. If we can not cancel it then we retry until the timer callback
has been processed. This is even true for 2.6.14 (pre hrtimer).

--- linux-2.6.14-cgl.original/kernel/posix-timers.c 2007-07-11 16:06:47.000000000 -0700
+++ linux-2.6.14-cgl/kernel/posix-timers.c 2007-07-16 15:25:43.000000000 -0700
@@ -339,7 +339,9 @@ static int posix_timer_fn(void *data)
int si_private = 0;
int ret = HRTIMER_NORESTART;

- spin_lock_irqsave(&timr->it_lock, flags);
+ if(lock_timer(timr->it_id, &flags) == NULL) {
+ return ret;
+ }

This is wrong. You look at something which is not valid anymore, if your
theory is correct. You are just pampering over the real bug.

Noticed right after I sent the patch out. My goal was to validate and lock the timer through a single interface, but itimer_delete() and posix_timer_fn() currently recieve a k_itimer * instead of a timer_t. I should have reworked them.

@@ -835,7 +844,9 @@ static inline void itimer_delete(struct
unsigned long flags;

retry_delete:
- spin_lock_irqsave(&timer->it_lock, flags);
+ /* timer already deleted? */
+ if (lock_timer(timer->it_id, &flags) == NULL)
+ return;

Same here.

Thanks for the reply,
Jeremy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/