Re: [PATCH] mm: fix lazy vmap purging (use-after-free error)
From: Vegard Nossum
Date: Sat Feb 21 2009 - 04:31:20 EST
2009/2/21 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
> This is 4.1 milliseconds, so is quite plausible. Is the code -really-
> disabling preemption for 4.1 milliseconds?
>
>> [ 449.677407] __free_vmap_area(c7806ac0)
>> [ 449.680113] rcu_free_va(c7806a80)
>
> 5.4 milliseconds...
>
>> [ 449.682821] __free_vmap_area(c7806b00)
>> [ 449.684264] rcu_free_va(c7806ac0)
>
> 6.9 milliseconds...
>
>> [ 449.686525] __free_vmap_area(c7806b40)
>> [ 449.688205] rcu_free_va(c7806b00)
>
> 5.4 milliseconds...
>
>> ...and goes on for a long time, until something triggers this:
>>
>> [ 449.902253] rcu_free_va(c7839d00)
>> [ 449.903247] WARNING: kmemcheck: Caught 32-bit read from freed
>> memory (c7839d20)
>>
>> ...and finally:
>>
>> [ 457.580253] __purge_vmap_area_lazy() end
>> [ 457.581201] rcu_free_va(c78974c0)
>
> And I don't see the corresponding __free_vmap_area() for either of the
> above rcu_free_va() calls. Would you be willing to forward the
> timestamp for the __free_vmap_area() for c7839d20?
I'm sorry. The numbers in parentheses are the struct vmap_area
pointers, not the actual parameters being passed to the function. So
when we have this:
[ 449.696775] __free_vmap_area(c7806c00)
[ 449.697274] rcu_free_va(c7806bc0)
[ 449.699543] __free_vmap_area(c7806c40)
[ 449.701104] rcu_free_va(c7806c00)
[ 449.703353] __free_vmap_area(c7806c80)
[ 449.704247] rcu_free_va(c7806c40)
rcu_free_va(c7806c00) corresponds to __free_vmap_area(c7806c00).
>
>> So this is also what I meant by "immediately": The RCU callbacks are
>> getting called inside the loop, and they're almost always paired with
>> the list removal, or lagging one object behind.
>>
>> My guess is that this code posts "too many callbacks", which would
>> "force the grace period" according to __call_rcu() in
>> kernel/rcutree.c. What do you think about this?
>
> If the code really suppresses preemption across the whole loop, then
> any attempt to force the grace period should fail. Is it possible that
> preemption is momentarily enabled somewhere within the loop? Or that
> we are seeing multiple passes through the loop rather than one big long
> pass through the loop?
Multiple passes? No. We have a print-out at the beginning and at the
end, and there's nothing else happening in-between. It doesn't leave
the function __purge_vmap_area_lazy. I don't see preemption being
enabled anywhere in __free_vmap_area (or its calls).
I single-stepped __free_vmap_area, and it does get interrupted. I got
(among other things) this chain of calls:
do_IRQ -> handle_irq -> handle_level_irq -> handle_IRQ_event ->
timer_interrupt -> ... -> run_local_timers -> raise_softirq (nr=1)
And at one point it also calls into RCU machinery:
update_process_times (user_tick=0) at kernel/timer.c:1033
1033 if (rcu_pending(cpu))
(gdb)
rcu_pending (cpu=0) at kernel/rcutree.c:1288
1288 return __rcu_pending(&rcu_state, &per_cpu(rcu_data, cpu)) ||
...
update_process_times (user_tick=0) at kernel/timer.c:1034
1034 rcu_check_callbacks(cpu, user_tick);
(gdb)
rcu_check_callbacks (cpu=0, user=0) at kernel/rcutree.c:949
949 {
...
rcu_check_callbacks (cpu=0, user=-1049147360) at kernel/rcutree.c:967
967 rcu_qsctr_inc(cpu);
(gdb)
rcu_qsctr_inc () at include/linux/rcutree.h:253
253 struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
(gdb)
255 rdp->passed_quiesc_completed = rdp->completed;
(gdb)
254 rdp->passed_quiesc = 1;
(gdb)
255 rdp->passed_quiesc_completed = rdp->completed;
(gdb)
rcu_check_callbacks (cpu=0, user=<value optimized out>) at kernel/rcutree.c:979
979 rcu_bh_qsctr_inc(cpu);
(gdb)
rcu_bh_qsctr_inc () at include/linux/rcutree.h:259
259 struct rcu_data *rdp = &per_cpu(rcu_bh_data, cpu);
(gdb)
261 rdp->passed_quiesc_completed = rdp->completed;
(gdb)
260 rdp->passed_quiesc = 1;
(gdb)
261 rdp->passed_quiesc_completed = rdp->completed;
(gdb)
rcu_check_callbacks (cpu=0, user=<value optimized out>) at kernel/rcutree.c:981
981 raise_softirq(RCU_SOFTIRQ);
(gdb)
raise_softirq (nr=8) at kernel/softirq.c:313
313 {
We also get into __rcu_process_callbacks after a while, with this stack trace:
#0 do_IRQ (regs=0xc170ddc0) at arch/x86/kernel/irq.c:201
#1 <signal handler called>
#2 0xc1073b7a in __rcu_process_callbacks (rsp=0xc1677060, rdp=0xc1941320)
at kernel/rcutree.c:1127
#3 0xc1073dbf in rcu_process_callbacks (unused=<value optimized out>)
at kernel/rcutree.c:1162
#4 0xc103741f in __do_softirq () at kernel/softirq.c:198
#5 0xc10374fd in do_softirq () at kernel/softirq.c:244
#6 0xc1037655 in irq_exit () at kernel/softirq.c:281
#7 0xc100529f in do_IRQ (regs=0xc170de98) at arch/x86/kernel/irq.c:223
#8 <signal handler called>
#9 __free_vmap_area (va=0xc7803a40) at mm/vmalloc.c:411
#10 0xc1096dd9 in __purge_vmap_area_lazy (start=0xc170df10, end=0xc170df0c,
sync=<value optimized out>, force_flush=0) at mm/vmalloc.c:542
#11 0xc1096f1e in try_purge_vmap_area_lazy () at mm/vmalloc.c:556
#12 free_unmap_vmap_area_noflush (va=<value optimized out>) at mm/vmalloc.c:578
#13 0xc1096f4a in free_unmap_vmap_area () at mm/vmalloc.c:587
#14 remove_vm_area (addr=<value optimized out>) at mm/vmalloc.c:1168
#15 0xc1097005 in __vunmap (addr=0xc7803a40, deallocate_pages=0)
at mm/vmalloc.c:1194
#16 0xc10970be in vunmap (addr=0xc7803aa0) at mm/vmalloc.c:1253
#17 0xc1008ba5 in text_poke (addr=0xc127ba4f, opcode=0xc170df8f, len=1)
at arch/x86/kernel/alternative.c:523
#18 0xc1008ca9 in alternatives_smp_unlock (start=<value optimized out>,
end=0xc1713360, text=0xc1000000 "ï\206\021\002", text_end=0xc150250f "")
at arch/x86/kernel/alternative.c:252
#19 0xc171ef47 in alternative_instructions ()
at arch/x86/kernel/alternative.c:438
#20 0xc171f991 in check_bugs () at arch/x86/kernel/cpu/bugs.c:168
#21 0xc17148c5 in start_kernel () at init/main.c:687
#22 0xc171409e in i386_start_kernel () at arch/x86/kernel/head32.c:43
#23 0x00000000 in ?? ()
(On a side note, it strikes me that __do_softirq() does
local_irq_enable(), which means that __rcu_process_callbacks() can be
interrupted and the interrupt handler can call rcu_process_callbacks()
again...)
Does this ring any bells or make any sense at all? What else can I do
to help understand what's going on?
This, by the way, happens regardless of whether kmemcheck is used or
not. The only thing I do to trigger this particular behaviour is to
run an SMP kernel on a UP machine, since it will call
alternative_instructions() like we see in the stack trace above.
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036