Re: [PATCH] mm: fix lazy vmap purging (use-after-free error)
From: Paul E. McKenney
Date: Sat Feb 21 2009 - 12:47:20 EST
On Sat, Feb 21, 2009 at 10:30:58AM +0100, Vegard Nossum wrote:
> 2009/2/21 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
> > This is 4.1 milliseconds, so is quite plausible. Is the code -really-
> > disabling preemption for 4.1 milliseconds?
> >
> >> [ 449.677407] __free_vmap_area(c7806ac0)
> >> [ 449.680113] rcu_free_va(c7806a80)
> >
> > 5.4 milliseconds...
> >
> >> [ 449.682821] __free_vmap_area(c7806b00)
> >> [ 449.684264] rcu_free_va(c7806ac0)
> >
> > 6.9 milliseconds...
> >
> >> [ 449.686525] __free_vmap_area(c7806b40)
> >> [ 449.688205] rcu_free_va(c7806b00)
> >
> > 5.4 milliseconds...
> >
> >> ...and goes on for a long time, until something triggers this:
> >>
> >> [ 449.902253] rcu_free_va(c7839d00)
> >> [ 449.903247] WARNING: kmemcheck: Caught 32-bit read from freed
> >> memory (c7839d20)
> >>
> >> ...and finally:
> >>
> >> [ 457.580253] __purge_vmap_area_lazy() end
> >> [ 457.581201] rcu_free_va(c78974c0)
> >
> > And I don't see the corresponding __free_vmap_area() for either of the
> > above rcu_free_va() calls. Would you be willing to forward the
> > timestamp for the __free_vmap_area() for c7839d20?
>
> I'm sorry. The numbers in parentheses are the struct vmap_area
> pointers, not the actual arguments passed to the function. So
> when we have this:
>
> [ 449.696775] __free_vmap_area(c7806c00)
> [ 449.697274] rcu_free_va(c7806bc0)
> [ 449.699543] __free_vmap_area(c7806c40)
> [ 449.701104] rcu_free_va(c7806c00)
> [ 449.703353] __free_vmap_area(c7806c80)
> [ 449.704247] rcu_free_va(c7806c40)
>
> rcu_free_va(c7806c00) corresponds to __free_vmap_area(c7806c00).
OK, so 449.701104 - 449.696775 = 0.004329, or about four milliseconds,
correct?
> >> So this is also what I meant by "immediately": The RCU callbacks are
> >> getting called inside the loop, and they're almost always paired with
> >> the list removal, or lagging one object behind.
> >>
> >> My guess is that this code posts "too many callbacks", which would
> >> "force the grace period" according to __call_rcu() in
> >> kernel/rcutree.c. What do you think about this?
> >
> > If the code really suppresses preemption across the whole loop, then
> > any attempt to force the grace period should fail. Is it possible that
> > preemption is momentarily enabled somewhere within the loop? Or that
> > we are seeing multiple passes through the loop rather than one big long
> > pass through the loop?
>
> Multiple passes? No. We have a print-out at the beginning and at the
> end, and there's nothing else happening in between. It doesn't leave
> the function __purge_vmap_area_lazy. I don't see preemption being
> enabled anywhere in __free_vmap_area (or in the functions it calls).
>
> I single-stepped __free_vmap_area, and it will get interrupted. I got
> (among other things) this chain of calls:
>
> do_IRQ -> handle_irq -> handle_level_irq -> handle_IRQ_event ->
> timer_interrupt -> ... -> run_local_timers -> raise_softirq (nr=1)
>
> And at one point it also calls into RCU machinery:
>
> update_process_times (user_tick=0) at kernel/timer.c:1033
> 1033 if (rcu_pending(cpu))
> (gdb)
> rcu_pending (cpu=0) at kernel/rcutree.c:1288
> 1288 return __rcu_pending(&rcu_state, &per_cpu(rcu_data, cpu)) ||
> ...
> update_process_times (user_tick=0) at kernel/timer.c:1034
> 1034 rcu_check_callbacks(cpu, user_tick);
> (gdb)
> rcu_check_callbacks (cpu=0, user=0) at kernel/rcutree.c:949
> 949 {
> ...
> rcu_check_callbacks (cpu=0, user=-1049147360) at kernel/rcutree.c:967
> 967 rcu_qsctr_inc(cpu);
???? Are the argument values trustworthy? If so, I don't see how
the "user" variable transitioned from zero to non-zero.
The value user!=0 tells RCU that we were interrupted from a user process,
but this immediately follows a call where user==0. If we really were
interrupted from kernel code (including from an irq handler), we should
have user==0.
The user!=0 causes RCU to conclude that we are in a quiescent state.
RCU is then within its rights to process callbacks, which would result
in the behavior you saw.
> (gdb)
> rcu_qsctr_inc () at include/linux/rcutree.h:253
> 253 struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
> (gdb)
> 255 rdp->passed_quiesc_completed = rdp->completed;
> (gdb)
> 254 rdp->passed_quiesc = 1;
> (gdb)
> 255 rdp->passed_quiesc_completed = rdp->completed;
> (gdb)
> rcu_check_callbacks (cpu=0, user=<value optimized out>) at kernel/rcutree.c:979
> 979 rcu_bh_qsctr_inc(cpu);
> (gdb)
> rcu_bh_qsctr_inc () at include/linux/rcutree.h:259
> 259 struct rcu_data *rdp = &per_cpu(rcu_bh_data, cpu);
> (gdb)
> 261 rdp->passed_quiesc_completed = rdp->completed;
> (gdb)
> 260 rdp->passed_quiesc = 1;
> (gdb)
> 261 rdp->passed_quiesc_completed = rdp->completed;
> (gdb)
> rcu_check_callbacks (cpu=0, user=<value optimized out>) at kernel/rcutree.c:981
> 981 raise_softirq(RCU_SOFTIRQ);
> (gdb)
> raise_softirq (nr=8) at kernel/softirq.c:313
> 313 {
>
> We also get into __rcu_check_callbacks after a while, with this stacktrace:
>
> #0 do_IRQ (regs=0xc170ddc0) at arch/x86/kernel/irq.c:201
> #1 <signal handler called>
> #2 0xc1073b7a in __rcu_process_callbacks (rsp=0xc1677060, rdp=0xc1941320)
> at kernel/rcutree.c:1127
> #3 0xc1073dbf in rcu_process_callbacks (unused=<value optimized out>)
> at kernel/rcutree.c:1162
> #4 0xc103741f in __do_softirq () at kernel/softirq.c:198
> #5 0xc10374fd in do_softirq () at kernel/softirq.c:244
> #6 0xc1037655 in irq_exit () at kernel/softirq.c:281
> #7 0xc100529f in do_IRQ (regs=0xc170de98) at arch/x86/kernel/irq.c:223
> #8 <signal handler called>
> #9 __free_vmap_area (va=0xc7803a40) at mm/vmalloc.c:411
> #10 0xc1096dd9 in __purge_vmap_area_lazy (start=0xc170df10, end=0xc170df0c,
> sync=<value optimized out>, force_flush=0) at mm/vmalloc.c:542
> #11 0xc1096f1e in try_purge_vmap_area_lazy () at mm/vmalloc.c:556
> #12 free_unmap_vmap_area_noflush (va=<value optimized out>) at mm/vmalloc.c:578
> #13 0xc1096f4a in free_unmap_vmap_area () at mm/vmalloc.c:587
> #14 remove_vm_area (addr=<value optimized out>) at mm/vmalloc.c:1168
> #15 0xc1097005 in __vunmap (addr=0xc7803a40, deallocate_pages=0)
> at mm/vmalloc.c:1194
> #16 0xc10970be in vunmap (addr=0xc7803aa0) at mm/vmalloc.c:1253
> #17 0xc1008ba5 in text_poke (addr=0xc127ba4f, opcode=0xc170df8f, len=1)
> at arch/x86/kernel/alternative.c:523
> #18 0xc1008ca9 in alternatives_smp_unlock (start=<value optimized out>,
> end=0xc1713360, text=0xc1000000 "ï\206\021\002", text_end=0xc150250f "")
> at arch/x86/kernel/alternative.c:252
> #19 0xc171ef47 in alternative_instructions ()
> at arch/x86/kernel/alternative.c:438
> #20 0xc171f991 in check_bugs () at arch/x86/kernel/cpu/bugs.c:168
> #21 0xc17148c5 in start_kernel () at init/main.c:687
> #22 0xc171409e in i386_start_kernel () at arch/x86/kernel/head32.c:43
> #23 0x00000000 in ?? ()
>
> (On a side note, it strikes me that __do_softirq() does
> local_irq_enable(), which means that __rcu_process_callbacks() can be
> interrupted and the interrupt handler can call rcu_process_callbacks()
> again...)
>
> Does this ring any bells or make any sense at all? What else can I do
> to help understand what's going on?
I will look into what might have caused rcu_check_callbacks()'s
"user" argument to suddenly become non-zero.
> This, by the way, happens regardless of whether kmemcheck is used or
> not. The only thing I do to trigger this particular behaviour is to
> run an SMP kernel on a UP machine, since it will call
> alternative_instructions() like we see in the stack trace above.
OK, good to know, thank you!
Is this behavior recent, or does it apply to earlier 2.6.29-rc kernels
as well? Does it happen with CONFIG_CLASSIC_RCU as well? (From the
trace above, I suspect that it might well do so, but if not, that will
be valuable information as well.)
Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/