Re: RCU can use cpu_active_map?

From: Paul E. McKenney
Date: Mon Feb 09 2009 - 16:39:43 EST


On Mon, Feb 09, 2009 at 01:13:45PM -0700, Alex Chiang wrote:
> Paul,
>
> I don't pretend to understand RCU, but a very quick and naive
> look through rcupreempt.c makes me think that we could use the
> cpu_active_map instead of cpu_online_map?
>
> cpu_active_map was introduced by e761b772.
>
> In the CPU hotplug path, we touch the cpu_active_map very early
> on:
>
> int __ref cpu_down(unsigned int cpu)
> {
> int err;
> err = stop_machine_create();
> if (err)
> return err;
> cpu_maps_update_begin();
>
> if (cpu_hotplug_disabled) {
> err = -EBUSY;
> goto out;
> }
>
> cpu_clear(cpu, cpu_active_map);
> /* ... */
> synchronize_sched();
> err = _cpu_down(cpu, 0);
> if (cpu_online(cpu))
> cpu_set(cpu, cpu_active_map);
>
> out:
> cpu_maps_update_done();
> stop_machine_destroy();
> return err;
> }
>
> The call to _cpu_down() is where we eventually get to the code
> that my patch below touches, so you can see that we mark the CPU
> as !active before we ever get to the step of migrating interrupts
> (which relies on cpu_online_map).
>
> If RCU looked at cpu_active_map instead of cpu_online_map, it
> seems like we would avoid the potential race situation you
> mentioned earlier.
>
> Alternatively, I could explore just playing with the ia64
> interrupt migration code to use cpu_active_mask instead, but
> wanted to get your thoughts from the RCU perspective.

Perhaps I am confused, but if the CPU is on its way down, doesn't RCU
need a mask where the CPU's bit stays set longer rather than shorter?

If I use cpu_active_mask, couldn't there be device interrupts during
(for example) the synchronize_sched(), which might have RCU read-side
critical sections that RCU needs to pay attention to?

Thanx, Paul

> Thanks.
>
> /ac
>
>
> * Alex Chiang <achiang@xxxxxx>:
> > This reverts commit e7b140365b86aaf94374214c6f4e6decbee2eb0a.
> >
> > Commit e7b14036 removes the targetted disabled CPU from the
> > cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.
> >
> > Paul McKenney states that the reasoning behind the patch was to
> > prevent irq handlers from running on CPUs marked offline because:
> >
> > RCU happily ignores CPUs that don't have their bits set in
> > cpu_online_map, so if there are RCU read-side critical sections
> > in the irq handlers being run, RCU will ignore them. If the
> > other CPUs were running, they might sequence through the RCU
> > state machine, which could result in data structures being
> > yanked out from under those irq handlers, which in turn could
> > result in oopses or worse.
> >
> > Unfortunately, both ia64 functions above look at cpu_online_map to find
> > a new CPU to migrate interrupts onto. This means we can potentially
> > migrate an interrupt off ourself back to... ourself. Uh oh.
> >
> > This causes an oops when we finally try to process pending interrupts on
> > the CPU we want to disable. The oops results from calling __do_IRQ with
> > a NULL pt_regs:
> >
> > Unable to handle kernel NULL pointer dereference (address 0000000000000040)
> > Call Trace:
> > [<a000000100016930>] show_stack+0x50/0xa0
> > sp=e0000009c922fa00 bsp=e0000009c92214d0
> > [<a0000001000171a0>] show_regs+0x820/0x860
> > sp=e0000009c922fbd0 bsp=e0000009c9221478
> > [<a00000010003c700>] die+0x1a0/0x2e0
> > sp=e0000009c922fbd0 bsp=e0000009c9221438
> > [<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
> > sp=e0000009c922fbd0 bsp=e0000009c92213d8
> > [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
> > sp=e0000009c922fc60 bsp=e0000009c92213d8
> > [<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
> > sp=e0000009c922fe30 bsp=e0000009c9221398
> > [<a00000010003bb90>] timer_interrupt+0x170/0x3e0
> > sp=e0000009c922fe30 bsp=e0000009c9221330
> > [<a00000010013a800>] handle_IRQ_event+0x80/0x120
> > sp=e0000009c922fe30 bsp=e0000009c92212f8
> > [<a00000010013aa00>] __do_IRQ+0x160/0x4a0
> > sp=e0000009c922fe30 bsp=e0000009c9221290
> > [<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
> > sp=e0000009c922fe30 bsp=e0000009c9221208
> > [<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
> > sp=e0000009c922fe30 bsp=e0000009c92211a8
> > [<a00000010005bd80>] __cpu_disable+0x140/0x240
> > sp=e0000009c922fe30 bsp=e0000009c9221168
> > [<a0000001006c5870>] take_cpu_down+0x50/0xa0
> > sp=e0000009c922fe30 bsp=e0000009c9221148
> > [<a000000100122610>] stop_cpu+0xd0/0x200
> > sp=e0000009c922fe30 bsp=e0000009c92210f0
> > [<a0000001000e0440>] kthread+0xc0/0x140
> > sp=e0000009c922fe30 bsp=e0000009c92210c8
> > [<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
> > sp=e0000009c922fe30 bsp=e0000009c92210a0
> > [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
> > sp=e0000009c922fe30 bsp=e0000009c92210a0
> >
> > I don't like this revert because it is fragile. ia64 is getting lucky
> > because we seem to only ever process timer interrupts in this path, but
> > if we ever race with an IPI here, we definitely use RCU and have the
> > potential of hitting an oops that Paul describes above.
> >
> > Patching ia64's timer_interrupt() to check for NULL pt_regs is
> > insufficient though, as we still hit the above oops.
> >
> > As a short term solution, I do think that this revert is the right
> > answer. The revert hold up under repeated testing (24+ hour test runs)
> > with this setup:
> >
> > - 8-way rx6600
> > - randomly toggling CPU online/offline state every 2 seconds
> > - running CPU exercisers, memory hog, disk exercisers, and
> > network stressors
> > - average system load around ~160
> >
> > In the long term, we really need to figure out why we set pt_regs = NULL
> > in ia64_process_pending_intr(). If it turns out that it is unnecessary
> > to do so, then we could safely re-introduce e7b14036 (along with some
> > other logic to be smarter about migrating interrupts).
> >
> > One final note: x86 also removes the disabled CPU from cpu_online_map
> > and then re-enables interrupts for 1ms, presumably to handle any pending
> > interrupts:
> >
> > arch/x86/kernel/irq_32.c (and irq_64.c):
> > cpu_disable_common:
> > [remove cpu from cpu_online_map]
> >
> > fixup_irqs():
> > for_each_irq:
> > [break CPU affinities]
> >
> > local_irq_enable();
> > mdelay(1);
> > local_irq_disable();
> >
> > So they are doing implicitly what ia64 is doing explicitly.
> >
> > Cc: stable@xxxxxxxxxx
> > Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Alex Chiang <achiang@xxxxxx>
> > ---
> > arch/ia64/kernel/smpboot.c | 4 +++-
> > 1 files changed, 3 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c
> > index 1146399..2ec5bbf 100644
> > --- a/arch/ia64/kernel/smpboot.c
> > +++ b/arch/ia64/kernel/smpboot.c
> > @@ -736,14 +736,16 @@ int __cpu_disable(void)
> > return -EBUSY;
> > }
> >
> > + cpu_clear(cpu, cpu_online_map);
> > +
> > if (migrate_platform_irqs(cpu)) {
> > cpu_set(cpu, cpu_online_map);
> > return (-EBUSY);
> > }
> >
> > remove_siblinginfo(cpu);
> > - fixup_irqs();
> > cpu_clear(cpu, cpu_online_map);
> > + fixup_irqs();
> > local_flush_tlb_all();
> > cpu_clear(cpu, cpu_callin_map);
> > return 0;
> > --
> > 1.6.0.1.161.g7f314
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/