Re: [RFC PATCH 0/4] clockevents: fix clockevent_devices list corruptionafter cpu hotplug

From: Thomas Gleixner
Date: Thu Dec 10 2009 - 09:35:51 EST


On Thu, 10 Dec 2009, Xiaotian Feng wrote:
> I've met a list_del corruption, which was reported in
> http://lkml.org/lkml/2009/11/27/45. But no response, so I try to debug it
> by myself.
>
> After I added some printks to show all elements in clockevent_devices, I
> found kernel hangs when I tried to resume from s2ram.
>
> In clockevents_register_device, clockevents_do_notify ADD is always followed
> by clockevents_notify_released. Although clockevents_do_notify ADD will use
> tick_check_new_device to add new devices and replace old devices to the
> clockevents_released list, clockevents_notify_released add them back to
> clockevent_devices list.
>
> My system is Quad-Core x86_64, with apic and hpet enables, after boot up,
> the elements in clockevent_devices list is :
> clockevent_device->lapic(3)->hpet5(3)->lapic(2)->hpet4(2)->lapic(1)->hpet3(1)-
> ->lapic(0)->hpet2(0)->hpet(0)
> * () means cpu id
>
> But active clock_event_device is hpet2,hpet3,hpet4,hpet5. Then at s2ram stage,
> cpu 1,2,3 is down, then notify CLOCK_EVT_NOTIFY_CPU_DEAD will calls tick_shutdown,
> then hpet2,hpet3,hpet4,hpet5 was deleted from clockevent_device list.
> So after s2ram, elements in clockevent_device list is:
> clockevent_device->lapic(3)->lapic(2)->lapic(1)->lapic(0)->hpet2(0)->hpet(0)
>
> Then at resume stage, cpu 1,2,3 is up, it will register lapic again, and then
> perform list_add lapic on clockevent_device list, e.g. list_add lapic(1) on
> above list, lapic will move to the clockevent_device->next, but lapic(2)->next
> is still point to lapic(1), the list is circular and corrupted then.

Great detective work !

> This patchset aims to fixes above behaviour by:
> - on clockevents_register_device, if notify ADD success, move new devices
> to the clockevent_devices list, otherwise move to clockevents_released
> list.
> - on clockevents_notify_released, same behaviour as above.
> - on clockevents_notify CPU_DEAD, remove related devices on dead cpu from
> clockevents_released list.
>
> It makes sure that only active devices on each cpu is on clockevent_devices list.
> With this patchset, the list_del corruption disappeared, and suspend/resume, cpu
> hotplug works fine on my system.

I'm not happy about that churn. Why don't we simply scan the
clockevent_devices list for leftovers of the dead CPU ?

Untested patch below solves the same problem.

Thanks,

tglx
----
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 20a8920..5dd857f 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -238,8 +238,9 @@ void clockevents_exchange_device(struct clock_event_device *old,
*/
void clockevents_notify(unsigned long reason, void *arg)
{
- struct list_head *node, *tmp;
+ struct clock_event_device *dev, *tmp;
unsigned long flags;
+ int cpu;

spin_lock_irqsave(&clockevents_lock, flags);
clockevents_do_notify(reason, arg);
@@ -250,8 +251,19 @@ void clockevents_notify(unsigned long reason, void *arg)
* Unregister the clock event devices which were
* released from the users in the notify chain.
*/
- list_for_each_safe(node, tmp, &clockevents_released)
- list_del(node);
+ list_for_each_entry_safe(dev, tmp, &clockevents_released, list)
+ list_del(&dev->list);
+ /*
+ * Now check whether the CPU has left unused per cpu devices
+ */
+ cpu = *((int *)arg);
+ list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
+ if (cpumask_test_cpu(cpu, dev->cpumask) &&
+ cpumask_weight(dev->cpumask) == 1) {
+ BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
+ list_del(&dev->list);
+ }
+ }
break;
default:
break;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/