Stop machine threads are getting preempted by the rt period enforcement

From: Max Krasnyansky
Date: Wed Jun 04 2008 - 14:07:47 EST


Peter, Ingo,

Take a look at the report below (it came up during the isolcpus= removal discussion).

It looks like stop_machine threads are getting forcefully preempted because
they exceed their RT quanta. That's strange, because the rt period is pretty
long. But given that disabling the rt period logic solves the issue, the
machine was not really stuck.
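
For reference, a rough sketch of the arithmetic behind the RT quanta, not
taken from the kernel source. It assumes the two sysctls
/proc/sys/kernel/sched_rt_period_us and /proc/sys/kernel/sched_rt_runtime_us,
and that RT tasks which use up the runtime within a period are throttled until
the next period starts. The helper names rt_budget and read_sysctl are
hypothetical, and the fallback values are just the commonly documented
defaults, assumed here rather than read from the machine in the report.

```python
def rt_budget(period_us: int, runtime_us: int):
    """Return (share of each period RT tasks may run, throttled microseconds).

    A negative runtime (e.g. -1) is treated as "enforcement disabled".
    """
    if period_us <= 0:
        raise ValueError("period must be positive")
    if runtime_us < 0:
        # echo -1 > sched_rt_runtime_us: no throttling at all
        return 1.0, 0
    # Assumption: the runtime never usefully exceeds the period, so cap it.
    runtime_us = min(runtime_us, period_us)
    return runtime_us / period_us, period_us - runtime_us


def read_sysctl(name: str, base: str = "/proc/sys/kernel") -> int:
    """Read one integer sysctl value from procfs."""
    with open(f"{base}/{name}") as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        period = read_sysctl("sched_rt_period_us")
        runtime = read_sysctl("sched_rt_runtime_us")
    except (OSError, ValueError):
        # Assumed fallback values, used only when procfs is unavailable.
        period, runtime = 1_000_000, 950_000
    share, throttled = rt_budget(period, runtime)
    print(f"period={period}us runtime={runtime}us "
          f"rt_share={share:.2%} throttled={throttled}us per period")
```

With the assumed values above, RT tasks could be held off for up to 50 ms in
every second once they exhaust the runtime, which would be one way for a
stop_machine thread to stall while its peers spin waiting for it.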

Max




Dimitri Sivanich wrote:
> On Tue, Jun 03, 2008 at 09:40:10AM -0500, Dimitri Sivanich wrote:
>> I tried the following scenario on an ia64 Altix running 2.6.26-rc4 with cpusets compiled in but cpuset fs unmounted. Do your patches already address this?
>>
>> $ taskset -cp 3 $$ (attach to cpu 3)
>> pid 4591's current affinity list: 0-3
>> pid 4591's new affinity list: 3
>> $ echo 0 > /sys/devices/system/cpu/cpu2/online (down cpu 2)
>> (above command hangs)
>>
>> Backtrace of pid 4591 (bash)
>>
>> Call Trace:
>> [<a00000010078e990>] schedule+0x1210/0x13c0
>> sp=e0000060b6dffc90 bsp=e0000060b6df11e0
>> [<a00000010078ef60>] schedule_timeout+0x40/0x180
>> sp=e0000060b6dffce0 bsp=e0000060b6df11b0
>> [<a00000010078d3e0>] wait_for_common+0x240/0x3c0
>> sp=e0000060b6dffd10 bsp=e0000060b6df1180
>> [<a00000010078d760>] wait_for_completion+0x40/0x60
>> sp=e0000060b6dffd40 bsp=e0000060b6df1160
>> [<a000000100114ee0>] __stop_machine_run+0x120/0x160
>> sp=e0000060b6dffd40 bsp=e0000060b6df1120
>> [<a000000100765ae0>] _cpu_down+0x2a0/0x600
>> sp=e0000060b6dffd80 bsp=e0000060b6df10c8
>> [<a000000100765ea0>] cpu_down+0x60/0xa0
>> sp=e0000060b6dffe20 bsp=e0000060b6df10a0
>> [<a000000100768090>] store_online+0x50/0xe0
>> sp=e0000060b6dffe20 bsp=e0000060b6df1070
>> [<a0000001004f8800>] sysdev_store+0x60/0xa0
>> sp=e0000060b6dffe20 bsp=e0000060b6df1038
>> [<a00000010022e370>] sysfs_write_file+0x250/0x300
>> sp=e0000060b6dffe20 bsp=e0000060b6df0fe0
>> [<a00000010018a750>] vfs_write+0x1b0/0x300
>> sp=e0000060b6dffe20 bsp=e0000060b6df0f90
>> [<a00000010018b350>] sys_write+0x70/0xe0
>> sp=e0000060b6dffe20 bsp=e0000060b6df0f18
>> [<a00000010000af80>] ia64_ret_from_syscall+0x0/0x20
>> sp=e0000060b6dffe30 bsp=e0000060b6df0f18
>> [<a000000000010720>] ia64_ivt+0xffffffff00010720/0x400
>> sp=e0000060b6e00000 bsp=e0000060b6df0f18
>
> The following workaround alleviates the symptom and hopefully is a hint as to the solution:
> echo -1 > /proc/sys/kernel/sched_rt_runtime_us
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>