Re: Tracing: rb_head_page_deactivate() caught in an infinite loop
From: Steven Rostedt
Date: Wed Jul 01 2020 - 22:03:20 EST
On Wed, 01 Jul 2020 10:07:06 -0700
rananta@xxxxxxxxxxxxxx wrote:
> Hi Steven and Mingo,
>
Hi Raghavendra,
> While trying to adjust the buffer size (echo <size> >
> /sys/kernel/debug/tracing/buffer_size_kb), we see that the kernel gets
> caught up in an infinite loop
> while traversing the "cpu_buffer->pages" list in
> rb_head_page_deactivate().
>
> Looks like the last node of the list could be uninitialized, thus
> leading to infinite traversal. From the data that we captured:
> 000|rb_head_page_deactivate(inline)
> | cpu_buffer = 0xFFFFFF8000671600 =
> kernel_size_le_lo32+0xFFFFFF652F6EE600 -> (
> ...
> | pages = 0xFFFFFF80A909D980 =
> kernel_size_le_lo32+0xFFFFFF65D811A980 -> (
> | next = 0xFFFFFF80A909D200 =
> kernel_size_le_lo32+0xFFFFFF65D811A200 -> (
> | next = 0xFFFFFF80A909D580 =
> kernel_size_le_lo32+0xFFFFFF65D811A580 -> (
> | next = 0xFFFFFF8138D1CD00 =
> kernel_size_le_lo32+0xFFFFFF6667D99D00 -> (
> | next = 0xFFFFFF80006716F0 =
> kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
> | next = 0xFFFFFF80006716F0 =
> kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
> | next = 0xFFFFFF80006716F0 =
> kernel_size_le_lo32+0xFFFFFF652F6EE6F0 -> (
> | next = 0xFFFFFF80006716F0 =
> kernel_size_le_lo32+0xFFFFFF652F6EE6F0,
>
> Wanted to check with you if there's any scenario that could lead us into
> this state.
>
> Test details:
> -- Arch: arm64
> -- Kernel version 5.4.30; running on Andriod
> -- Test case: Running the following set of commands across reboot will
> lead us to the scenario
>
> atrace --async_start -z -c -b 120000 sched audio irq idle freq
> < Run any workload here >
> atrace --async_dump -z -c -b 1200000 sched audio irq idle freq >
> mytrace.trace
> atrace --async_stop > /dev/null
> echo 150000 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 200000 > /sys/kernel/debug/tracing/buffer_size_kb
> reboot
>
> Repeating the above lines across reboots would reproduce the issue.
> The "atrace" or "echo" would just get stuck while resizing the buffer
> size.
What do you mean repeat across reboots? If it doesn't happen it wont
ever happen, but if you reboot it may have it happen again?
> I'll try to reproduce the issue without atrace as well, but wondering
> what could be the reason for leading us to this state.
I haven't used arm lately, and I'm unfamiliar with atrace. So I don't
really know what is going on. If you can reproduce this with just a
shell script accessing the ftrace files, that would be much more useful.
Thanks,
-- Steve