Re: [External] Re: [PATCH] x86: fix get_wchan() not support the ORC unwinder

From: 郑琦
Date: Wed Jun 16 2021 - 03:34:37 EST


On Tue, Jun 15, 2021 at 12:21 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> On 6/11/21 5:46 AM, Qi Zheng wrote:
> > Currently, the kernel CONFIG_UNWINDER_ORC option is enabled by
> > default on x86, but the implementation of get_wchan() is still
> > based on the frame pointer unwinder, so the /proc/<pid>/wchan
> > always return 0 regardless of whether the task <pid> is running.
> >
> > We reimplement the get_wchan() by calling stack_trace_save_tsk(),
> > which is adapted to the ORC and frame pointer unwinders.
>
> How much slower does this make ps?

I used bpftrace to measure the running time of get_wchan() (in nanoseconds)
with both the ORC and frame pointer unwinders, before and after applying
the patch. The test script and the results are as follows:

the test script:
bpftrace -e 'kprobe:get_wchan { @start[tid] = nsecs; } kretprobe:get_wchan
/@start[tid]/ { @ns[comm] = hist(nsecs - @start[tid]); delete(@start[tid]); }'

the result:
1) ORC unwinder ( before applying this patch )

@ns[ps]:
[512, 1K)           4609 |@@@@@@@@@@@@                                        |
[1K, 2K)           18599 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)            1848 |@@@@@                                               |
[4K, 8K)             307 |                                                    |
[8K, 16K)             74 |                                                    |
[16K, 32K)            12 |                                                    |

73% of the cases are in the [1K, 2K) range.
Note: in this case, get_wchan() always returns 0, which is the wrong value.

2) ORC unwinder ( after applying this patch )

@ns[ps]:
[512, 1K)            536 |@                                                   |
[1K, 2K)           19945 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)            5604 |@@@@@@@@@@@@@@                                      |
[4K, 8K)             246 |                                                    |
[8K, 16K)            154 |                                                    |
[16K, 32K)            18 |                                                    |

75% of the cases are in the [1K, 2K) range.

3) frame pointer unwinder ( before applying this patch )

@ns[ps]:
[512, 1K)            245 |                                                    |
[1K, 2K)           16577 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)            2788 |@@@@@@@@                                            |
[4K, 8K)             190 |                                                    |
[8K, 16K)             74 |                                                    |
[16K, 32K)             9 |                                                    |

83% of the cases are in the [1K, 2K) range.

4) frame pointer unwinder ( after applying this patch )

@ns[ps]:
[512, 1K)             85 |                                                    |
[1K, 2K)           12023 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)            7418 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    |
[4K, 8K)             232 |@                                                   |
[8K, 16K)            104 |                                                    |
[16K, 32K)            18 |                                                    |

60% of the cases are in the [1K, 2K) range.
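
The percentages quoted above can be re-derived from the histogram counts.
The following is only a small Python sketch for checking that arithmetic;
the counts are copied from the four histograms, while the names HISTOGRAMS
and bucket_share() are made up for illustration and are not part of the
patch or of bpftrace.

```python
# Counts copied from the four bpftrace histograms above (bucket bounds in ns):
# [512, 1K), [1K, 2K), [2K, 4K), [4K, 8K), [8K, 16K), [16K, 32K)
HISTOGRAMS = {
    "ORC, before": [4609, 18599, 1848, 307, 74, 12],
    "ORC, after": [536, 19945, 5604, 246, 154, 18],
    "frame pointer, before": [245, 16577, 2788, 190, 74, 9],
    "frame pointer, after": [85, 12023, 7418, 232, 104, 18],
}


def bucket_share(counts, index):
    """Return the percentage of all samples that fall into bucket `index`."""
    return 100.0 * counts[index] / sum(counts)


for name, counts in HISTOGRAMS.items():
    # Index 1 is the [1K, 2K) bucket, index 2 is the [2K, 4K) bucket.
    print(f"{name}: {bucket_share(counts, 1):.0f}% in [1K, 2K), "
          f"{bucket_share(counts, 2):.0f}% in [2K, 4K)")
```

Printing the [2K, 4K) share next to the [1K, 2K) share makes it easier to
see where the samples move after the patch is applied.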

In summary, get_wchan() gets slower after applying this patch: with both
unwinders, more calls land in the [2K, 4K) ns range. However, get_wchan() is
not a hot path, and the current behavior is a bug under the default ORC
unwinder, so I think the increased runtime is acceptable.
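
For anyone who wants to see the bug from user space, here is a rough Python
sketch that counts how many tasks report "0" in /proc/<pid>/wchan. It is not
part of the patch; read_wchan() and count_wchan() are hypothetical helper
names, and the proc_root parameter exists only so the code can be pointed at
a test directory. Tasks that are not blocked also report 0, so some zeros
are expected even on a fixed kernel; the symptom of the bug is that every
task reports 0.

```python
from pathlib import Path


def read_wchan(pid, proc_root="/proc"):
    """Return the stripped contents of <proc_root>/<pid>/wchan, or None if unreadable."""
    try:
        return Path(proc_root, str(pid), "wchan").read_text().strip()
    except OSError:
        # The task may have exited, or the file may not be readable.
        return None


def count_wchan(proc_root="/proc"):
    """Return (zero, named): tasks whose wchan is "0" versus a symbol name."""
    zero = named = 0
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        value = read_wchan(entry.name, proc_root)
        if value is None:
            continue
        if value == "0":
            zero += 1
        else:
            named += 1
    return zero, named
```

Calling count_wchan() on a kernel built with CONFIG_UNWINDER_ORC and without
the fix should give a "named" count of 0, if the behavior described in this
thread holds.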

In addition, this issue has existed for nearly 4 years and no one has fixed
it. If nobody cares about the return value of get_wchan(), maybe we could
simply always return 0, or remove the function altogether. What do you think?

Best regards,
Qi Zheng

>
> --Andy