Re: [PATCH v2 0/4] kasan, arm64, scs, stacktrace: collect stack traces from Shadow Call Stack
From: Andrey Konovalov
Date: Wed Apr 13 2022 - 15:28:16 EST
On Thu, Apr 7, 2022 at 8:42 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> I'm afraid from local testing (atop v5.18-rc1), with your config, I still can't
> get anywhere near your figures. I've tried to match toolchain versions with
> what was in your .config file, so I'm using clang 12.0.0 from the llvm.org
> binary releases, and binutils from the kernel.org crosstool 11.1.0 release.
>
> I took baselines with defconfig and defconfig + SHADOW_CALL_STACK, with console
> output completely suppressed with 'quiet loglevel=0':
>
> | [mark@gravadlaks:~/repro]% taskset -c 64 ./vmboot.sh --perf-trials 50 -s -k ~/kernel-images/andrey-unwind-v5.18-rc1-defconfig/Image
> |
> | Performance counter stats for
> | '/home/mark/.opt/apps/qemu/bin/qemu-system-aarch64 -m 2048 -smp 1 -nographic
> | -no-reboot -machine virt,accel=kvm,gic-version=host -cpu host -kernel
> | /home/mark/kernel-images/andrey-unwind-v5.18-rc1-defconfig/Image -append
> | loglevel=9 earlycon panic=-1 quiet loglevel=0' (50 runs):
> |
> | 0.512626031 seconds time elapsed ( +- 0.26% )
> |
> | [mark@gravadlaks:~/repro]% taskset -c 64 ./vmboot.sh --perf-trials 50 -s -k ~/kernel-images/andrey-unwind-v5.18-rc1-defconfig+scs/Image
> |
> | Performance counter stats for
> | '/home/mark/.opt/apps/qemu/bin/qemu-system-aarch64 -m 2048 -smp 1 -nographic
> | -no-reboot -machine virt,accel=kvm,gic-version=host -cpu host -kernel
> | /home/mark/kernel-images/andrey-unwind-v5.18-rc1-defconfig+scs/Image -append
> | loglevel=9 earlycon panic=-1 quiet loglevel=0' (50 runs):
> |
> | 0.523245952 seconds time elapsed ( +- 0.18% )
>
> Then I tried the same with your config, without your patches:
>
> | [mark@gravadlaks:~/repro]% taskset -c 64 ./vmboot.sh --perf-trials 50 -s -k ~/kernel-images/andrey-unwind-v5.18-rc1-andrey.config/Image
> |
> | Performance counter stats for
> | '/home/mark/.opt/apps/qemu/bin/qemu-system-aarch64 -m 2048 -smp 1 -nographic
> | -no-reboot -machine virt,accel=kvm,gic-version=host -cpu host -kernel
> | /home/mark/kernel-images/andrey-unwind-v5.18-rc1-andrey.config/Image -append
> | loglevel=9 earlycon panic=-1 quiet loglevel=0' (50 runs):
> |
> | 1.994692366 seconds time elapsed ( +- 0.05% )
>
> Then with your config, without your patches, with the stacktrace hacked out:
>
> | [mark@gravadlaks:~/repro]% taskset -c 64 ./vmboot.sh --perf-trials 50 -s -k ~/kernel-images/andrey-unwind-v5.18-rc1-andrey.config-nostacktrace/Image
> |
> | Performance counter stats for
> | '/home/mark/.opt/apps/qemu/bin/qemu-system-aarch64 -m 2048 -smp 1 -nographic
> | -no-reboot -machine virt,accel=kvm,gic-version=host -cpu host -kernel
> | /home/mark/kernel-images/andrey-unwind-v5.18-rc1-andrey.config-nostacktrace/Image
> | -append loglevel=9 earlycon panic=-1 quiet loglevel=0' (50 runs):
> |
> | 1.861823869 seconds time elapsed ( +- 0.05% )
>
> If I use those number to estimate the proportion of time spent stacktracing,
> with the baseline SCS number discounted to remove the hypervisor+VMM overheads,
> I get:
>
> (1.994692366 - 0.523245952) - (1.861823869 - 0.523245952)
> --------------------------------------------------------- = 0.09029788358
> (1.994692366 - 0.523245952)
>
> So roughly 9% when I try to maximize that figure. When actually poking hardware
> and doing real work, that figure goes down. For example, if just using "quiet":
>
> | [mark@gravadlaks:~/repro]% taskset -c 64 ./vmboot.sh --perf-trials 50 -q -k ~/kernel-images/andrey-unwind-v5.18-rc1-andrey.config/Image > /dev/null
> |
> | Performance counter stats for
> | '/home/mark/.opt/apps/qemu/bin/qemu-system-aarch64 -m 2048 -smp 1 -nographic
> | -no-reboot -machine virt,accel=kvm,gic-version=host -cpu host -kernel
> | /home/mark/kernel-images/andrey-unwind-v5.18-rc1-andrey.config/Image -append
> | loglevel=9 earlycon panic=-1 quiet' (50 runs):
> |
> | 4.653286475 seconds time elapsed ( +- 0.06% )
>
> | [mark@gravadlaks:~/repro]% taskset -c 64 ./vmboot.sh --perf-trials 50 -q -k ~/kernel-images/andrey-unwind-v5.18-rc1-andrey.config-nostacktrace/Image > /dev/null
> |
> | Performance counter stats for
> | '/home/mark/.opt/apps/qemu/bin/qemu-system-aarch64 -m 2048 -smp 1 -nographic
> | -no-reboot -machine virt,accel=kvm,gic-version=host -cpu host -kernel
> | /home/mark/kernel-images/andrey-unwind-v5.18-rc1-andrey.config-nostacktrace/Image
> | -append loglevel=9 earlycon panic=-1 quiet' (50 runs):
> |
> | 4.585750154 seconds time elapsed ( +- 0.05% )
>
> Which gives an estimate of:
>
> (4.653286475 - 0.523245952) - (4.585750154 - 0.523245952)
> --------------------------------------------------------- = 0.01635245964
> (4.653286475 - 0.523245952)
>
> ... or ~1.6% time spent backtracing:
>
> FWIW, applying your patches do show some benefit, but not as drastic as I was
> expecting:
>
> With console output suprressed:
>
> | [mark@gravadlaks:~/repro]% taskset -c 64 ./vmboot.sh --perf-trials 50 -s -k ~/kernel-images/andrey-unwind-v5.18-rc1+scs-unwind-andrey.config/Image
> |
> | Performance counter stats for
> | '/home/mark/.opt/apps/qemu/bin/qemu-system-aarch64 -m 2048 -smp 1 -nographic
> | -no-reboot -machine virt,accel=kvm,gic-version=host -cpu host -kernel
> | /home/mark/kernel-images/andrey-unwind-v5.18-rc1+scs-unwind-andrey.config/Image
> | -append loglevel=9 earlycon panic=-1 quiet loglevel=0' (50 runs):
> |
> | 1.920300410 seconds time elapsed ( +- 0.05% )
>
> ... down from ~9% to ~4%
>
> With console output merely reduced:
>
> | [mark@gravadlaks:~/repro]% taskset -c 64 ./vmboot.sh --perf-trials 50 -q -k ~/kernel-images/andrey-unwind-v5.18-rc1+scs-unwind-andrey.config/Image > /dev/null
> |
> | Performance counter stats for
> | '/home/mark/.opt/apps/qemu/bin/qemu-system-aarch64 -m 2048 -smp 1 -nographic
> | -no-reboot -machine virt,accel=kvm,gic-version=host -cpu host -kernel
> | /home/mark/kernel-images/andrey-unwind-v5.18-rc1+scs-unwind-andrey.config/Image
> | -append loglevel=9 earlycon panic=-1 quiet' (50 runs):
> |
> | 4.611277833 seconds time elapsed ( +- 0.04% )
>
> ... down from 1.6% to 0.6%
>
> Given the above I still think we need to understand this a bit better before we
> consider pursuing the SCS unwinder, given the issues I laid out in my prior mails.
>
> My hope is that we can improve the regular unwinder or other code such that
> this becomes moot. I'm aware of a few things we could try, but given it's very
> easy to sink a lot of time and effort into this, I'd like to first get some
> more details, as above.
Hi Mark,
I'm about to publish v3, where I'll include a detailed description of
how I measured the performance.
Perhaps we see different performance numbers because you're using
KVM-enabled VM on an Arm host and I'm using QEMU on x86-64 host.
Although, it's suspicious that the difference is so drastic. I'll try
to get my hands on some Arm hardware in the next few days and do the
measurements there.
This new version also will not be making any changes to the entry
code, as these changes add unwanted additional slowdown. That would be
great, if you could check the performance impact of v3 with your
setup.
Thanks!