Re: [External] Re: [PATCH] fs/proc/kcore.c: add mmap interface

From: zhoufeng
Date: Thu May 27 2021 - 02:37:50 EST




On 2021/5/27 8:39 AM, Andrew Morton wrote:
On Wed, 26 May 2021 15:51:42 +0800 Feng zhou <zhoufeng.zf@xxxxxxxxxxxxx> wrote:

From: ZHOUFENG <zhoufeng.zf@xxxxxxxxxxxxx>

While using DRGN (https://github.com/osandov/drgn) to access kernel data
structures for kernel monitoring, we found that it issues a very large
number of system calls. DRGN is implemented by reading /proc/kcore, and
kcore does not implement mmap, so every access goes through read, which
causes frequent context switches. We therefore want to add an mmap
interface to improve performance. Because the vmalloc and module areas
change as memory is allocated and released, their consistency cannot be
guaranteed, so the mmap interface only maps KCORE_TEXT and KCORE_RAM.

The test results:
1. the default version of kcore
real 11.00
user 8.53
sys 3.59

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.64  128.578319          12  11168701           pread64
...
------ ----------- ----------- --------- --------- ----------------
100.00  129.042853              11193748       966 total

2. kcore with the mmap interface added
real 6.44
user 7.32
sys 0.24

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 32.94    0.130120          24      5317       315 futex
 11.66    0.046077          21      2231         1 lstat
  9.23    0.036449         177       206           mmap
...
------ ----------- ----------- --------- --------- ----------------
100.00    0.395077                 25435       971 total

The test results show that both the number of system calls (11193748
to 25435) and the time spent (real 11.00 to 6.44, sys 3.59 to 0.24) are
significantly reduced.


hm, OK, I guess why not. The performance improvements for DRGN (which
appears to be useful) are nice and the code is simple.

I'm surprised that it makes this much difference. Has DRGN been fully
optimised to minimise the amount of pread()ing which it does? Why does
it do so much reading?

DRGN is a tool similar to Crash, but much lighter. It lets users read kernel data structures from Python scripts. On that basis we intend to use DRGN for kernel monitoring, so we ran some stress-test scripts to measure the overhead of monitoring.
Monitoring is all about getting current, real-time data, so every time DRGN fetches kernel data it has to read /proc/kcore. In my script I loop 1000 times over the information of all processes on the machine, to construct a scenario in which kernel data is read frequently. That is why the number of pread calls is so high with the default version of kcore.
Given this, our optimization goal is to reduce the number of context switches as much as possible when kernel data is fetched frequently, keep the performance loss to a minimum, and then move the monitoring system into production. In a long-running production environment the number of kernel data reads keeps growing over time, and the number of pread calls grows with it. With mmap, the mapping is set up once and for all.
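
As a rough illustration of the point above, the sketch below fetches the same records twice: once with one pread per record, and once through a single mapping. This is not DRGN's actual code. It uses an ordinary temporary file as a stand-in for /proc/kcore, because mapping kcore itself would need this patch applied. RECORD_SIZE, RECORD_COUNT and the function names are made up for the example.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 64      # hypothetical size of one object, in bytes
RECORD_COUNT = 1000   # hypothetical number of objects to visit


def read_with_pread(fd):
    """Fetch every record with its own pread(): one system call per record."""
    return [os.pread(fd, RECORD_SIZE, i * RECORD_SIZE)
            for i in range(RECORD_COUNT)]


def read_with_mmap(fd):
    """Map the file once, then fetch records by slicing the mapping."""
    length = RECORD_SIZE * RECORD_COUNT
    with mmap.mmap(fd, length, access=mmap.ACCESS_READ) as view:
        # Slicing copies bytes out of the mapping without a read system call.
        return [view[i * RECORD_SIZE:(i + 1) * RECORD_SIZE]
                for i in range(RECORD_COUNT)]


def demo():
    """Return True if both access paths see identical data."""
    # An ordinary temporary file stands in for /proc/kcore here.
    with tempfile.TemporaryFile() as f:
        f.write(os.urandom(RECORD_SIZE * RECORD_COUNT))
        f.flush()
        fd = f.fileno()
        return read_with_pread(fd) == read_with_mmap(fd)


if __name__ == "__main__":
    print("both paths return identical data:", demo())
```

Running the script under strace -c should show roughly RECORD_COUNT pread64 calls for the first path and a single mmap for the second, which mirrors the shape of the two measurements quoted earlier.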

Attached is the test script:
#!/usr/bin/env drgn
# Copyright (c) Facebook, Inc. and its affiliates.
# SPDX-License-Identifier: GPL-3.0-or-later

"""A simplified implementation of ps(1) using drgn"""

from drgn.helpers.linux.pid import for_each_task

count = 0
while (count < 1000):
    count = count + 1
    #print("PID COMM")
    for task in for_each_task(prog):
        pid = task.pid.value_()
        comm = task.comm.string_().decode()
        #print(f"{pid:<10} {comm}")


Thanks, I shall await input from others before moving ahead with this.