[PATCH] perf: Improve startup time by reducing unnecessary stat() calls

From: Krzysztof Łopatowski
Date: Thu Feb 06 2025 - 06:36:07 EST


When testing perf trace on NixOS, I noticed a significant startup delay.
Approximate run times:
- `ls`: ~2ms
- `strace ls`: ~10ms
- `perf trace ls`: ~550ms

Profiling `perf trace ls` showed that 51% of the time is spent reading
files, 26% loading BPF programs, and 11% in `newfstatat`.

This patch speeds up module path exploration by using the `d_type`
field returned by `readdir()` to detect directories, so `stat()` is
called only when necessary. For filesystems that do not report a type
(`d_type == DT_UNKNOWN`), it falls back to the old behavior and calls
`stat()`. See `readdir(3)` for details.
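
For illustration only (this is not part of the patch), the `d_type` check
with a `stat()` fallback can be sketched in isolation. The helper names
`entry_is_dir()` and `count_subdirs()` are hypothetical and do not exist
in perf, and plain `snprintf()` stands in for perf's `path__join()`:

```c
#define _DEFAULT_SOURCE /* exposes d_type and DT_* on glibc */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not in perf): return 1 if the entry is a
 * directory, 0 if it is not, -1 if the stat() fallback fails.
 * Note: a symlink to a directory is reported as DT_LNK and is not
 * followed here, whereas stat() would follow it.
 */
static int entry_is_dir(const char *dir_name, const struct dirent *dent)
{
	char path[PATH_MAX];
	struct stat st;

	/* Fast path: the filesystem filled in d_type, no syscall needed. */
	if (dent->d_type != DT_UNKNOWN)
		return dent->d_type == DT_DIR;

	/* Slow path: the type is unknown, so ask stat() as before. */
	snprintf(path, sizeof(path), "%s/%s", dir_name, dent->d_name);
	if (stat(path, &st))
		return -1;
	return S_ISDIR(st.st_mode) ? 1 : 0;
}

/*
 * Hypothetical helper: count the subdirectories of dir_name, skipping
 * "." and "..". Returns -1 if the directory cannot be opened.
 */
static int count_subdirs(const char *dir_name)
{
	DIR *dir = opendir(dir_name);
	const struct dirent *dent;
	int n = 0;

	if (!dir)
		return -1;

	while ((dent = readdir(dir)) != NULL) {
		if (!strcmp(dent->d_name, ".") || !strcmp(dent->d_name, ".."))
			continue;
		if (entry_is_dir(dir_name, dent) == 1)
			n++;
	}

	closedir(dir);
	return n;
}
```

On a filesystem that fills in `d_type`, a call such as
`count_subdirs("/lib/modules")` should need no `stat()` calls at all; only
entries reported as `DT_UNKNOWN` take the slow path.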

This reduces `perf trace ls` time from ~550ms to ~500ms.

A more thorough startup optimization based on command parameters would
be ideal, but that is a larger effort.

Signed-off-by: Krzysztof Łopatowski <krzysztof.m.lopatowski@xxxxxxxxx>
---
tools/perf/util/machine.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 2d51badfbf2e..76b857fd752b 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1354,7 +1354,7 @@ static int maps__set_module_path(struct maps *maps, const char *path, struct kmo

 static int maps__set_modules_path_dir(struct maps *maps, const char *dir_name, int depth)
 {
-	struct dirent *dent;
+	const struct dirent *dent;
 	DIR *dir = opendir(dir_name);
 	int ret = 0;

@@ -1365,14 +1365,20 @@ static int maps__set_modules_path_dir(struct maps *maps, const char *dir_name, i

 	while ((dent = readdir(dir)) != NULL) {
 		char path[PATH_MAX];
-		struct stat st;
+		unsigned char d_type = dent->d_type;

-		/*sshfs might return bad dent->d_type, so we have to stat*/
 		path__join(path, sizeof(path), dir_name, dent->d_name);
-		if (stat(path, &st))
-			continue;

-		if (S_ISDIR(st.st_mode)) {
+		if (d_type == DT_UNKNOWN) {
+			struct stat st;
+
+			if (stat(path, &st))
+				continue;
+			if (S_ISDIR(st.st_mode))
+				d_type = DT_DIR;
+		}
+
+		if (d_type == DT_DIR) {
 			if (!strcmp(dent->d_name, ".") ||
 			    !strcmp(dent->d_name, ".."))
 				continue;
--
2.47.2