Re: [LSF/MM/BPF TOPIC] tracing the source of errors

From: Darrick J. Wong
Date: Wed Feb 07 2024 - 12:16:12 EST


On Wed, Feb 07, 2024 at 10:54:34AM +0100, Miklos Szeredi wrote:
> [I'm not planning to attend LSF this year, but I thought this topic
> might be of interest to those who will.]
>
> The errno thing is really ancient and yet quite usable. But when
> trying to find out where a particular EINVAL is coming from, that's
> often mission impossible.
>
> Would it make sense to add infrastructure to allow tracing the source
> of errors? E.g.
>
> strace --errno-trace ls -l foo
> ...
> statx(AT_FDCWD, "foo", ...) = -1 ENOENT [fs/namei.c:1852]
> ...
>
> Don't know about others, but this issue comes up quite often for me.
>
> I would implement this with macros that record the place where a
> particular error has originated, and some way to query the last one
> (which wouldn't be 100% accurate, but good enough I guess).

Hmmm, weren't Kent and Suren working on code tagging for memory
allocation profiling? It would be kinda nice to wrap that up in the
error return paths as well.

Granted then we end up with some nasty macro mess like:

[Pretend that there's a struct errno_tag, plus DEFINE_ERRNO_TAG and
__errno_tag_add symbols, that look mostly like struct alloc_tag and its
helpers from [1] (backslashes elided)]

#define Err(x)
({
	int __errno = (x);
	DEFINE_ERRNO_TAG(_errno_tag);

	trace_return_errno(__this_address, __errno);
	__errno_tag_add(&_errno_tag, __errno);
	__errno;
})

foo = kmalloc(...);
if (!foo)
	return Err(-ENOMEM);

or

if (fs_is_messed_up())
	return Err(-EINVAL);

This would get us the ability to use ftrace to find where errno returns
originate, as well as collect counters for how often each site actually
fires in production. You could even add time_stats too, but
annotating the entire kernel might be a stretch.
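
To make the idea concrete, here is a rough user-space sketch of the macro
above, written out with its backslashes. Everything in it is an assumption
for illustration: struct errno_tag, DEFINE_ERRNO_TAG, __errno_tag_add,
trace_return_errno and errno_tag_report are hypothetical names, not existing
kernel symbols. The "tracepoint" is just an fprintf, the tag records
__FILE__/__LINE__ instead of __this_address, and the counters are plain
(non-atomic, not per-cpu) integers. It relies on GNU statement expressions,
so it needs gcc or clang.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-call-site record, loosely modelled on struct alloc_tag. */
struct errno_tag {
	const char *file;
	int line;
	unsigned long hits;	/* times this site returned an error */
	int last_errno;		/* most recent value returned here */
	struct errno_tag *next;	/* linked once the site first fires */
};

/* Sites that have fired at least once (not thread-safe in this sketch). */
static struct errno_tag *errno_tags;

/* One static tag per expansion, so each call site gets its own counter. */
#define DEFINE_ERRNO_TAG(name) \
	static struct errno_tag name = { .file = __FILE__, .line = __LINE__ }

/* Stand-in for a real tracepoint: just log the originating site. */
static void trace_return_errno(const struct errno_tag *tag, int err)
{
	fprintf(stderr, "errno %d returned at %s:%d\n",
		err, tag->file, tag->line);
}

static void __errno_tag_add(struct errno_tag *tag, int err)
{
	if (tag->hits++ == 0) {
		tag->next = errno_tags;
		errno_tags = tag;
	}
	tag->last_errno = err;
}

/* __err rather than __errno, to stay clear of libc's errno machinery. */
#define Err(x)							\
({								\
	int __err = (x);					\
	DEFINE_ERRNO_TAG(_errno_tag);				\
								\
	trace_return_errno(&_errno_tag, __err);			\
	__errno_tag_add(&_errno_tag, __err);			\
	__err;							\
})

/* Example call site; the flag stands in for fs_is_messed_up(). */
static int demo_check(int messed_up)
{
	if (messed_up)
		return Err(-EINVAL);
	return 0;
}

/* Dump per-site counters and return the total number of tagged returns. */
static unsigned long errno_tag_report(FILE *out)
{
	unsigned long total = 0;

	for (struct errno_tag *t = errno_tags; t; t = t->next) {
		fprintf(out, "%s:%d hits=%lu last=%d\n",
			t->file, t->line, t->hits, t->last_errno);
		total += t->hits;
	}
	return total;
}
```

Calling demo_check(1) twice and then errno_tag_report(stderr) should list a
single site with hits=2. A kernel version would presumably want per-cpu
counters and a linker section for the tags, the way the alloc_tag code in
[1] does it, rather than a list that is built lazily on first hit.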

--D

[1] https://lwn.net/Articles/906660/

> Thanks,
> Miklos
>