Re: [PATCH v2 0/6] Improve visibility of writeback

From: Tejun Heo
Date: Wed Apr 03 2024 - 15:21:50 EST


Hello,

On Wed, Apr 03, 2024 at 03:06:56PM -0400, Kent Overstreet wrote:
..
> That's how it should be if you just make a point of making your internal
> state easy to view and introspect, but when I'm debugging issues that
> run into the wider block layer, or memory reclaim, we often hit a wall.

Try drgn:

https://drgn.readthedocs.io/en/latest/

I've been adding drgn scripts under the tools/ directory for introspection.
They're easy to write, deploy and ask users to run.
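
As a rough illustration of what such a script can look like (a minimal
sketch, not one of the scripts under tools/), the following walks the
kernel's global bdi_list and prints one line per backing device. It assumes
a kernel with debug info available and that struct backing_dev_info has the
dev_name and ra_pages members, which depends on the kernel version. The
names list_bdis and format_row are made up for this example.

```python
#!/usr/bin/env drgn
# Sketch: list backing devices from a live kernel.
# Run as root with: drgn list_bdi.py


def format_row(name, ra_pages):
    """Format one backing device as a single output line."""
    return f"{name} ra_pages={ra_pages}"


def list_bdis(prog):
    """Return one formatted line per struct backing_dev_info on bdi_list."""
    # Imported lazily so the file can be loaded without drgn installed.
    from drgn.helpers.linux.list import list_for_each_entry

    rows = []
    for bdi in list_for_each_entry(
        "struct backing_dev_info",
        prog["bdi_list"].address_of_(),
        "bdi_list",
    ):
        # Assumption: dev_name and ra_pages exist on this kernel version.
        name = bdi.dev_name.string_().decode()
        rows.append(format_row(name, int(bdi.ra_pages)))
    return rows


# The drgn CLI injects a global named "prog" when it runs a script.
if "prog" in globals():
    for row in list_bdis(globals()["prog"]):
        print(row)
```

A user only needs drgn and kernel debug info to run this; no rebuild or
tracing setup is involved.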

> Writeback throttling was buggy for _months_, no visibility or
> introspection or concerns for debugging, and that's a small chunk of
> code. io_uring - had to disable it. I _still_ have people bringing
> issues to me that are clearly memory reclaim related but I don't have
> the tools.
>
> It's not like any of this code exports much in the way of useful
> tracepoints either, but tracepoints often just aren't what you want;
> what you want just to be able to see internal state (_without_ having to
> use a debugger, because that's completely impractical outside highly
> controlled environments) - and tracing is also never the first thing you
> want to reach for when you have a user asking you "hey, this thing went
> wonky, what's it doing?" - tracing automatically turns it into a multi
> step process of decide what you want to look at, run the workload more
> to collect data, iterate.
>
> Think more about "what would make code easier to debug" and less about
> "how do I shove this round peg through the square tracing/BPF slot".
> There's _way_ more we could be doing that would just make our lives
> easier.

Maybe it'd help to classify visibility into the following categories:

1. Current state introspection.
2. Dynamic behavior tracing.
3. Accumulative behavior profiling.

drgn is great for #1. Tracing and BPF stuff is great for #2 especially when
things get complicated. #3 is the trickiest. Static stuff is useful in a lot
of cases but BPF can also be useful in other cases.

I agree that it's all about using the right tool for the problem.

--
tejun