Re: [PATCH v2 0/4] Enlarge tracepoints in the display component

From: Kazlauskas, Nicholas
Date: Wed Sep 16 2020 - 16:48:01 EST


On 2020-09-16 5:12 a.m., Daniel Vetter wrote:
On Fri, Sep 11, 2020 at 10:59:23AM -0400, Rodrigo Siqueira wrote:
Debug issues related to display can be a challenge due to the complexity
around this topic and different source of information might help in this
process. We already have support for tracepoints inside the display
component, i.e., we have the basic functionalities available and we just
need to expand it in order to make it more valuable for debugging. For
this reason, this patchset reworks part of the current tracepoint
options and add different sets of tracing inside amdgpu_dm, display
core, and DCN10. The first patch of this series just rework part of the
current tracepoints and the last set of patches introduces new
tracepoints.

This first patchset version is functional. Please, let me know what I
can improve in the current version but also let me know what kind of
tracepoint I can add for the next version.

Finally, I want to highlight that this work is based on a set of patches
originally made by Nicholas Kazlauskas.

Change in V2:
- I added another patch for capturing the clock state for different display
architecture.

Hm I'm not super sure tracepoints for state dumping are the right thing
here. We kinda have the atomic state dumping code with all the various
callbacks, and you can extend that pretty easily. Gives you full state
dump in debugfs, plus a few function to dump into dmesg.

Maybe what we need is a function to dump this also into printk tracepoint
(otoh with Sean Paul's tracepoint work we'd get that through the dmesg
stuff already), and then you could do it there?

Upside is that for customers they'd get a much more consistent way to
debug display issues across different drivers.

For low-level hw debug what we do is give the hw guys an mmio trace, and
they replay it on the fancy boxes :-) So for that I think this here is
again too high level, but maybe what you have is a bit different.
-Daniel

We have raw register traces, but what I find most useful is to be able to see are the incoming DRM IOCTLs, objects and properties per commit.

Many of the bugs we see in display code is in the conversion from DRM -> DM -> DC state. The current HW state is kind of useless in most cases, but the sequence helps track down intermittent problems and understand state transitions.

Tracepoints provide everything I really need to be able to track down these problems without falling back to a full debugger. The existing DRM prints (even at high logging levels) aren't enough to understand what's going on in most cases in our driver so funneling those into tracepoints to improve perf doesn't really help that much.

I think this kind of idea was rejected for DRM core last year with Sean's patch series but if we can't get them into core then I'd like to get them into our driver at least. These are a cleaned up version of Sean's work + my work that I end up applying locally whenever I debug something.

Regards,
Nicholas Kazlauskas



Rodrigo Siqueira (4):
drm/amd/display: Rework registers tracepoint
drm/amd/display: Add tracepoint for amdgpu_dm
drm/amd/display: Add pipe_state tracepoint
drm/amd/display: Add tracepoint for capturing clocks state

.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 17 +
.../amd/display/amdgpu_dm/amdgpu_dm_trace.h | 712 +++++++++++++++++-
.../dc/clk_mgr/dce112/dce112_clk_mgr.c | 5 +
.../display/dc/clk_mgr/dcn10/rv1_clk_mgr.c | 4 +
.../display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c | 4 +
.../amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c | 4 +
.../display/dc/clk_mgr/dcn30/dcn30_clk_mgr.c | 4 +
drivers/gpu/drm/amd/display/dc/core/dc.c | 11 +
.../gpu/drm/amd/display/dc/dce/dce_clk_mgr.c | 5 +
.../amd/display/dc/dcn10/dcn10_hw_sequencer.c | 17 +-
10 files changed, 747 insertions(+), 36 deletions(-)

--
2.28.0