Re: [Intel-gfx] [PATCH v7 5/8] drm_print: add choice to use dynamic debug in drm-debug

From: Tvrtko Ursulin
Date: Mon Sep 06 2021 - 06:20:39 EST

Next message: Hans Verkuil: "Re: [PATCH 4/8] staging: media: zoran: add debugfs"
Previous message: Paolo Bonzini: "Re: [PATCH 0/3] KVM: x86/mmu: kvm_mmu_page cleanups"
In reply to: jim . cromie: "Re: [Intel-gfx] [PATCH v7 5/8] drm_print: add choice to use dynamic debug in drm-debug"
Next in thread: jim . cromie: "Re: [Intel-gfx] [PATCH v7 5/8] drm_print: add choice to use dynamic debug in drm-debug"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 03/09/2021 22:57, jim.cromie@xxxxxxxxx wrote:

On Fri, Sep 3, 2021 at 5:15 AM Tvrtko Ursulin
<tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:

On 31/08/2021 21:21, Jim Cromie wrote:

drm's debug system writes 10 distinct categories of messages to syslog
using a small API[1]: drm_dbg*(10 names), DRM_DEV_DEBUG*(3 names),
DRM_DEBUG*(8 names). There are thousands of these callsites, each
categorized in this systematized way.

These callsites can be enabled at runtime by their category, each
controlled by a bit in drm.debug (/sys/modules/drm/parameter/debug).
In the current "basic" implementation, drm_debug_enabled() tests these
bits in __drm_debug each time an API[1] call is executed; while cheap
individually, the costs accumulate with uptime.

This patch uses dynamic-debug with jump-label to patch enabled calls
onto their respective NOOP slots, avoiding all runtime bit-checks of
__drm_debug by drm_debug_enabled().

Dynamic debug has no concept of category, but we can emulate one by
replacing enum categories with a set of prefix-strings; "drm:core:",
"drm:kms:" "drm:driver:" etc, and prepend them (at compile time) to
the given formats.

Then we can use:
`echo module drm format "^drm:core: " +p > control`

to enable the whole category with one query.

Probably stupid question - enabling stuff at boot time still works as
described in Documentation/admin-guide/dynamic-debug-howto.rst?

yes. its turned on in earlyinit, and cmdline args are a processed then,
and when modules are added

Second question, which perhaps has been covered in the past so apologies
if redundant - what is the advantage of allowing this to be
configurable, versus perhaps always enabling it? Like what would be the
reasons someone wouldn't just want to have CONFIG_DYNAMIC_DEBUG compiled
in? Kernel binary size?

Im unaware of anything on this topic, but I can opine :-)
Its been configurable since I saw it and thought "jump-labels are cool!"

code is small
[jimc@frodo local-i915m]$ size lib/dynamic_debug.o
text data bss dec hex filename
24016 8041 64 32121 7d79 lib/dynamic_debug.o

Its data tables are big, particularly the __dyndbg section
builtins:
dyndbg: 108 debug prints in module mptcp
dyndbg: 2 debug prints in module i386
dyndbg: 2 debug prints in module xen
dyndbg: 2 debug prints in module fixup
dyndbg: 7 debug prints in module irq
dyndbg: 3039 prdebugs in 283 modules, 11 KiB in ddebug tables, 166 kiB
in __dyndbg section

bash-5.1#
bash-5.1# for m in i915 amdgpu ; do modprobe $m dyndbg=+_ ; done
dyndbg: 384 debug prints in module drm
dyndbg: 211 debug prints in module drm_kms_helper
dyndbg: 2 debug prints in module ttm
dyndbg: 8 debug prints in module video
dyndbg: 1727 debug prints in module i915
dyndbg: processed 1 queries, with 3852 matches, 0 errs
dyndbg: 3852 debug prints in module amdgpu
[drm] amdgpu kernel modesetting enabled.
amdgpu: CRAT table disabled by module option
amdgpu: Virtual CRAT table created for CPU
amdgpu: Topology: Add CPU node
bash-5.1#

At 56 bytes / callsite, it adds up.
And teaching DRM to use it enlarges its use dramatically,
not just in drm itself, but in many drivers

amdgpu has 3852 callsite, (vs 3039 in my kernel), so it has ~240kb.
It has extra (large chunks generated by macros) to trim,
but i915 has ~1700, and drm has ~380

I have WIP to reduce the table space, by splitting it into 2 separate ones;
guts and decorations (module, function, file pointers).
The decoration recs are redundant, 45% are copies of previous
(function changes fastest)
It needs much rework, but it should get 20% overall.
decorations are 24/56 of footprint.

I'll try to extract the "executive summary" from this, you tell me if I got it right.

So using or not using dynamic debug for DRM debug ends up being about shifting the cost between kernel binary size (data section grows by each pr_debug call site) and runtime conditionals?

Since the table sizes you mention seem significant enough, I think that justifies existence of DRM_USE_DYNAMIC_DEBUG. It would probably be a good idea to put some commentary on that there. Ideally including some rough estimates both including space cost per call site and space cost for a typical distro kernel build?

Regards,

Tvrtko

Next message: Hans Verkuil: "Re: [PATCH 4/8] staging: media: zoran: add debugfs"
Previous message: Paolo Bonzini: "Re: [PATCH 0/3] KVM: x86/mmu: kvm_mmu_page cleanups"
In reply to: jim . cromie: "Re: [Intel-gfx] [PATCH v7 5/8] drm_print: add choice to use dynamic debug in drm-debug"
Next in thread: jim . cromie: "Re: [Intel-gfx] [PATCH v7 5/8] drm_print: add choice to use dynamic debug in drm-debug"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]