Re: [re: PATCH v2 00/15 - 00/11] dyndbg: add support for writing debug logs to trace

From: Łukasz Bartosik
Date: Mon Dec 18 2023 - 09:13:43 EST


pon., 18 gru 2023 o 02:01 Łukasz Bartosik <lb@xxxxxxxxxxxx> napisał(a):
>
> sob., 16 gru 2023 o 04:09 <jim.cromie@xxxxxxxxx> napisał(a):
> >
> > On Thu, Dec 14, 2023 at 8:20 AM Łukasz Bartosik <lb@xxxxxxxxxxxx> wrote:
> > >
> > > sob., 9 gru 2023 o 01:31 Łukasz Bartosik <lb@xxxxxxxxxxxx> napisał(a):
> > > >
> > > > pt., 8 gru 2023 o 01:15 Jim Cromie <jim.cromie@xxxxxxxxx> napisał(a):
> > > > >
> > > > > hi Lukas,
> > > > >
> > > > > Sorry for the delay, I probably should have split this up.
> > > > > (also cc'g gregkh)
> > > > >
> > > > > Ive been banging on your v2 patchset:
> > > > > https://lore.kernel.org/lkml/20231130234048.157509-1-lb@xxxxxxxxxxxx/
> > > > >
> > > >
> > > > Jim, thank you for your thorough testing and review.
> > > >
> > > > > Things are looking pretty good, a few issues follow. And some patches.
> > > > >
> > > >
> > > > ;)
> > > >
> > > > > trivial:
> > > > >
> > > > > dyndbg: export _print_hex_dump # squash wo comment
> > > > > dyndbg: tweak pr_info format s/trace dest/trace_dest/ # greppability squash
> > > > > dyndbg: disambiguate quoting in a debug msg
> > > > > dyndbg: fix old BUG_ON in >control parser
> > > > >
> > > > > Then:
> > > > >
> > > > > dyndbg: change +T:name_terminator to dot
> > > > > dyndbg: treat comma as a token separator
> > > > >
> > > > > 1st allows 2nd, 2nd allows simpler arg-passing, boot-args, etc:
> > > > >
> > > > > echo module,test_dynamic_debug,class,L2,+p > /proc/dynamic_debug/control
> > > > > modprobe test_dynamic_debug dyndbg=class,L2,+mfl
> > > > >
> > > > > Given theres no legacy wrt comma, swapping it now with dot seems
> > > > > better semantically than "dot as token/list separator".
> > > > >
> > > > > Aside: /proc/dynamic_debug/control is always there (if configd), even
> > > > > when <debugfs> isn't mounted. Its more universal, and copy-pastable.
> > > > >
> > > > > dyndbg: __skip_spaces - and comma
> > > > > dyndbg: split multi-query strings with %
> > > > >
> > > > > % allows escape-free multi-cmd dyndbg args:
> > > > >
> > > > > modprobe test_dynamic_debug \
> > > > > dyndbg=open,selftest%class,D2_CORE,+T:selftest.mf
> > > > >
> > > > > dyndbg: reduce verbose/debug clutter
> > > > > dyndbg: move lock,unlock into ddebug_change, drop goto - revisit
> > > > >
> > > > > Ive just pushed it, I will bump my version here.
> > > > > To github.com:jimc/linux.git
> > > > > + 20d113eb6f9a...66fa2e4cb989 lukas-v2.1 -> lukas-v2.1 (forced update)
> > > > >
> > >
> > > hi Jim,
> > >
> > > I will squash the following commits to their appropriate peers:
> > > - dyndbg: export _print_hex_dump
> > > - dyndbg: tweak pr_info format s/trace dest/trace_dest/
> > > - dyndbg: change +T:name_terminator to dot
> > >
> > > I will also drop completely "dyndbg: move lock,unlock into
> > > ddebug_change, drop goto" and leave the remaining commits as is.
> >
> > cool. and yes, late ack.
> > I got distracted by move of flags-handler to the bottom of the fn.
> > Pls add note about moving the lock to protect open/close too.
> >
> > >
> > > > > SELFTEST
> > > > >
> > > > > Ive taken the liberty to write an ad-hoc test script (inline below),
> > > > > to exersize the parser with legacy command input, and with the new
> > > > > stuff mentioned above: comma-separated-tokens, %-separated-multi-cmds,
> > > > > while counting changes to builtin,etc modules, to validate against
> > > > > expectations.
> > > > >
> > > > > The change-count tests achieve some closed-loop testing, but checking
> > > > > syslog for messages written always seemed too hard/unreliable. Your
> > > > > private trace-instances work promises a solution, by giving an
> > > > > observable closed system to build upon.
> > > > >
> > > > > I made some attempts to write messages to the trace-buf, but ran out
> > > > > of clues*tuits. And I encountered a couple more definite problems:
> > > > >
> > > >
> > > > Let me dig through test scripts you created and issues you run into.
> > > >
> > > >
> > > > > 1- modprobe/rmmod lifecycle probs
> > > > >
> > > > > Ive coded some blue-sky and not-so-proper "modprobe,+T:it,-T:it,rmmod"
> > > > > life-cycle scenarios, which can wedge a previously created instance.
> > > > > Once wedged, I cannot recover. See the test_private_trace{,_2,_3}
> > > > > functions, and the error_log_ref() following each.
> > > > >
> > >
> > > The tests are very useful. I root caused the failures. Please see below.
> > >
> > > > > This brittleness begs a question; should we have auto-open (mapping
> > > > > new names to available 1-63 trc-ids) ? And non-close ? If it just did
> > > > > the right thing, particularly on rmmod, it would prevent "misuse".
> > > > >
> > >
> > > I would rather not have auto-open. For me personal argument against it is
> > > that I usually make a lot of typos and with auto-open if I don't
> > > verify what I wrote
> > > I won't know whether logs will be written to the "right" instance or a
> > > newly created instance
> > > with typo name. In case of open/close commands error pops up when I
> > > try to write logs
> > > to a trace instance which was not opened.
> >
> > yes. that is a value. auto-open was 1/2 baked speculation
> >
> > >
> > > > > I don't think auto-open obsoletes the open (& esp) close commands, but
> > > > > Id like to see scripted test scenarios using close in combo with the
> > > > > right /sys/kernel/tracing/* manipulations, to prove its not all just a
> > > > > fever dream.
> > > > >
> > >
> > > Would you please elaborate what you mean by close in combo with right
> > > /sys/kernel/tracing/* manipulations ?
> > >
> >
> > you detected the furious flapping, didnt you :-)
> >
> > lemme start with the global-events trace-buf, for analogy/comparison.
> > to use it one must do both:
> > echo 1 > /sys/kernel/tracing/events/dyndbg/enable
> > echo $some_selection +T > /proc/dynamic_debug/control
> >
>
> Currently if default destination is not set then +T will fail because
> it does not know
> what destination to use. Would you like in this case to use
> prdbg&dyndbg events as
> a default destination instead of failing ?
>
> > using +T:private_buffer no longer requires the enable (im guessing)
>
> It does not require to be enabled
>
> > but are there any other/corresponding setup actions / manipulations ?
> >
>
> We need to call trace_array_get_by_name, it will either create new
> trace instance
> or reuse existing one if it already exists. In both cases it also
> increases reference count of the requested trace instance.
> This is done as a part of open command.
>
> > if nothing else is needed, when does the instance open ?
>
> The open command calls trace_array_get_by_name
>
> > after test-dynamic-debug-doprints on +T:private callsites ?
> > is it detectable from a script ?
> > probably: if [ -e /sys/kernel/tracing/instances/foo ]
> >
> > if the instance is already open, I presume it is not cleared ?
> >
>
> Contents of instance will be preserved
>
> > any other play-nice-with-other-users stuff ?
> >
>
> I will take this into account when creating new tests.
>
> > conversely, to write a closed-loop test-script,
> > should the script just delete the instance/$name
> > after verifying its contents from the last doprints ?
> >
>
> Yes, it makes sense for the test script to delete trace instance at
> the end of a test scenario.
> It will simulate the user action of deleting the instance.
>
> >
> >
> > > > > Your expertise in opening, writing to, manipulating & destroying
> > > > > private (and the global) tracebufs, distilled into new shell funcs,
> > > > > would be enormously helpful towards demonstrating the private-buffer
> > > > > use case and its value.
> > > > >
> > >
> > > I will work on adding new tests for trace.
> >
> > nice!
> >
> > >
> > > > > 2- some new flags parse error:
> > > > >
> > > > > [ 1273.074535] dyndbg: 32 debug prints in module test_dynamic_debug
> > > > > [ 1273.075365] dyndbg: module: test_dynamic_debug dyndbg="class,D2_CORE,+T:foo%class,D2_KMS,+T:foo"
> > > > > [ 1273.076307] dyndbg: query 0: <class,D2_CORE,+T:foo> on module: <test_dynamic_debug>
> > > > > [ 1273.077068] dyndbg: split into words: "class" "D2_CORE" "+T:foo"
> > > > > [ 1273.077673] dyndbg: unknown flag 'c'
> > > > > [ 1273.078033] dyndbg: flags parse failed on 2: +T:foo
> > > > > [ 1273.078519] dyndbg: query 1: <class,D2_KMS,+T:foo> on module: <test_dynamic_debug>
> > > > > [ 1273.079266] dyndbg: split into words: "class" "D2_KMS" "+T:foo"
> > > > > [ 1273.079872] dyndbg: unknown flag '�'
> > > > > [ 1273.080228] dyndbg: flags parse failed on 2: +T:foo
> > > > > [ 1273.080716] dyndbg: processed 2 queries, with 0 matches, 2 errs
> > > > > : 0 matches on =Tl
> > > > >
> > > > > I have a suspicion this relates to moving the parse_flags call in
> > > > > ddebug_query, but I havent dug in.
> > > > >
> > > > >
> > > > > I also have some reservations about the default dest; 1st that it is a
> > > > > global state var, as noted at bottom of control:
> > > > >
> > > > > [root@v6 lx]# ddgrep \\btrace\\b # a better/narrower search-term ?
> > > > > ...
> > > > > Default trace destination: foo # add a '#:' prefix to these lines ?
> > >
> > > Do you mean
> >
> > This. it hides the info behind a # so any parser expecting just
> > callsites is safe.
> > and the following colon marks it as info for the clever ones.
> >
> > > #: Default trace destination:
> > > #: Opened trace instances: foo bar buz
> >
>
> Ack
>
> > But I would lalso ike the labels optimized for a tight grep, that wont
> > hit anything else,
> > and mostly thats easy to use. "trace_dest" almost does it, is a
> > pretty good mark on the bikeshed.
> >
> > > > > Opened trace instances: # all values should be on this line
> > > > >
> > >
> > > I will put all opened trace instance names on the same line
> > >
> > > > > Then theres the "preset" value, ie each site's dest_id (sorry I forgot
> > > > > your fieldname). I presume the default would override such a "preset"
> > > > > (otherwise it would have no effect).
> > > > >
> > > > > Is the default set on last open ? or on next use of +T:<foo> ?
> > > > >
> > >
> > > Default trace destination is set on [+-]T:<foo>
> >
> > unless foo hasnt been opened, Im guessing
>
> yes foo has to be opened otherwise the command will fail
>
> > does it matter if the last destination used resulted in a match ?
> >
>
> yes it does, if there was no match then such destination will not be
> used for setting default destination
>
> > >
> > > > > In the no-default world, a user/tester would decide how many
> > > > > trace-instances are needed, and map sets of callsites to them.
> > > > >
> > > > > # configure drm-debug streams for this test scenario
> > > > > cat<<EOF >/proc/dynamic_debug/control
> > > > > open drm_core
> > > > > open drm_mix
> > > > > open driver_1 # we can separate by modname but not drvr-number
> > > > > open driver_2
> > > > > class DRM_UT_CORE -T:drm_core # set dest_id, disabled state
> > >
> > > This sets default trace destination to drm_core
> > >
> > > > > class DRM_UT_CORE +mf # traces dont do prefixing (should it?)
> > >
> > > In this case logs will be neither written to syslog nor trace because
> > > both pT flags are not set.
> >
> > I think my example fails to illuminate the corner I seek.
> >
>
> What corner case did you try to emphasize here ?
>
> > > I verified that both output to trace instance and trace events honors
> > > mf flags (prefixing) when they are set for a given callsite.
> >
> > Ok. that is good to know. that sounds reducible to a test-fn.
> >
> > >
> > > > > # mix KMS & ATOMIC into 1 stream
> > > > > class DRM_UT_KMS -T:drm_mix
> > > > > class DRM_UT_ATOMIC -T:drm_mix
> > > > > EOF
> > > > >
> > > > > Then perhaps to turn on/off when needed: (legacy analogue version)
> > > > >
> > > > > echo 0x15 > /sys/module/drm/parameters/debug_trace
> > > > >
> > > > > With those trace-dest presets honored, the configured drm_core &
> > > > > drm_mix streams persist. If a later enablement applies the
> > > > > then-current default trace-dest, it would spoil this scheme.
> > > > >
> >
> > so each callsite has a dest, which is set individually when the
> > callsite is selected
> > and -T:name'd in the EOF block.
> >
> > I would *not* want enablements of those callsites to alter the dest
> > just because
> > the default-dest were altered in the interim.
> >
> > I think it would break the way drm.debug is supported, by only
> > touching the T bit.
> >
>
> If I understood you correctly -T: <trace_name> should preset only
> destination for a given callsite(s) and
> not a global default destination ?
>
> >
> > > > > Could you elaborate a scenario or 2 where the default behavior does
> > > > > something specific, and that its the right thing ? I dont understand
> > > > > it yet.
> > > > >
> > >
> > > I'm a bit lost here. Help me out please. What do you mean by default behavior ?
> >
> > default behavior is what happens when +T (wo a dest) is done,
> > after a default dest has been set. w/o a default-dest, it would be just :0
> >
> > So having a default dest means implying the :default_dest_name.
> > whatever it is set to,
> > on whatever callsites are selected, unless the ddcmd gives a name, ie
> > +T:this_buf
> >
> > As of last writing, I dont see how the extra implication helps.
> > It feels like hidden state.
> >
>
> Yes it can be misleading.
>
> > I can squint at it and see a local / loop var, fwiw.
> >
>
> Maybe let's get rid of default global destination and when +T will be
> provided without a name
> then it will always default to trace events (:0). Apart from that we
> would have presetting destination with -T:<trace_name> but only for a
> given callsite(s)
>

My statements contradict each other. My proposal is to get rid of
global default destination and when +T will be provided without a name
then preset trace destination will be used. If there is no preset
destination for a given callsite then destination will default to :0
(trace events prdbg&devdbg).
Presetting would be done with -T:<trace_name>

> Thanks,
> Lukasz
>
>
> > > > > OTOH
> > > > >
> > > > > One limitation of the above: the drm.debug_trace (posited above) would
> > > > > turn on all Ts in all those class'd callsites across the whole
> > > > > subsystem, irrespective of their preconfigured destination. That was
> > > > > always inherent in drm.debug, but its less than perfect.
> > > > >
> > > > > It sort-of defeats the point of doing +T only on the useful callsites.
> > > > >
> > > > > And global event buf is also enabled, it might be flood-prone.
> > > > >
> > > > > echo 1 > /sys/kernel/tracing/events/dyndbg/enable
> > > > >
> > > > > It would help if we could filter on trace-instance names:
> > > > > (this sounds familiar :-)
> > > > >
> > > > > ddcmd module '*' trace_dest drm_mix +T
> > > > >
> >
> > I think we need to implement this new keyword selector,
> > I think it addresses the flooding possible above,
> > when using both private traces and the global-trace.
> >
> > but we dont need it yet.
> >
> > > > > In reality, the dest-id is not even dependent on tracing per-se, it is
> > > > > a user classification system (in contrast to class <subsys-names>).
> > > > > It just happens to be tied by +T:name syntax to tracefs.
> > > > >
> > > > > No promise about +p:_alt_log_.mflt having meaning, or working.
> >
> > or not complaining, pr_notice ?
> >
> >
> > > > >
> > > > >
> >
> > I look forward to v3
> > Jim