Re: [PATCH v1] dynamic_debug: add support for logs destination
From: Łukasz Bartosik
Date: Wed Oct 11 2023 - 09:59:49 EST
śr., 11 paź 2023 o 11:42 Daniel Vetter <daniel@xxxxxxxx> napisał(a):
>
> On Wed, Oct 11, 2023 at 11:48:16AM +0300, Pekka Paalanen wrote:
> > On Tue, 10 Oct 2023 10:06:02 -0600
> > jim.cromie@xxxxxxxxx wrote:
> >
> > > since I name-dropped you all,
> >
> > Hi everyone,
> >
> > I'm really happy to see this topic being developed! I've practically
> > forgot about it myself, but the need for it has not diminished at all.
> >
It's good to hear you guys are also interested in this feature.
> > I didn't understand much of the conversation, so I'll just reiterate
> > what I would use it for, as a Wayland compositor developer.
> >
> > I added a few more cc's to get better coverage of DRM and Wayland
> > compositor developers.
> >
> > > On Tue, Oct 10, 2023 at 10:01 AM <jim.cromie@xxxxxxxxx> wrote:
> > > >
> > > > On Mon, Oct 9, 2023 at 4:47 PM Łukasz Bartosik <lb@xxxxxxxxxxxx> wrote:
> >
> > ...
> >
> > > > > I don't have a real life use case to configure different trace
> > > > > instance for each callsite.
> > > > > I just tried to be as much flexible as possible.
> > > > >
> > > >
> > > > Ive come around to agree - I looked back at some old threads
> > > > (that I was a part of, and barely remembered :-}
> > > >
> > > > At least Sean Paul, Lyude, Simon Ser, Pekka Paalanen
> > > > have expressed a desire for a "flight-recorder"
> > > > it'd be hard to say now that 2-3 such buffers would always be enough,
> > > > esp as theres a performance reason for having your own.
> >
> > A Wayland compositor has roughly three important things where the kernel
> > debugs might come in handy:
> > - input
> > - DRM KMS
> > - DRM GPU rendering
> >
> > DRM KMS is the one I've been thinking of in the flight recorder context
> > the most, because KMS hardware varies a lot, and there is plenty of
> > room for both KMS drivers and KMS userspace to go wrong. The usual
> > result is your display doesn't work, so the system is practically
> > unusable to the end user. In the wild, the simplest or maybe the only
> > way out of that may be a reboot, maybe an automated one (e.g. digital
> > signage). In order to debug such problems, we would need both
> > compositor logs and the relevant kernel debug messages.
> >
> > For example, Weston already has a flight recorder framework of its own,
> > so we have the compositor debug logs. It would be useful to get the
> > selected kernel debug logs in the same place. It could be used for
> > automated or semi-manual bug reporting, for example, making the
> > administrator or end user life much easier reporting issues.
> >
> > Since this is usually a production environment, and the Wayland
> > compositor runs without root privileges, we need something that works
> > with that. We would likely want the kernel debug messages in the
> > compositor to combine and order them properly with the compositor debug
> > messages.
> >
> > It's quite likely that developers would like to pick and choose which
> > kernel debug messages might be interesting enough to record, to avoid
> > excessive log flooding. The flight recorder in Weston is fixed size to
> > avoid running out of memory or disk space. I can also see that Weston
> > could have debugging options that affect which kernel debug messages it
> > subscribes to. We can have a reasonable default setup that allows us to
> > pinpoint the problem area and figure out most problems, and if needed,
> > we could ask the administrator pass another debug option to Weston. It
> > helps if there is just one place to configure everything about the
> > compositor.
> >
> > This implies that it would be really nice to have userspace subscriber
> > specific debug message streams from the kernel, or a good way to filter
> > the messages we want. A Wayland compositor would not be interested in
> > file system or wireless debugs for example, but another system
> > component might be. There is also a security aspect of which component is
> > allowed to see which messages in case they could contain anything
> > sensitive (input debug could contain typed passwords).
> >
> > Configuring the kernel debug message selection for our debug message
> > stream can and probably should require elevated privileges, and we can
> > likely solve that in userspace with a daemon or such, to allow the
> > Wayland compositor to run as a regular user.
> >
> > Thinking of desktop systems, and especially physically multi-seat systems:
> > - there can be multiple different Wayland compositors running simultaneously
> > - each of them may want debug messages only from a specific DRM KMS
> > device instance, and not from others
> > - each of them may have a different idea of which debug messages are important
> > - because DRM KMS leasing is a thing, different compositor instances
> > could be using the same DRM KMS device instance simultaneously; since
> > this is specific to DRM KMS, and it should be harmless to get a
> > little too much DRM KMS debug (that is, from the whole device instead
> > of just the leased parts), it may not be worth to consider splitting
> > debug message streams this far.
> >
> > If userspace is offered some standardised fields in kernel debug
> > structures, then userspace could do some filtering on its own too, but I
> > guess it would be better to filter at the source and not need that.
> >
> > There is also an anti-goal. The kernel debug message contents are
> > specifically not machine-parsable. I very much do not want to impose
> > debug strings as ABI, they are for human (and AI?) readers only.
> >
> >
> > As a summary, here are the most important requirements first:
> > - usable in production as a normal thing to enable always by default
> > - final delivery to unprivileged userspace process
>
> I think this is the one that's trickiest, and I don't fully understand why
> you need it. The issues I'm seeing:
>
> - logs tend to leak a lot of kernel internal state that's useful for
> attacks. There's measures for the worst (like obfuscating kernel
> pointers by hashing them), so there's always going to be a difference
> here between what full sysadmin can get and what unpriviledged userspace
> can get. And there's always a risk we miss something that we should
> obfuscate but didn't.
>
> - handing this to userspace increases the risks it becomes uapi. Who's
> going to stop compositors from sussing out the reason an atomic commit
> failed from the logs if they can get them easily, and these logs contain
> very interesting information about why something failed?
>
> Much better if journald or a crash handler assemebles all the different
> flight recorder logs and packages it into a bug report so that the
> compositor cannot ever get at these directly. Yeah this needs some OS
> support with a dbus request or similar so that the compositor can ask
> for a crash dump with everything relevant to its session.
>
This is similar to what we plan to do in ChromeOS. We want to enable
writing debug logs of each subsystem/driver
of interest (e.g. thunderbolt) into a separate trace instance and when
an issue happens and a user submits
a feedback report we will attach the captured logs from trace
instances to the report.
> - the idea of an in-kernel flight recorder is that it's really fast. The
> entire tracing infra is built such that recording an event is really
> quick, but printing it is not - the entire string formatting is delayed
> to when userspace reads the buffers. If you constantly push the log
> messages to userspace we toss the advantage of the low-overhead
> in-kernel flight recorder. If you push logs to dmesg there's a
> substantial measureable overhead which you don't really want in
> production, and your requirement would impose the same.
>
> - I'm not sure how this is supposed to mesh with userspace log aggregators
> like journald when every compositor has it's own flight recorder on top.
> Feels a bit like a solution that ignores the entire os stack and assumes
> that the only pieces we can touch are the kernel and the compositor to
> get to such a flight recorder.
>
> You might object that events will get out-of-order if you merge multiple
> logs after the fact, but that's the case anyway if we use the tracing
> framework since that's always a ringbuffer within the kernel and not
> synchronous. The only thing we could do is allow userspace to push
> markers into that ringbuffer, which is easily done by adding more debug
> output lines (heck we could even add a logging cookie to certain ioctl
> when userspace really cares about knowing exact ordering of it's
> requests with the stuff the kernel does).
>
> - If you really want direct deliver to userspace I guess we could do
> something where sessiond opens the flight recorder fd for you, sets it
> all up and passes it to the compositor. But I'm really not a big fan of
> sending the full kms dbg spam to compositors to freely digest in real
> time.
>
> > - per debug-print selection of messages (finer or coarser, categories
> > within a kernel sub-system could be enough)
> > - per originating device (driver instance) selection of messages
>
> The dyndbg stuff can do all that already, which is why I'm so much in
> favour of relying on that framework.
>
> > - all selections tailored separately for each userspace subscriber
> > (- per open device file description selection of messages)
>
> Again this feels like a userspace problem. Sessions could register what
> kind of info they need for their session, and something like journald can
> figure out how to record it all.
>
> If you want the kernel to keep separate flight recorders I guess we could
> add that, but I don't think it currently exists for the dyndbg stuff at
> least. Maybe a flight recorder v2 feature, once the basics are in.
>
> > That's my idea of it. It is interesting to see how far the requirements
> > can be reasonably realised.
>
> I think aside from the "make it available directly to unpriviledged
> userspace" everything sounds reasonable and doable.
>
> More on the process side of things, I think Jim is very much looking for
> acks and tested-by by people who are interested in better drm logging
> infra. That should help that things are moving in a direction that's
> actually useful, even when it's not yet entirely complete.
>
> Cheers, Sima
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch