Re: [PATCH v1] dynamic_debug: add support for logs destination
From: Daniel Vetter
Date: Wed Oct 11 2023 - 05:42:35 EST
On Wed, Oct 11, 2023 at 11:48:16AM +0300, Pekka Paalanen wrote:
> On Tue, 10 Oct 2023 10:06:02 -0600
> jim.cromie@xxxxxxxxx wrote:
>
> > since I name-dropped you all,
>
> Hi everyone,
>
> I'm really happy to see this topic being developed! I've practically
> forgot about it myself, but the need for it has not diminished at all.
>
> I didn't understand much of the conversation, so I'll just reiterate
> what I would use it for, as a Wayland compositor developer.
>
> I added a few more cc's to get better coverage of DRM and Wayland
> compositor developers.
>
> > On Tue, Oct 10, 2023 at 10:01 AM <jim.cromie@xxxxxxxxx> wrote:
> > >
> > > On Mon, Oct 9, 2023 at 4:47 PM Łukasz Bartosik <lb@xxxxxxxxxxxx> wrote:
>
> ...
>
> > > > I don't have a real life use case to configure different trace
> > > > instance for each callsite.
> > > > I just tried to be as much flexible as possible.
> > > >
> > >
> > > Ive come around to agree - I looked back at some old threads
> > > (that I was a part of, and barely remembered :-}
> > >
> > > At least Sean Paul, Lyude, Simon Ser, Pekka Paalanen
> > > have expressed a desire for a "flight-recorder"
> > > it'd be hard to say now that 2-3 such buffers would always be enough,
> > > esp as theres a performance reason for having your own.
>
> A Wayland compositor has roughly three important things where the kernel
> debugs might come in handy:
> - input
> - DRM KMS
> - DRM GPU rendering
>
> DRM KMS is the one I've been thinking of in the flight recorder context
> the most, because KMS hardware varies a lot, and there is plenty of
> room for both KMS drivers and KMS userspace to go wrong. The usual
> result is your display doesn't work, so the system is practically
> unusable to the end user. In the wild, the simplest or maybe the only
> way out of that may be a reboot, maybe an automated one (e.g. digital
> signage). In order to debug such problems, we would need both
> compositor logs and the relevant kernel debug messages.
>
> For example, Weston already has a flight recorder framework of its own,
> so we have the compositor debug logs. It would be useful to get the
> selected kernel debug logs in the same place. It could be used for
> automated or semi-manual bug reporting, for example, making the
> administrator or end user life much easier reporting issues.
>
> Since this is usually a production environment, and the Wayland
> compositor runs without root privileges, we need something that works
> with that. We would likely want the kernel debug messages in the
> compositor to combine and order them properly with the compositor debug
> messages.
>
> It's quite likely that developers would like to pick and choose which
> kernel debug messages might be interesting enough to record, to avoid
> excessive log flooding. The flight recorder in Weston is fixed size to
> avoid running out of memory or disk space. I can also see that Weston
> could have debugging options that affect which kernel debug messages it
> subscribes to. We can have a reasonable default setup that allows us to
> pinpoint the problem area and figure out most problems, and if needed,
> we could ask the administrator pass another debug option to Weston. It
> helps if there is just one place to configure everything about the
> compositor.
>
> This implies that it would be really nice to have userspace subscriber
> specific debug message streams from the kernel, or a good way to filter
> the messages we want. A Wayland compositor would not be interested in
> file system or wireless debugs for example, but another system
> component might be. There is also a security aspect of which component is
> allowed to see which messages in case they could contain anything
> sensitive (input debug could contain typed passwords).
>
> Configuring the kernel debug message selection for our debug message
> stream can and probably should require elevated privileges, and we can
> likely solve that in userspace with a daemon or such, to allow the
> Wayland compositor to run as a regular user.
>
> Thinking of desktop systems, and especially physically multi-seat systems:
> - there can be multiple different Wayland compositors running simultaneously
> - each of them may want debug messages only from a specific DRM KMS
> device instance, and not from others
> - each of them may have a different idea of which debug messages are important
> - because DRM KMS leasing is a thing, different compositor instances
> could be using the same DRM KMS device instance simultaneously; since
> this is specific to DRM KMS, and it should be harmless to get a
> little too much DRM KMS debug (that is, from the whole device instead
> of just the leased parts), it may not be worth to consider splitting
> debug message streams this far.
>
> If userspace is offered some standardised fields in kernel debug
> structures, then userspace could do some filtering on its own too, but I
> guess it would be better to filter at the source and not need that.
>
> There is also an anti-goal. The kernel debug message contents are
> specifically not machine-parsable. I very much do not want to impose
> debug strings as ABI, they are for human (and AI?) readers only.
>
>
> As a summary, here are the most important requirements first:
> - usable in production as a normal thing to enable always by default
> - final delivery to unprivileged userspace process
I think this is the one that's trickiest, and I don't fully understand why
you need it. The issues I'm seeing:
- logs tend to leak a lot of kernel internal state that's useful for
attacks. There's measures for the worst (like obfuscating kernel
pointers by hashing them), so there's always going to be a difference
here between what full sysadmin can get and what unpriviledged userspace
can get. And there's always a risk we miss something that we should
obfuscate but didn't.
- handing this to userspace increases the risks it becomes uapi. Who's
going to stop compositors from sussing out the reason an atomic commit
failed from the logs if they can get them easily, and these logs contain
very interesting information about why something failed?
Much better if journald or a crash handler assemebles all the different
flight recorder logs and packages it into a bug report so that the
compositor cannot ever get at these directly. Yeah this needs some OS
support with a dbus request or similar so that the compositor can ask
for a crash dump with everything relevant to its session.
- the idea of an in-kernel flight recorder is that it's really fast. The
entire tracing infra is built such that recording an event is really
quick, but printing it is not - the entire string formatting is delayed
to when userspace reads the buffers. If you constantly push the log
messages to userspace we toss the advantage of the low-overhead
in-kernel flight recorder. If you push logs to dmesg there's a
substantial measureable overhead which you don't really want in
production, and your requirement would impose the same.
- I'm not sure how this is supposed to mesh with userspace log aggregators
like journald when every compositor has it's own flight recorder on top.
Feels a bit like a solution that ignores the entire os stack and assumes
that the only pieces we can touch are the kernel and the compositor to
get to such a flight recorder.
You might object that events will get out-of-order if you merge multiple
logs after the fact, but that's the case anyway if we use the tracing
framework since that's always a ringbuffer within the kernel and not
synchronous. The only thing we could do is allow userspace to push
markers into that ringbuffer, which is easily done by adding more debug
output lines (heck we could even add a logging cookie to certain ioctl
when userspace really cares about knowing exact ordering of it's
requests with the stuff the kernel does).
- If you really want direct deliver to userspace I guess we could do
something where sessiond opens the flight recorder fd for you, sets it
all up and passes it to the compositor. But I'm really not a big fan of
sending the full kms dbg spam to compositors to freely digest in real
time.
> - per debug-print selection of messages (finer or coarser, categories
> within a kernel sub-system could be enough)
> - per originating device (driver instance) selection of messages
The dyndbg stuff can do all that already, which is why I'm so much in
favour of relying on that framework.
> - all selections tailored separately for each userspace subscriber
> (- per open device file description selection of messages)
Again this feels like a userspace problem. Sessions could register what
kind of info they need for their session, and something like journald can
figure out how to record it all.
If you want the kernel to keep separate flight recorders I guess we could
add that, but I don't think it currently exists for the dyndbg stuff at
least. Maybe a flight recorder v2 feature, once the basics are in.
> That's my idea of it. It is interesting to see how far the requirements
> can be reasonably realised.
I think aside from the "make it available directly to unpriviledged
userspace" everything sounds reasonable and doable.
More on the process side of things, I think Jim is very much looking for
acks and tested-by by people who are interested in better drm logging
infra. That should help that things are moving in a direction that's
actually useful, even when it's not yet entirely complete.
Cheers, Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch