Re: [PATCH v1] dynamic_debug: add support for logs destination
From: Pekka Paalanen
Date: Thu Oct 12 2023 - 04:56:02 EST
On Wed, 11 Oct 2023 11:42:24 +0200
Daniel Vetter <daniel@xxxxxxxx> wrote:
> On Wed, Oct 11, 2023 at 11:48:16AM +0300, Pekka Paalanen wrote:
> > On Tue, 10 Oct 2023 10:06:02 -0600
> > jim.cromie@xxxxxxxxx wrote:
> >
> > > since I name-dropped you all,
> >
> > Hi everyone,
> >
> > I'm really happy to see this topic being developed! I've practically
> > forgot about it myself, but the need for it has not diminished at all.
> >
> > I didn't understand much of the conversation, so I'll just reiterate
> > what I would use it for, as a Wayland compositor developer.
> >
> > I added a few more cc's to get better coverage of DRM and Wayland
> > compositor developers.
> >
> > > On Tue, Oct 10, 2023 at 10:01 AM <jim.cromie@xxxxxxxxx> wrote:
> > > >
> > > > On Mon, Oct 9, 2023 at 4:47 PM Łukasz Bartosik <lb@xxxxxxxxxxxx> wrote:
> >
> > ...
> >
> > > > > I don't have a real life use case to configure different trace
> > > > > instance for each callsite.
> > > > > I just tried to be as much flexible as possible.
> > > > >
> > > >
> > > > Ive come around to agree - I looked back at some old threads
> > > > (that I was a part of, and barely remembered :-}
> > > >
> > > > At least Sean Paul, Lyude, Simon Ser, Pekka Paalanen
> > > > have expressed a desire for a "flight-recorder"
> > > > it'd be hard to say now that 2-3 such buffers would always be enough,
> > > > esp as theres a performance reason for having your own.
> >
> > A Wayland compositor has roughly three important things where the kernel
> > debugs might come in handy:
> > - input
> > - DRM KMS
> > - DRM GPU rendering
> >
> > DRM KMS is the one I've been thinking of in the flight recorder context
> > the most, because KMS hardware varies a lot, and there is plenty of
> > room for both KMS drivers and KMS userspace to go wrong. The usual
> > result is your display doesn't work, so the system is practically
> > unusable to the end user. In the wild, the simplest or maybe the only
> > way out of that may be a reboot, maybe an automated one (e.g. digital
> > signage). In order to debug such problems, we would need both
> > compositor logs and the relevant kernel debug messages.
> >
> > For example, Weston already has a flight recorder framework of its own,
> > so we have the compositor debug logs. It would be useful to get the
> > selected kernel debug logs in the same place. It could be used for
> > automated or semi-manual bug reporting, for example, making the
> > administrator or end user life much easier reporting issues.
> >
> > Since this is usually a production environment, and the Wayland
> > compositor runs without root privileges, we need something that works
> > with that. We would likely want the kernel debug messages in the
> > compositor to combine and order them properly with the compositor debug
> > messages.
> >
> > It's quite likely that developers would like to pick and choose which
> > kernel debug messages might be interesting enough to record, to avoid
> > excessive log flooding. The flight recorder in Weston is fixed size to
> > avoid running out of memory or disk space. I can also see that Weston
> > could have debugging options that affect which kernel debug messages it
> > subscribes to. We can have a reasonable default setup that allows us to
> > pinpoint the problem area and figure out most problems, and if needed,
> > we could ask the administrator pass another debug option to Weston. It
> > helps if there is just one place to configure everything about the
> > compositor.
> >
> > This implies that it would be really nice to have userspace subscriber
> > specific debug message streams from the kernel, or a good way to filter
> > the messages we want. A Wayland compositor would not be interested in
> > file system or wireless debugs for example, but another system
> > component might be. There is also a security aspect of which component is
> > allowed to see which messages in case they could contain anything
> > sensitive (input debug could contain typed passwords).
> >
> > Configuring the kernel debug message selection for our debug message
> > stream can and probably should require elevated privileges, and we can
> > likely solve that in userspace with a daemon or such, to allow the
> > Wayland compositor to run as a regular user.
> >
> > Thinking of desktop systems, and especially physically multi-seat systems:
> > - there can be multiple different Wayland compositors running simultaneously
> > - each of them may want debug messages only from a specific DRM KMS
> > device instance, and not from others
> > - each of them may have a different idea of which debug messages are important
> > - because DRM KMS leasing is a thing, different compositor instances
> > could be using the same DRM KMS device instance simultaneously; since
> > this is specific to DRM KMS, and it should be harmless to get a
> > little too much DRM KMS debug (that is, from the whole device instead
> > of just the leased parts), it may not be worth to consider splitting
> > debug message streams this far.
> >
> > If userspace is offered some standardised fields in kernel debug
> > structures, then userspace could do some filtering on its own too, but I
> > guess it would be better to filter at the source and not need that.
> >
> > There is also an anti-goal. The kernel debug message contents are
> > specifically not machine-parsable. I very much do not want to impose
> > debug strings as ABI, they are for human (and AI?) readers only.
> >
> >
> > As a summary, here are the most important requirements first:
> > - usable in production as a normal thing to enable always by default
> > - final delivery to unprivileged userspace process
>
> I think this is the one that's trickiest, and I don't fully understand why
> you need it.
It's not an arbitrary unprivileged userspace process. It is an
unprivileged userspace process configured by the system administrator
to receive the information.
I would simply like it to be so that you don't need to have a userspace
process running as root shovelling the bulk debug message data
manually, but that the target process can read it directly e.g. by
getting passed an appropriate fd to it.
> The issues I'm seeing:
>
> - logs tend to leak a lot of kernel internal state that's useful for
> attacks. There's measures for the worst (like obfuscating kernel
> pointers by hashing them), so there's always going to be a difference
> here between what full sysadmin can get and what unpriviledged userspace
> can get. And there's always a risk we miss something that we should
> obfuscate but didn't.
>
> - handing this to userspace increases the risks it becomes uapi. Who's
> going to stop compositors from sussing out the reason an atomic commit
> failed from the logs if they can get them easily, and these logs contain
> very interesting information about why something failed?
That problem exists regardless of where in userspace the data goes to.
Compositors could as well get it from journald if they need it. Or from
the same place where 'dmesg' gets its stuff from, with a tiny helper
with the needed access.
> Much better if journald or a crash handler assemebles all the different
> flight recorder logs and packages it into a bug report so that the
> compositor cannot ever get at these directly. Yeah this needs some OS
> support with a dbus request or similar so that the compositor can ask
> for a crash dump with everything relevant to its session.
Ok.
> - the idea of an in-kernel flight recorder is that it's really fast. The
> entire tracing infra is built such that recording an event is really
> quick, but printing it is not - the entire string formatting is delayed
> to when userspace reads the buffers. If you constantly push the log
> messages to userspace we toss the advantage of the low-overhead
> in-kernel flight recorder. If you push logs to dmesg there's a
> substantial measureable overhead which you don't really want in
> production, and your requirement would impose the same.
Right.
> - I'm not sure how this is supposed to mesh with userspace log aggregators
> like journald when every compositor has it's own flight recorder on top.
> Feels a bit like a solution that ignores the entire os stack and assumes
> that the only pieces we can touch are the kernel and the compositor to
> get to such a flight recorder.
>
> You might object that events will get out-of-order if you merge multiple
> logs after the fact, but that's the case anyway if we use the tracing
> framework since that's always a ringbuffer within the kernel and not
> synchronous. The only thing we could do is allow userspace to push
> markers into that ringbuffer, which is easily done by adding more debug
> output lines (heck we could even add a logging cookie to certain ioctl
> when userspace really cares about knowing exact ordering of it's
> requests with the stuff the kernel does).
That's a good point. So we need to agree on the clock to timestamp all
messages in all components, both kernel and userspace, and then use a
log aggregator to interleave logs by timestamp. Is that clock already
defined as UABI?
The compositor flight recorder is a ring buffer too, because it would
be far too much volume to send it all to another process all the time.
>
> - If you really want direct deliver to userspace I guess we could do
> something where sessiond opens the flight recorder fd for you, sets it
> all up and passes it to the compositor. But I'm really not a big fan of
> sending the full kms dbg spam to compositors to freely digest in real
> time.
Well, something has to get the logs in userspace, be that a compositor,
systemd-journald, or another aggregator. And that needs to be available
in production, so not via debugfs. If it does not need too run as root,
even better.
It doesn't need to be real-time all the time, it is enough if it can be
triggered when needed.
> > - per debug-print selection of messages (finer or coarser, categories
> > within a kernel sub-system could be enough)
> > - per originating device (driver instance) selection of messages
>
> The dyndbg stuff can do all that already, which is why I'm so much in
> favour of relying on that framework.
Cool. I have no idea of any frameworks, and no opinion of which one
should be used, anyway.
> > - all selections tailored separately for each userspace subscriber
> > (- per open device file description selection of messages)
>
> Again this feels like a userspace problem. Sessions could register what
> kind of info they need for their session, and something like journald can
> figure out how to record it all.
Only if the kernel actually attaches all the required information to
the debug messages *in machine readable form* so that userspace
actually can do the filtering. And that makes *that* information UABI.
Maybe that's fine? I wouldn't know.
> If you want the kernel to keep separate flight recorders I guess we could
> add that, but I don't think it currently exists for the dyndbg stuff at
> least. Maybe a flight recorder v2 feature, once the basics are in.
Maybe, maybe not necessary, I'm not sure. The thing is, ring buffers
are of fixed size for a reason, they always overwrite oldest messages,
so the volume of messages with ring buffer size determines how long
into the past you can get the messages. Different message sources can
have widely different bandwidth (volume), and different use cases need
logs for different past time periods.
It's really cool if there is no need to copy the same data into
multiple ring buffers, but it is bad to not be able to get logs far
enough back when some unrelated thing is spamming the same ring buffer.
You said that turning the kernel ring buffer contents into strings is a
very heavy operation, so it is not possible to push this scope
separation to userspace, right?
> > That's my idea of it. It is interesting to see how far the requirements
> > can be reasonably realised.
>
> I think aside from the "make it available directly to unpriviledged
> userspace" everything sounds reasonable and doable.
I never said "free for all". I meant specifically chosen unprivileged
(not running as root or with caps) processes.
> More on the process side of things, I think Jim is very much looking for
> acks and tested-by by people who are interested in better drm logging
> infra. That should help that things are moving in a direction that's
> actually useful, even when it's not yet entirely complete.
I'd love to help with that, but I can barely afford to send this email.
Do let me know if my comments are not helping, so I can do other things
instead.
Thanks,
pq
Attachment:
pgpNKY5NW3Saw.pgp
Description: OpenPGP digital signature