Re: [PATCH] [2.6] PPC64: log firmware errors during boot.

From: linas
Date: Thu Jul 01 2004 - 19:21:52 EST


On Wed, Jun 30, 2004 at 08:55:15PM +1000, Paul Mackerras wrote:
> Linas,
>
> > Firmware can report errors at any time, and not atypically during boot.
> > However, these reports were being discarded until th rtasd comes up,
> > which occurs fairly late in the boot cycle. As a result, firmware
> > errors during boot were being silently ignored.
>
> As far as I can see the main change is in log_rtas_len, which is
> called from pSeries_log_error, which is called from do_event_scan and
> rtasd(), and do_event_scan is only called from rtasd(). And
> get_eventscan_parms() is already called at the beginning of rtasd().

Yes, but rtasd starts up late in the book process. Most of the
"interesting" manipulations with firmware are old history by then,
and thus, any firmware errors encountered during the boot were never
logged.

> So I don't see the point of the get_eventscan_parms call in
> log_rtas_len.

If the parms aren't set up, then the rtas_error_log_max is zero,
and, as a result, the message is never logged. By initializing
rtas_error_log_max to the correct non-zero value, the errors can
get logged.


> > This patch at least gets them printk'ed so that at least they show
> > up in boot.msg/syslog. There are two other logging mechanisms,
> > nvram and rtas, that I didn't touch because I don't understand
> > the reprecussions. In particular, nvram logging isn't enabled
> > until late in the boot ... but what's the point of nvram logging
> > if not to catch messages that occured very early in boot ??
>
> Indeed.
>
> As for printk'ing the errors, it is annoying and it seems of somewhat
> dubious benefit to me, given that it is just incomprehensible hex
> numbers that can go on and on. There has to be a better way.

Yes, well, you'll be hard-pressed to find a lover of the hex format
anywhere. Lets review the history of the design decisions that
got us to this point. I think a better solution might then become
evident.

-- Originally, these binary messages from firmware were decoded
in the kernel, and printed out in 'plain english'. However, there
were problems: 1) the format of the binary kept evolving; I think
we are now up to version 6. 2) the need for supporting version 6
and *all* of the earlier versions lead to dreaded kernel bloat.
For the current user-space decoder:
# wc *.c *.h
2207 7056 67959 total

-- So the decision was wisely made to move this all to user-space.
But what shall the communications link between user-space and kernel be?
Somebody, somewhere, I know not who or why, decided that they should
go into syslog. And so here we are.

How else could we do this? I have never had to architect a kernel-to-user
data communications interface, so I don't know what the alternatives
are. We could queue them up to some file in /proc, which user-space
reads. Or maybe /sys instead ?? Maybe a stunt with sockets? Some
new device in /dev/ that can be opened, read, closed? How should
the user space daemon indicate that its picked up the message and
doesn't need it any more? Write a msg number to a /proc file?
Maybe each individual message should go in its own file, and user
space just rm's that file after its fetched/saved the message.
I dunno, I think any one of these could be whipped up in a jiffy.
Convincing the user-space to use the interface might be harder.

Pick one. If it can be coded in under a day, I can volunteer to
do that.

> Putting
> it in nvram seems like a better option to me. I don't know of any
> reason why we can't use nvram quite early on.

Me neither, Jake knows. I thought the whole point of nvram was to not
loose the messages during crash; the messages are promtly copied out
of nvram once the system is up and stable; nvram is a staging area,
not a permanent repository.

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/