Re: [PATCH v2 6/9] kgdboc: Add earlycon_kgdboc to support early kgdb using boot consoles
From: Daniel Thompson
Date: Mon Apr 27 2020 - 12:36:52 EST
On Tue, Apr 21, 2020 at 02:14:44PM -0700, Douglas Anderson wrote:
> We want to enable kgdb to debug the early parts of the kernel.
> Unfortunately kgdb normally is a client of the tty API in the kernel
> and serial drivers don't register to the tty layer until fairly late
> in the boot process.
>
> Serial drivers do, however, commonly register a boot console. Let's
> enable the kgdboc driver to work with boot consoles to provide early
> debugging.
>
> This change co-opts the existing read() function pointer that's part
> of "struct console". It's assumed that if a boot console (with the
> flag CON_BOOT) has implemented read() that both the read() and write()
> function are polling functions. That means they work without
> interrupts and read() will return immediately (with 0 bytes read) if
> there's nothing to read. This should be a safe assumption since it
> appears that no current boot consoles implement read() right now and
> there seems no reason to do so unless they wanted to support
> "earlycon_kgdboc".
>
> The console API isn't really intended to have clients work with it
> like we're doing. Specifically there doesn't appear to be any way for
> clients to be notified about a boot console being unregistered. We'll
> work around this by checking that our console is still valid before
> using it. We'll also try to transition off of the boot console and
> onto the "tty" API as quickly as possible.
>
> The normal/expected way to make all this work is to use
> "earlycon_kgdboc" and "kgdboc" together. You should point them both
> to the same physical serial connection. At boot time, as the system
> transitions from the boot console to the normal console, kgdb will
> switch over. If you don't use things in the normal/expected way it's
> a bit of a buyer-beware situation. Things thought about:
>
> - If you specify only "earlycon_kgdboc" but not "kgdboc" you still
> might end up dropping into kgdb upon a crash/sysrq but you may not
> be able to type.
> - If you use "keep_bootcon" (which is already a bit of a buyer-beware
> option) and specify "earlycon_kgdboc" but not "kgdboc" we'll keep
> trying to use your boot console for kgdb.
> - If your "earlycon_kgdboc" and "kgdboc" devices are not the same
> device things should work OK, but it'll be your job to switch over
> which device you're monitoring (including figuring out how to switch
> over gdb in-flight if you're using it).
>
> When trying to enable "earlycon_kgdboc" it should be noted that the
> names that are registered through the boot console layer and the tty
> layer are not the same for the same port. For example when debugging
> on one board I'd need to pass "earlycon_kgdboc=qcom_geni
> kgdboc=ttyMSM0" to enable things properly. Since digging up the boot
> console name is a pain and there will rarely be more than one boot
> console enabled, you can provide the "earlycon_kgdboc" parameter
> without specifying the name of the boot console. In this case we'll
> just pick the first boot that implements read() that we find.
>
> This new "earlycon_kgdboc" parameter should be contrasted to the
> existing "ekgdboc" parameter. While both provide a way to debug very
> early, the usage and mechanisms are quite different. Specifically
> "earlycon_kgdboc" is meant to be used in tandem with "kgdboc" and
> there is a transition from one to the other. The "ekgdboc" parameter,
> on the other hand, replaces the "kgdboc" parameter. It runs the same
> logic as the "kgdboc" parameter but just relies on your TTY driver
> being present super early. The only known usage of the old "ekgdboc"
> parameter is documented as "ekgdboc=kbd earlyprintk=vga". It should
> be noted that "kbd" has special treatment allowing it to init early as
> a tty device.
>
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
Again, very happy with the overall approach, just a few quibbles.
> ---
> This patch touches files in several different subsystems, but it
> touches a single line and that line is related to kgdb. I'm assuming
> this can all go through the kgdb tree, but if needed I can always
> introduce a new API call instead of modifying the old one and then
> just have the old API call be a thin wrapper on the new one.
Funny you should say that!
I don't really like that extra argument although it is nothing to do
with simplifying merges...
> diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
> index 519d8cfbfbed..2f526f2d2bea 100644
> --- a/drivers/tty/serial/kgdboc.c
> +++ b/drivers/tty/serial/kgdboc.c
> @@ -409,6 +465,80 @@ static int __init kgdboc_early_init(char *opt)
> }
>
> early_param("ekgdboc", kgdboc_early_init);
> +
> +static int earlycon_kgdboc_get_char(void)
> +{
> + char c;
> +
> + if (earlycon_neutered || !earlycon->read(earlycon, &c, 1))
> + return NO_POLL_CHAR;
> +
> + return c;
> +}
> +
> +static void earlycon_kgdboc_put_char(u8 chr)
> +{
> + if (!earlycon_neutered)
> + earlycon->write(earlycon, &chr, 1);
> +}
> +
> +static void earlycon_kgdboc_pre_exp_handler(void)
> +{
> + /*
> + * We don't get notified when the boot console is unregistered.
> + * Double-check when we enter the debugger. Unfortunately we
> + * can't really unregister ourselves now, but at least don't crash.
> + */
> + if (earlycon && !earlycon_neutered && !is_earlycon_still_valid()) {
> + pr_warn("Neutering kgdb since boot console vanished\n");
> + earlycon_neutered = true;
This is, IMHO, too subtle.
I don't think this is merely a warning with a gentle message about
neutering. IIUC the system is (or will shortly be) dead in the water.
After diligently stopping all the CPUs the debug-core will then start
waiting for a character that cannot possibly come!
I think this might be one of those vanishingly rare places where
panicing might actually the right thing to do... although only after
neutering" the kgdb panic handler first ;-).
> + }
> +}
> +
> +static struct kgdb_io earlycon_kgdboc_io_ops = {
> + .name = "earlycon_kgdboc",
> + .read_char = earlycon_kgdboc_get_char,
> + .write_char = earlycon_kgdboc_put_char,
> + .pre_exception = earlycon_kgdboc_pre_exp_handler,
> + .is_console = true,
> +};
> +
> +static int __init earlycon_kgdboc_init(char *opt)
> +{
> + struct console *con;
> +
> + kdb_init(KDB_INIT_EARLY);
This is normally taken care of by debug-core.c . Could this be
integrated into kgdb_register_io_module() ?
> +
> + /*
> + * Look for a matching console, or if the name was left blank just
> + * pick the first one we find.
> + */
> + console_lock();
> + for_each_console(con) {
> + if (con->write && con->read &&
> + (con->flags & (CON_BOOT | CON_ENABLED)) &&
> + (!opt || !opt[0] || strcmp(con->name, opt) == 0))
> + break;
> + }
> + console_unlock();
> +
> + if (!con) {
> + pr_info("Couldn't find kgdb earlycon\n");
> + return 0;
> + }
> +
> + earlycon = con;
> + pr_info("Going to register kgdb with earlycon '%s'\n", con->name);
> + if (kgdb_register_io_module(&earlycon_kgdboc_io_ops, false) != 0) {
> + earlycon = NULL;
> + pr_info("Failed to register kgdb with earlycon\n");
> + return 0;
> + }
> +
> + return 0;
> +}
> +
> +early_param("earlycon_kgdboc", earlycon_kgdboc_init);
> #endif /* CONFIG_KGDB_SERIAL_CONSOLE */
>
> module_init(init_kgdboc);
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index 8f178239856d..1b5435c6d92a 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -1074,16 +1074,21 @@ EXPORT_SYMBOL_GPL(kgdb_schedule_breakpoint);
> /**
> * kgdb_register_io_module - register KGDB IO module
> * @new_dbg_io_ops: the io ops vector
> + * @replace: If true it's OK if there were old ops. This is used
> + * to transition from early kgdb to normal kgdb. It's
> + * assumed these are the same device so kgdb can continue.
> *
> * Register it with the KGDB core.
> */
> -int kgdb_register_io_module(struct kgdb_io *new_dbg_io_ops)
> +int kgdb_register_io_module(struct kgdb_io *new_dbg_io_ops, bool replace)
As I said I'm not a big fan of the extra argument. It makes the call
sites harder to read.
Could earlycon_kgdboc be registered with a boolean flag set so that
a subsequent register will automatically replace the old one
(maybe "is_replaceable" or "is_temporary")?
For bonus marks the core could also enforce that a replaceable io ops
table must have init set to null (because there is no deinit).
Daniel.