misc details: was: Re: [PATCH printk v1 09/18] printk: nobkl: Add print state functions

From: Petr Mladek
Date: Wed Mar 29 2023 - 10:07:17 EST


On Thu 2023-03-02 21:02:09, John Ogness wrote:
> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>
> Provide three functions which are related to the safe handover
> mechanism and allow console drivers to denote takeover unsafe
> sections:
>
> - console_can_proceed()
>
> Invoked by a console driver to check whether a handover request
> is pending or whether the console was taken over in a hostile
> fashion.
>
> - console_enter/exit_unsafe()
>
> Invoked by a console driver to denote that the driver output
> function is about to enter or to leave an critical region where a
> hostile take over is unsafe. These functions are also
> cancellation points.
>
> The unsafe state is stored in the console state and allows a
> takeover attempt to make informed decisions whether to take over
> and/or output on such a console at all. The unsafe state is also
> available to the driver in the write context for the
> atomic_write() output function so the driver can make informed
> decisions about the required actions or take a special emergency
> path.
>
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> @@ -947,6 +947,145 @@ static void cons_free_percpu_data(struct console *con)
> con->pcpu_data = NULL;
> }
>
> +/**
> + * console_can_proceed - Check whether printing can proceed
> + * @wctxt: The write context that was handed to the write function
> + *
> + * Returns: True if the state is correct. False if a handover
> + * has been requested or if the console was taken
> + * over.
> + *
> + * Must be invoked after the record was dumped into the assigned record
> + * buffer and at appropriate safe places in the driver. For unsafe driver
> + * sections see console_enter_unsafe().
> + *
> + * When this function returns false then the calling context is not allowed
> + * to go forward and has to back out immediately and carefully. The buffer
> + * content is no longer trusted either and the console lock is no longer
> + * held.
> + */
> +bool console_can_proceed(struct cons_write_context *wctxt)
> +{
> + struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> + struct console *con = ctxt->console;
> + struct cons_state state;
> +
> + cons_state_read(con, CON_STATE_CUR, &state);
> + /* Store it for analysis or reuse */
> + copy_full_state(ctxt->old_state, state);
> +
> + /* Make sure this context is still the owner. */
> + if (!cons_state_full_match(state, ctxt->state))
> + return false;
> +
> + /*
> + * Having a safe point for take over and eventually a few
> + * duplicated characters or a full line is way better than a
> + * hostile takeover. Post processing can take care of the garbage.
> + * Continue if the requested priority is not sufficient.
> + */
> + if (state.req_prio <= state.cur_prio)
> + return true;
> +
> + /*
> + * A console printer within an unsafe region is allowed to continue.
> + * It can perform the handover when exiting the safe region. Otherwise
> + * a hostile takeover will be necessary.
> + */
> + if (state.unsafe)
> + return true;

It took me quite some time to scratch my head around the above two comments.
The code is clear but the comments are somehow cryptic ;-)

It is probably because the 1st comment starts talking about a safe point.
But .unsafe is checked after the 2nd comment. And the word "allowed"
confused me in the 2nd comment.

I would explain this in these details in the function description. The
code will be self-explanatory then. I would write something like:

* The console is allowed to continue when it still owns the lock
* and there is no request from a higher priority context.
*
* The context might have lost the lock during a hostile takeover.
*
* The function will handover the lock when there is a request
* with a higher priority and the console is in a safe context.
* The new owner would print the same line again. But a duplicated
* part of a line is better than risking a hostile takeover in
* an unsafe context.

> +
> + /* Release and hand over */
> + cons_release(ctxt);
> + /*
> + * This does not check whether the handover succeeded.

This is a bit ambiguous. What exactly means that the handover succeeded?
I guess that it means that the context with the higher priority
successfully took over the lock.

A "failure" might be when the other context timeouted and given up.
In that case, nobody would continue printing.

We actually should wake up the kthread when the lock was not
successfully passed. Or even better, we should release the lock
only when the request was still pending. It should be possible
with the cmpxchg().


> + * outermost callsite has to make the final decision whether printing
> + * should continue

This is a bit misleading. The current owner could not continue after
loosing the lock. It would need to re-start the entire operation.

Is this about the kthread or cons_flush*() layers? Yes, they have to
decide what to do next. Well, we should make it clear that we are talking
about this layer. The con->atomic_write() layer can only carefully
back off.

Maybe, we do not need to describe it here. It should be enough to
mention in the function description that the driver has to
carefully back off and that the buffer content is not longer
trusted. It is already mentioned there.

> + * or not (via reacquire, possibly hostile). The
> + * console is unlocked already so go back all the way instead of
> + * trying to implement heuristics in tons of places.
> + */
> + return false;
> +}
> +
> +static bool __console_update_unsafe(struct cons_write_context *wctxt, bool unsafe)
> +{
> + struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> + struct console *con = ctxt->console;
> + struct cons_state new;
> +
> + do {
> + if (!console_can_proceed(wctxt))
> + return false;
> + /*
> + * console_can_proceed() saved the real state in
> + * ctxt->old_state
> + */
> + copy_full_state(new, ctxt->old_state);
> + new.unsafe = unsafe;
> +
> + } while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &ctxt->old_state, &new));

This updates only the bit in struct cons_state. But there is also
"bool unsafe" in struct cons_write_context. Should the boolean
be updated as well?

Or is the boolean needed at all? It seems that it is set in
cons_emit_record() and never read.

> +
> + copy_full_state(ctxt->state, new);
> + return true;
> +}

Best Regards,
Petr