Re: [PATCH] printk: inject caller information into the body of message

From: Tetsuo Handa
Date: Tue Oct 09 2018 - 17:19:24 EST


On 2018/10/09 23:52, Petr Mladek wrote:
> On Tue 2018-10-09 05:48:33, Tetsuo Handa wrote:
>> On 2018/10/09 1:03, Petr Mladek wrote:
>>> On Mon 2018-10-08 19:31:58, Tetsuo Handa wrote:
>>>> A structure named "struct printk_buffer" is introduced for buffering
>>>> up to LOG_LINE_MAX bytes of printk() output which did not end with '\n'.
>>>>
>>>> A caller is allowed to allocate/free "struct printk_buffer" using
>>>> kzalloc()/kfree() if that caller is in a location where it is possible
>>>> to do so.
>>>>
>>>> A macro named "DEFINE_PRINTK_BUFFER()" is defined for allocating
>>>> "struct printk_buffer" from the stack memory or in the .bss section.
>>>>
>>>> But since sizeof("struct printk_buffer") is nearly 1KB, it might not be
>>>> preferable to allocate "struct printk_buffer" from the stack memory.
>>>> In that case, a caller can use best-effort buffering mode. Two functions
>>>> get_printk_buffer() and put_printk_buffer() are provided for that mode.
>>>>
>>>> get_printk_buffer() tries to assign a "struct printk_buffer" from
>>>> statically preallocated array. It returns NULL if all static
>>>> "struct printk_buffer" are in use.
>>>>
>>>> put_printk_buffer() flushes and releases the "struct printk_buffer".
>>>> put_printk_buffer() must match corresponding get_printk_buffer() as with
>>>> rcu_read_unlock() must match corresponding rcu_read_lock().
>>>
>>> One problem with this API is when it is used in more complicated code
>>> and put_printk_buffer() is not called in some path. I mean leaking.
>>> We might get out of buffers easily.
>>
>> Then, as an debugging config option for statically preallocated buffers,
>> we could record how get_printk_buffer() was called, like lockdep records
>> where a lock was taken.
>
> Another solution might be to store some timestamp (jiffies?) into
> struct printk_buffer when a new message is added. Then we could flush
> stalled buffers in get_printk_buffer() with some warning.

I don't think it will work. What the threshold should be? It is possible that
a thread spends very long time (many seconds for e.g. SysRq-t) between
get_printk_buffer() and put_printk_buffer(). Therefore, the threshold will
have to be very very long. As soon as we reach out of statically preallocated
buffers, we need to fallback to unbuffered printk() before such threshold
elapses.

>
> Unfortunately, it might be unsafe to put the stalled buffers.
> Well, it might be safe if there is a lock less access. I wonder
> if we could reuse the printk_safe code here.
>
> Anyway, I would like to have a solution before we add the new
> API into the kernel. We would need it sooner or later anyway.
> And I would like to be sure that the API is sane.

If we worry about get_printk_buffer() without corresponding put_printk_buffer(),
we will also need to worry about a "struct printk_buffer" returned by
get_printk_buffer() is by error shared by multiple threads. We will have to
complicate buffered_printk() by using cmpxchg() & retry logic, but the output is
after all mixed as with simply fallback to unbuffered printk() does.

Do you think that adding cmpxchg() & retry logic to this API generates better
result than simple fallback? buffered_printk() does not add a new locking dependency
is a good point of this API. Showing the backtrace (by enabling a debug kernel config
option for this API) will be sufficient.