Re: [RFC PATCH v5 1/3] printk-rb: new printk ringbuffer implementation (writer)

From: Sergey Senozhatsky
Date: Mon Dec 02 2019 - 20:17:27 EST


On (19/12/02 17:37), John Ogness wrote:
> On 2019-12-02, Petr Mladek <pmladek@xxxxxxxx> wrote:
> >> > +/* Reserve a new descriptor, invalidating the oldest if necessary. */
> >> > +static bool desc_reserve(struct printk_ringbuffer *rb, u32 *id_out)
> >> > +{
> >> > +	struct prb_desc_ring *desc_ring = &rb->desc_ring;
> >> > +	struct prb_desc *desc;
> >> > +	u32 id_prev_wrap;
> >> > +	u32 head_id;
> >> > +	u32 id;
> >> > +
> >> > +	head_id = atomic_read(&desc_ring->head_id);
> >> > +
> >> > +	do {
> >> > +		desc = to_desc(desc_ring, head_id);
> >> > +
> >> > +		id = DESC_ID(head_id + 1);
> >> > +		id_prev_wrap = DESC_ID_PREV_WRAP(desc_ring, id);
> >> > +
> >> > +		if (id_prev_wrap == atomic_read(&desc_ring->tail_id)) {
> >> > +			if (!desc_push_tail(rb, id_prev_wrap))
> >> > +				return false;
> >> > +		}
> >> > +	} while (!atomic_try_cmpxchg(&desc_ring->head_id, &head_id, id));
> >>
> >> Hmm, in theory, an ABA problem might let us successfully
> >> move desc_ring->head_id when the tail has not been pushed yet.
> >>
> >> As a result we would never call desc_push_tail() until
> >> it overflows again.
> >>
> >> I am not sure if we need to take care of it. The code is called with
> >> interrupts disabled. IMHO, only an NMI could cause the ABA problem
> >> in reality. But the game (debugging) is lost anyway when an NMI
> >> overwrites the buffer several times.
> >
> > BTW: If I am counting correctly, the ABA problem would happen when
> > exactly 2^30 (1G) messages are written in the meantime.
>
> All the ringbuffer code assumes that the use of index numbers handles
> the ABA problem (i.e. there must not be 1 billion printk's within an
> NMI). If we want to support 1 billion+ printk's within an NMI, then
> perhaps the index number should be increased. For 64-bit systems it
> would be no problem to go to 62 bits. For 32-bit systems, I don't know
> how well the 64-bit atomic operations are supported.

What about ftrace dumps from NMI (a DUMP_ALL type ftrace_dump_on_oops
on a $BIG machine)? 1G seems large enough, but who knows.
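
To make the wrap arithmetic concrete, here is a minimal userspace sketch
of the stale-cmpxchg case. It is not the patch's code. It assumes that
IDs are 30 bits wide (inferred from the 2^30 figure above), uses C11
atomics in place of the kernel's atomic_t, and the helper names
nested_writes() and stale_cmpxchg_succeeds() are made up for
illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: descriptor IDs use the low 30 bits of a 32-bit word. */
#define ID_BITS		30u
#define ID_MASK		((UINT32_C(1) << ID_BITS) - 1)
#define DESC_ID(x)	((uint32_t)(x) & ID_MASK)

static _Atomic uint32_t head_id;

/*
 * Hypothetical helper: model a nested context (e.g. an NMI) that
 * completes n reservations, advancing head_id by n modulo 2^30.
 */
static void nested_writes(uint64_t n)
{
	uint32_t cur = atomic_load(&head_id);

	atomic_store(&head_id, DESC_ID(cur + n));
}

/*
 * Hypothetical helper: the outer writer reads head_id, is interrupted
 * for "nested" reservations, then retries its compare-and-exchange
 * with the stale value. Returns true if the stale cmpxchg succeeds.
 */
static bool stale_cmpxchg_succeeds(uint64_t nested)
{
	const uint32_t start = 5;	/* arbitrary starting ID */
	uint32_t seen, next;

	atomic_store(&head_id, start);

	seen = atomic_load(&head_id);
	assert(seen == start);
	next = DESC_ID(seen + 1);

	nested_writes(nested);

	/* Succeeds only if head_id again equals the stale value. */
	return atomic_compare_exchange_strong(&head_id, &seen, next);
}
```

Under these assumptions, stale_cmpxchg_succeeds(1ULL << 30) returns
true, since the ID has wrapped back to the value the outer writer saw.
Any nested count that is not a multiple of 2^30 makes the cmpxchg fail,
and the loop retries with the fresh head_id.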

-ss