Re: [PATCH printk v3 04/14] printk: ringbuffer: Do not skip non-finalized records with prb_next_seq()

From: Petr Mladek
Date: Tue Feb 06 2024 - 12:28:10 EST


On Mon 2024-02-05 12:39:30, John Ogness wrote:
> On 2024-01-15, Petr Mladek <pmladek@xxxxxxxx> wrote:
> >> The acquire is with @last_finalized_seq. So the release must also be
> >> with @last_finalized_seq. The important thing is that the CPU that
> >> updates @last_finalized_seq has actually read the corresponding
> >> record beforehand. That is exactly what desc_update_last_finalized()
> >> does.
> >
> > I probably did not describe it well. The CPU updating
> > @last_finalized_seq does the right thing. I was not sure about the CPU
> > which reads @last_finalized_seq via prb_next_seq().
> >
> > To make it more clear:
> >
> > u64 prb_next_seq(struct printk_ringbuffer *rb)
> > {
> > u64 seq;
> >
> > seq = desc_last_finalized_seq(rb);
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > |
> > `-> This includes atomic_long_read_acquire(last_finalized_seq)
> >
> >
> > if (seq != 0)
> > seq++;
> >
> > while (_prb_read_valid(rb, &seq, NULL, NULL))
> > seq++;
> >
> > return seq;
> > }
> >
> > But where is the atomic_long_read_release(last_finalized_seq) in
> > this code path?
>
> read_release? The counterpart of this load_acquire is a
> store_release. For example:
>
> CPU0 CPU1
> ==== ====
> load(varA)
> store_release(varB) load_acquire(varB)
> load(varA)
>
> If CPU1 reads the value in varB that CPU0 stored, then it is guaranteed
> that CPU1 will read the value (or a later value) in varA that CPU0 read.
>
> Translating the above example to this particular patch, we have:
>
> CPU0: desc_update_last_finalized() CPU1: prb_next_seq()
> ==== ====
> _prb_read_valid(seq)
> cmpxchg_release(last_finalized_seq,seq) seq=read_acquire(last_finalized_seq)
> _prb_read_valid(seq)

I think that I was confused because I had acquire/release mentally
connected with lock/unlock. Where the lock/unlock surrounds some
critical code section. I think that it is actually the most common
usecase.

Our usage is not typical from my POV. There are two aspects:

1. They do not surround a critical section, at least not
an obvious one.

2. In my mental code, this patch patch uses the release
before the acquire. When I think about this code,
than I first imagine the write path (using release)
and then the reader (using acquire).

I think that this code (mis)uses the acquire/release
semantic just for optimized memory barriers.

It might be worth a note that this is not a typical acquire/release
scenario.

> > IMHO, the barrier provided by the acquire() is _important_ to make
> > sure that _prb_read_valid() would see the valid descriptor.
>
> Correct.
>
> > Now, I think that the related read_release(seq) is hidden in:
> >
> > static int prb_read(struct printk_ringbuffer *rb, u64 seq,
> > struct printk_record *r, unsigned int *line_count)
> > {
> > /* Get a local copy of the correct descriptor (if available). */
> > err = desc_read_finalized_seq(desc_ring, id, seq, &desc);
> >
> > /* If requested, copy meta data. */
> > if (r->info)
> > memcpy(r->info, info, sizeof(*(r->info)));
> >
> > /* Copy text data. If it fails, this is a data-less record. */
> > if (!copy_data(&rb->text_data_ring, &desc.text_blk_lpos, info->text_len,
> > r->text_buf, r->text_buf_size, line_count)) {
> > return -ENOENT;
> > }
> >
> > /* Ensure the record is still finalized and has the same @seq. */
> > return desc_read_finalized_seq(desc_ring, id, seq, &desc);
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > |
> > `-> This includes a memory barrier /* LMM(desc_read:A) */
> > which makes sure that the data are read before
> > the desc/data could be reused.
> > }
> >
> > I consider this /* LMM(desc_read:A) */ as a counter part for that
> > acquire() in prb_next_seq().
>
> desc_read:A is not a memory barrier. It only marks the load of the
> descriptor state.

I see. The real read barriers are desc_read:B and desc_read:D

> This is a significant load because prb_next_seq() must
> see at least the descriptor state that desc_update_last_finalized() saw.
>
> The memory barrier comments in desc_update_last_finalized() state:
>
> * If desc_last_finalized_seq:A reads from
> * desc_update_last_finalized:A, then desc_read:A reads from
> * _prb_commit:B.
>
> This is referring to a slightly different situation than the example I
> used above because it is referencing where the descriptor state was
> stored (_prb_commit:B). The same general picture is valid:
>
> CPU0 CPU1
> ==== ====
> _prb_commit:B
> desc_update_last_finalized:A desc_last_finalized_seq:A
> desc_read:A
>
> desc_read:A is loding the descriptor state that _prb_commit:B stored.
>
> The extra note in the comment clarifies that _prb_commit:B could also be
> denoted as desc_read:A because desc_update_last_finalized() performs a
> read (i.e. must have seen) _prb_commit:B.
>
> * Note: _prb_commit:B and desc_update_last_finalized:A can be
> * different CPUs. However, the desc_update_last_finalized:A
> * CPU (which performs the release) must have previously seen
> * _prb_commit:B.
>
> Normally the CPU committing the record will also update
> last_finalized_seq. But it is possible that another CPU updates
> last_finalized_seq before the committing CPU because it already sees the
> finalized record. In that case the complete (maximally complex) picture
> looks like this.
>
> CPU0 CPU1 CPU2
> ==== ==== ====
> _prb_commit:B desc_read:A
> desc_update_last_finalized:A desc_last_finalized_seq:A
> desc_read:A
>
> Any memory barriers in _prb_commit() or desc_read() are irrelevant for
> guaranteeing that a CPU reading a sequence value from
> desc_last_finalized_seq() will always be able to read that record.

I believe that they are relevant exactly because 3 CPUs might be
involved here. I believe that

+ _prb_commit:B makes sure that CPU0 stores all data before
setting the state as finalized.

+ desc_read:B makes sure that CPU1 will see all data written
when it reads the finalized state.

+ desc_update_last_finalized:A makes sure that saw the record
with the given seq in finalized state.

+ desc_last_finalized_seq:A makes sure that it sees the record
associated with "last_finalized_seq" in the finalized state.

This is why the "Note:" is very important. And maybe desc_read:B
or desc_read:D should be mentioned in the note as well.

Also all these dependencies are really hard to follow. This
is why I suggested to add another note in the 9th patch,
see https://lore.kernel.org/all/ZbowlkOVWiSgCxNX@alley/

Or maybe we should document that pr_read() and _prb_read_valid()
provides these guarantees and just reference it here.

By another words. IMHO, the current "Note:" tries to
prove that _prb_read_valid() guarantees that all data
can be read when it sees the finalized state. IMHO,
we should document this above these functions and
just reference it here.

The current comments for desc_update_last_finalized:A
and prb_next_reserve_seq:A are really hard to follow
because they both try to explain these transitional
guarantees between prb_commit() and prb_read() APIs.
The comments mention many barriers even though the guarantees
should be there by design and should be mentioned
in the prb_read() API.

My motivation is that it took me long time to understand this.
And I am still not sure if I understand it correctly.
IMHO, it might be better to describe some guarantees of
upper level API so that we do not need to look at
the low level barriers again and again.

Best Regards,
Petr