Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

From: Mathieu Desnoyers
Date: Thu Mar 07 2024 - 14:54:52 EST

Next message: Matthew Wilcox: "On the optimum size of a batch"
Previous message: Uladzislau Rezki: "Re: [PATCH v7 1/2] mm/vmalloc: Moved macros with no functional change happened"
In reply to: Paul E. McKenney: "Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug"
Next in thread: Paul E. McKenney: "Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2024-03-07 14:47, Paul E. McKenney wrote:

On Thu, Mar 07, 2024 at 08:53:05AM -0500, Mathieu Desnoyers wrote:

On 2024-03-06 22:37, Paul E. McKenney wrote:

On Wed, Mar 06, 2024 at 10:06:21PM -0500, Mathieu Desnoyers wrote:

[...]

As far as the WRITE_ONCE(x, READ_ONCE(x) + 1) pattern
is concerned, the only valid use-case I can think of is
split counters or RCU implementations where there is a
single updater doing the increment, and one or more
concurrent reader threads that need to snapshot a
consistent value with READ_ONCE().

[...]

So what would you use that pattern for?

One possibility is a per-CPU statistical counter in userspace on a
fastpath, in cases where losing the occasional count is OK. Then learning
your CPU (and possibly being immediately migrated to some other CPU),
READ_ONCE() of the count, increment, and WRITE_ONCE() might (or might not)
make sense.

I suppose the same in the kernel if there was a fastpath so extreme you
could not afford to disable preemption.

At best, very niche.

Or am I suffering a failure of imagination yet again? ;-)

The (niche) use-cases I have in mind are split-counters and RCU
grace period tracking, where precise counters totals are needed
(no lost count).

In the kernel, this could be:

Thank you for looking into this!

- A per-cpu counter, each counter incremented from thread context with
preemption disabled (single updater per counter), read concurrently by
other threads. WRITE_ONCE/READ_ONCE is useful to make sure there
is no store/load tearing there. Atomics on the update would be stronger
than necessary on the increment fast-path.

But if preemption is disabled, the updater can read the value without
READ_ONCE() without risk of concurrent update. Or are you concerned about
interrupt handlers? This would have to be a read from the interrupt
handler, given that an updated from the interrupt handler could result
in a lost count.

You are correct that the updater don't need READ_ONCE there. It would
however require a WRITE_ONCE() to match READ_ONCE() from concurrent
reader threads.

- A per-thread counter (e.g. within task_struct), only incremented by the
single thread, read by various threads concurrently.

Ditto.

Right, only WRITE_ONCE() on the single updater, READ_ONCE() on readers.

- A counter which increment happens to be already protected by a lock, read
by various threads without taking the lock. (technically doable, but
I'm not sure I see a relevant use-case for it)

In that case, the lock would exclude concurrent updates, so the lock
holder would not need READ_ONCE(), correct?

Correct.

In user-space:

- The "per-cpu" counter would have to use rseq for increments to prevent
inopportune migrations, which needs to be implemented in assembler anyway.
The counter reads would have to use READ_ONCE().

Fair enough!

- The per-thread counter (Thread-Local Storage) incremented by a single
thread, read by various threads concurrently, is a good target
for WRITE_ONCE()/READ_ONCE() pairing. This is actually what we do in
various liburcu implementations which track read-side critical sections
per-thread.

Agreed, but do any of these use WRITE_ONCE(x, READ_ONCE(x) + 1) or
similar?

Not quite, I recall it's more like WRITE_ONCE(x, READ_ONCE(y)) or such,
so we can grab the value of the current gp counter and store it into a
TLS variable.

- Same as for the kernel, a counter increment protected by a lock which
needs to be read from various threads concurrently without taking
the lock could be a valid use-case, though I fail to see how it is
useful due to lack of imagination on my part. ;-)

In RCU, we have "WRITE_ONCE(*sp, *sp + 1)" for this use case, though
here we have the WRITE_ONCE() but not the READ_ONCE() because we hold
the lock excluding any other updates.

You are right, the READ_ONCE() is not needed in this case for the
updater, only for the concurrent readers.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

Next message: Matthew Wilcox: "On the optimum size of a batch"
Previous message: Uladzislau Rezki: "Re: [PATCH v7 1/2] mm/vmalloc: Moved macros with no functional change happened"
In reply to: Paul E. McKenney: "Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug"
Next in thread: Paul E. McKenney: "Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]