Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
From: Joel Fernandes
Date: Mon Mar 04 2024 - 14:10:26 EST
On 3/4/2024 12:14 PM, Paul E. McKenney wrote:
> On Mon, Mar 04, 2024 at 11:19:21AM -0500, Joel Fernandes wrote:
>>
>>
>> On 3/4/2024 5:54 AM, linke li wrote:
>>> Some changes are done to fix a data race in commit 202489101f2e ("rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race")
>>>
>>> {
>>> int i;
>>>
>>> - i = rp->rtort_pipe_count;
>>> + i = READ_ONCE(rp->rtort_pipe_count);
>>> if (i > RCU_TORTURE_PIPE_LEN)
>>> i = RCU_TORTURE_PIPE_LEN;
>>> atomic_inc(&rcu_torture_wcount[i]);
>>> - if (++rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>> + WRITE_ONCE(rp->rtort_pipe_count, i + 1);
>>> + if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>> rp->rtort_mbtest = 0;
>>> return true;
>>> }
>>>
>>> But ++rp->rtort_pipe_count is meant to increment the field by 1, not to
>>> store i + 1 in rp->rtort_pipe_count, because rp->rtort_pipe_count may be
>>> written by rcu_torture_writer() concurrently.
>>>
>>> Also, rp->rtort_pipe_count in the next line should be read using
>>> READ_ONCE() because of the data race.
>>>
>>> Signed-off-by: linke li <lilinke99@xxxxxx>
>>> ---
>>> kernel/rcu/rcutorture.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
>>> index 7567ca8e743c..00059ace4fd5 100644
>>> --- a/kernel/rcu/rcutorture.c
>>> +++ b/kernel/rcu/rcutorture.c
>>> @@ -465,8 +465,8 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
>>> if (i > RCU_TORTURE_PIPE_LEN)
>>> i = RCU_TORTURE_PIPE_LEN;
>>> atomic_inc(&rcu_torture_wcount[i]);
>>> - WRITE_ONCE(rp->rtort_pipe_count, i + 1);
>>> - if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>> + WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
>>> + if (READ_ONCE(rp->rtort_pipe_count) >= RCU_TORTURE_PIPE_LEN) {
>>
>> I have to say, I am not convinced by the patch: what is wrong with
>> writing a value computed from an old read of the count?
>>
>> You win/lose the race anyway, say the CPU executed the WRITE_ONCE() a bit too
>> early/late and another WRITE_ONCE() lost/won, regardless of whether you wrote
>> the "incremented i" or "the increment from the latest value of pipe_count".
>>
>> Anyway, a slightly related/different question:
>>
>> Should that:
>> WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
>>
>> Be:
>> WRITE_ONCE(rp->rtort_pipe_count, READ_ONCE(rp->rtort_pipe_count) + 1);
>>
>> ?
>
> Thank you both!
>
> At first glance, I would argue for something like this:
>
> ------------------------------------------------------------------------
>
> static bool
> rcu_torture_pipe_update_one(struct rcu_torture *rp)
> {
> int i;
> struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
>
> if (rtrcp) {
> WRITE_ONCE(rp->rtort_chkp, NULL);
> smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
> }
> i = READ_ONCE(rp->rtort_pipe_count) + 1;
> if (i > RCU_TORTURE_PIPE_LEN)
> i = RCU_TORTURE_PIPE_LEN;
> atomic_inc(&rcu_torture_wcount[i]);
> WRITE_ONCE(rp->rtort_pipe_count, i);
> if (i >= RCU_TORTURE_PIPE_LEN) {
> rp->rtort_mbtest = 0;
> return true;
> }
> return false;
> }
>
> ------------------------------------------------------------------------
>
> That is, move the increment to the read and replace the re-read with
> the value "i" that was just written.

But doesn't that change the original logic as well? With the above change, you
are now incrementing rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count) + 1]
(clamped to RCU_TORTURE_PIPE_LEN) instead of
rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)].

I think that might break rcutorture, because there would be an increment outside
of the first 2 entries in rcu_torture_wcount, but I am not sure (I need to look
more).
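
To make the index shift concrete, here is a rough user-space sketch. It is not the kernel code: it is single-threaded, uses plain ints instead of atomic_t, drops READ_ONCE()/WRITE_ONCE(), and PIPE_LEN, struct elem, update_current(), update_suggested() and walk_pipe() are names made up for illustration. It only shows which buckets of the counter array each variant touches while one element moves through the pipe.

```c
#include <assert.h>
#include <string.h>

#define PIPE_LEN 10 /* stands in for RCU_TORTURE_PIPE_LEN; value assumed */

/* Hypothetical stand-in for struct rcu_torture; only the counter is modeled. */
struct elem {
	int pipe_count;
};

/* Plain ints instead of atomic_t: this model is single-threaded. */
static int wcount[PIPE_LEN + 1];

/* Current logic: account the element under its count before the increment. */
static int update_current(struct elem *e)
{
	int i = e->pipe_count;

	if (i > PIPE_LEN)
		i = PIPE_LEN;
	assert(i >= 0 && i <= PIPE_LEN);
	wcount[i]++;
	e->pipe_count = i + 1;
	return e->pipe_count >= PIPE_LEN;
}

/* Suggested logic: account the element under its count after the increment. */
static int update_suggested(struct elem *e)
{
	int i = e->pipe_count + 1;

	if (i > PIPE_LEN)
		i = PIPE_LEN;
	assert(i >= 0 && i <= PIPE_LEN);
	wcount[i]++;
	e->pipe_count = i;
	return i >= PIPE_LEN;
}

/* Walk one fresh element through the pipe and copy out the bucket counts. */
static void walk_pipe(int (*update)(struct elem *), int out[PIPE_LEN + 1])
{
	struct elem e = { .pipe_count = 0 };

	memset(wcount, 0, sizeof(wcount));
	while (!update(&e))
		;
	memcpy(out, wcount, sizeof(wcount));
}
```

Calling walk_pipe() once with each variant (from a small main(), say) shows that, in this model, the current logic bumps buckets 0 through PIPE_LEN - 1 once each, while the suggested logic bumps buckets 1 through PIPE_LEN and never touches bucket 0. Whether that shift matters depends on how rcu_torture_wcount is consumed, which I have not checked yet.
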
Thanks.