Re: [PATCH -next] power/qos: fix a data race in pm_qos_*_value

From: Rafael J. Wysocki
Date: Tue Feb 25 2020 - 19:10:53 EST


On Mon, Feb 24, 2020 at 8:02 PM Qian Cai <cai@xxxxxx> wrote:
>
> On Mon, 2020-02-24 at 10:54 +0100, Rafael J. Wysocki wrote:
> > On Mon, Feb 24, 2020 at 2:01 AM Qian Cai <cai@xxxxxx> wrote:
> > >
> > >
> > >
> > > > On Feb 23, 2020, at 7:12 PM, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> > > >
> > > > It may be a bug under certain conditions, but you don't mention what
> > > > conditions they are. Reporting it as a general bug is not accurate at
> > > > the very least.
> > >
> > > Could we rule out load tearing, store tearing, and a reload of global_req in cpuidle_governor_latency_req() for all compilers and architectures? Any of those could introduce logic bugs.
> > >
> > > int global_req = cpu_latency_qos_limit();
> > >
> > > if (device_req > global_req)
> > >         device_req = global_req;
> > >
> > > If under register pressure, the compiler might get rid of the tmp variable, i.e.,
> > >
> > > if (device_req > cpu_latency_qos_limit())
> > >         --> race with the writer.
> > >         device_req = cpu_latency_qos_limit();
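> > >
> > > For illustration, a READ_ONCE() in the accessor would force a single,
> > > non-torn load that the compiler may not duplicate or refetch, so the
> > > tmp variable would then be safe to keep. A sketch, with c standing for
> > > the pm_qos_constraints object:
> > >
> > > int global_req = READ_ONCE(c->target_value);
> > >
> > > if (device_req > global_req)
> > >         device_req = global_req;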
> >
> > Yes, there is a race here with or without the WRITE_ONCE()/READ_ONCE()
> > annotations (note that these annotations don't prevent CPUs from
> > reordering things, so device_req may be set before global_req
> > regardless).
> >
> > However, in the worst case it may cause an old value to be used, and that can
> > happen anyway if the entire cpuidle_governor_latency_req() runs
> > between the curr_value update and pm_qos_set_value() in
> > pm_qos_update_target(), for example.
> >
> > IOW, there is no guarantee that the new value will be used immediately
> > after updating a QoS request anyway.
> >
> > I agree with adding the annotations (I was considering posting a patch
> > doing that myself), but just as a matter of making the intention
> > clear.
>
> OK, how about this updated text?
>
> [PATCH -next] power/qos: annotate a data race in pm_qos_*_value
>
> cpu_latency_constraints.target_value could be accessed concurrently via:
>
> cpu_latency_qos_apply
>   pm_qos_update_target
>     pm_qos_set_value
>
> cpuidle_governor_latency_req
>   cpu_latency_qos_limit
>     pm_qos_read_value
>
> The read is outside the pm_qos_lock critical section, which results in a
> data race. However, the worst case is that an old value is used, and that
> can happen anyway, so annotate this data race with a pair of
> READ_ONCE()/WRITE_ONCE().
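>
> The annotation amounts to something like this in the kernel/power/qos.c
> accessors (a sketch of the intended change):
>
> /* The writer, called under pm_qos_lock from pm_qos_update_target(). */
> static void pm_qos_set_value(struct pm_qos_constraints *c, s32 value)
> {
>         WRITE_ONCE(c->target_value, value);
> }
>
> /* The lockless reader, reached via cpu_latency_qos_limit(). */
> s32 pm_qos_read_value(struct pm_qos_constraints *c)
> {
>         return READ_ONCE(c->target_value);
> }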

I would rather say something like this:

The target_value field in struct pm_qos_constraints is used for
lockless access to the effective constraint value of a given QoS list,
so the readers of it cannot expect it to always reflect the most
recent effective constraint value. However, they can and do expect it
to be equal to a valid effective constraint value computed at a
certain time in the past (even though it may not be the most recent
one), so add READ|WRITE_ONCE() annotations around the target_value
accesses to prevent the compiler from possibly causing that
expectation to be unmet by generating code in an exceptionally
convoluted way.
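
To be clear about what the annotations do and do not guarantee: each
annotated access is performed exactly once and without tearing, but
nothing is implied about ordering or freshness. A sketch of what the
reader may then rely on, following the cpuidle governor code quoted
above:

        int global_req = cpu_latency_qos_limit();  /* READ_ONCE() inside */

        /*
         * global_req is a value that pm_qos_set_value() actually wrote
         * at some point, never a torn or invented one, although a more
         * recent update may already be in flight.
         */
        if (device_req > global_req)
                device_req = global_req;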