Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
From: Mathieu Desnoyers
Date: Sat Aug 17 2019 - 10:40:55 EST
----- On Aug 16, 2019, at 10:13 PM, rostedt rostedt@xxxxxxxxxxx wrote:
> On Fri, 16 Aug 2019 21:36:49 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>> ----- On Aug 16, 2019, at 5:04 PM, Linus Torvalds torvalds@xxxxxxxxxxxxxxxxxxxx wrote:
>> > On Fri, Aug 16, 2019 at 1:49 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> >> Can we finally put a foot down and tell compiler and standard committee
>> >> people to stop this insanity?
>> > It's already effectively done.
>> > Yes, values can be read from memory multiple times if they need
>> > reloading. So "READ_ONCE()" when the value can change is a damn good
>> > idea.
>> > But it should only be used if the value *can* change. Inside a locked
>> > region it is actively pointless and misleading.
>> > Similarly, WRITE_ONCE() should only be used if you have a _reason_ for
>> > using it (notably if you're not holding a lock).
>> > If people use READ_ONCE/WRITE_ONCE when there are locks that prevent
>> > the values from changing, they are only making the code illegible.
>> > Don't do it.
>> I agree with your argument in the case where both read-side and write-side
>> are protected by the same lock: READ/WRITE_ONCE are useless then. However,
>> in the scenario we have here, only the write-side is protected by the lock
>> against concurrent updates, but reads don't take any lock.
> And because reads are not protected by any lock or memory barrier,
> using READ_ONCE() is pointless. The CPU could be doing a lot of hidden
> manipulation of that variable too.
Please enlighten me by providing some details on what the CPU could do to
this word-aligned, word-sized variable, in the absence of locks and barriers,
that is relevant to this specific use case?
I suspect most of the barriers you refer to here are taken care of by the
tracepoint code which uses RCU to synchronize probe registration wrt
probe callback execution.
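To make the scenario concrete, here is a minimal userspace sketch of the pattern under discussion: updates serialized by a mutex, with a reader that takes no lock. The macros are simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE() (volatile casts using the GCC/Clang __typeof__ extension), and the variable and function names are illustrative assumptions, not the actual trace_sched_switch.c code.

```c
#include <assert.h>
#include <pthread.h>

/* Assumption: simplified userspace stand-ins for the kernel macros. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical names, modeled loosely on the thread. */
static pthread_mutex_t register_mutex = PTHREAD_MUTEX_INITIALIZER;
static int cmdline_ref;

/* Write side: the mutex serializes updaters against each other. */
void cmdline_ref_start(void)
{
	pthread_mutex_lock(&register_mutex);
	/* The plain read is fine here: no other writer can run concurrently.
	 * The store is marked because lockless readers may observe it. */
	WRITE_ONCE(cmdline_ref, cmdline_ref + 1);
	pthread_mutex_unlock(&register_mutex);
}

void cmdline_ref_stop(void)
{
	pthread_mutex_lock(&register_mutex);
	assert(cmdline_ref > 0);
	WRITE_ONCE(cmdline_ref, cmdline_ref - 1);
	pthread_mutex_unlock(&register_mutex);
}

/* Read side: called from the probe without any lock held. */
int probe_should_record(void)
{
	return READ_ONCE(cmdline_ref) != 0;
}
```

The marked accesses only constrain the compiler (a single, untorn load or store); they add no barrier and imply no ordering against other variables.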
> Again, this is just to enable caching of the comm. Nothing more. It's a
> "best effort" algorithm. We get it, we then can map a pid to a name. If
> not, we just have a pid and we map "<...>".
> Next you'll be asking for the memory barriers to guarantee a real hit.
> And honestly, this information is not worth any overhead.
No, it is not my intent to add overhead to guarantee trace data
availability near trace beginning and end. However, considering that
READ_ONCE() and WRITE_ONCE() can provide additional data availability
guarantees in the middle of traces at no runtime cost, it seems like a
good trade off.
It's easier for an analysis to disregard missing information at the
beginning and end of trace without generating false-positive reports
than when it happens spuriously in the middle of traces.
> And most often we enable this before we enable the tracepoint we want
> this information from, which requires modification of the text area and
> will do a bunch of syncs across CPUs. That alone will most likely keep
> any race from happening.
Indeed the tracepoint use of RCU to synchronize registration vs probes
should take care of those barriers.
> The only real bug is the check to know if we should add the probe or
> not was done outside the lock, and when we hit the race, we could add a
> probe twice, causing the kernel to spit out a warning. You fixed that,
> and that's all that needs to be done.
I just sent that fix in a separate patch.
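For readers following along, the shape of that fix can be sketched as below: the "do we need to register the probe?" decision is taken with the mutex held, so two concurrent starters cannot both see a zero refcount. This is a userspace sketch with hypothetical names (start_tracing, probe_registrations, and so on), not the patch itself.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t register_mutex = PTHREAD_MUTEX_INITIALIZER;
static int refcount;             /* number of active users */
static int probe_registrations;  /* stands in for the real probe registration */

static void register_probe(void)   { probe_registrations++; }
static void unregister_probe(void) { probe_registrations--; }

void start_tracing(void)
{
	bool need_register;

	pthread_mutex_lock(&register_mutex);
	/* Check under the lock: evaluating this before taking the mutex
	 * would let two callers both observe refcount == 0. */
	need_register = (refcount == 0);
	refcount++;
	if (need_register)
		register_probe();
	pthread_mutex_unlock(&register_mutex);
}

void stop_tracing(void)
{
	pthread_mutex_lock(&register_mutex);
	/* Unregister only when the last user goes away. */
	if (refcount > 0 && --refcount == 0)
		unregister_probe();
	pthread_mutex_unlock(&register_mutex);
}

int registered_probes(void)
{
	return probe_registrations;
}
```

With the check inside the critical section, the probe is registered exactly once no matter how many callers race on start.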
> I'm now even more against adding the READ_ONCE() or WRITE_ONCE().
I'm not convinced by your arguments.
> -- Steve
>> If WRITE_ONCE has any use at all (protecting against store tearing and
>> invented stores), it should be used even with a lock held in this
>> scenario, because the lock does not prevent READ_ONCE() from observing
>> transient states caused by lack of WRITE_ONCE() for the update.
>> So why does WRITE_ONCE exist in the first place ? Is it for documentation
>> purposes only or are there actual situations where omitting it can cause
>> bugs with real-life compilers ?
>> In terms of code change, should we favor only introducing WRITE_ONCE
>> in new code, or should existing code matching those conditions be
>> moved to WRITE_ONCE as bug fixes ?
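As an illustration of what "invented stores" means in the question quoted above, here is a minimal sketch of a transformation the C language rules permit for plain (unmarked) stores. It is a hand-written example with hypothetical names; it makes no claim that a particular compiler performs this transformation on the code discussed in this thread.

```c
static int shared_state;  /* hypothetical variable read by lockless readers */

/* As written in the source: exactly one store per call. */
void set_state_source(int cond)
{
	if (cond)
		shared_state = 1;
	else
		shared_state = 2;
}

/* A transformation permitted for plain stores: store one value
 * unconditionally, then overwrite it. The final value is identical, but a
 * concurrent lockless reader could transiently observe 2 even when cond is
 * true. Marking the stores with WRITE_ONCE() forbids this rewrite. */
void set_state_transformed(int cond)
{
	shared_state = 2;
	if (cond)
		shared_state = 1;
}

/* Helpers returning the final value after each variant. */
int final_after_source(int cond)
{
	set_state_source(cond);
	return shared_state;
}

int final_after_transformed(int cond)
{
	set_state_transformed(cond);
	return shared_state;
}
```

Both variants are indistinguishable to single-threaded code, which is why holding the update-side lock alone does not rule the transient value out for readers that take no lock.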