Re: [PATCH -tip v3 09/11] data_race: Avoid nested statement expression

From: Will Deacon
Date: Wed May 27 2020 - 03:22:58 EST


On Wed, May 27, 2020 at 01:10:00AM +0200, Arnd Bergmann wrote:
> On Tue, May 26, 2020 at 9:00 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
> >
> > On Tue, May 26, 2020 at 7:33 PM 'Marco Elver' via Clang Built Linux
> > <clang-built-linux@xxxxxxxxxxxxxxxx> wrote:
> > > On Tue, 26 May 2020, Marco Elver wrote:
> > > > On Tue, 26 May 2020 at 14:19, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > > > Note that an 'allyesconfig' selects KASAN and not KCSAN by default.
> > > > But I think that's not relevant, since KCSAN-specific code was removed
> > > > from ONCEs. In general, though, it is entirely expected that compile
> > > > times get a bit longer when the instrumentation passes are enabled.
> > > >
> > > > But as you pointed out, that's irrelevant, and the significant
> > > > overhead is from parsing and preprocessing. FWIW, we can probably
> > > > optimize Clang itself a bit:
> > > > https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-633712667
> > >
> > > I found that optimizing __unqual_scalar_typeof makes a noticeable
> > > difference: we could use C11's _Generic if the compiler supports it
> > > (and all supported versions of Clang certainly do).
> > >
> > > Could you verify whether the patch below improves compile times for
> > > you? E.g. on fs/ocfs2/journal.c I saw a ~40% compile-time speedup.
> >
> > Yes, that brings both the preprocessed size and the time to preprocess
> > it with clang-11 back to where they are in mainline, and close to the
> > speed of gcc-10 for this particular file.
> >
> > I also cross-checked with gcc-4.9 and gcc-10 and found that they see
> > the same increase in preprocessor output, but it makes little
> > difference to preprocessing performance on gcc.
>
> Just for reference, I've tested this against a patch I wrote that
> completely short-circuits READ_ONCE() on everything except alpha (which
> needs the read_barrier_depends()):
>
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -224,18 +224,21 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> * atomicity or dependency ordering guarantees. Note that this may result
> * in tears!
> */
> -#define __READ_ONCE(x) (*(const volatile __unqual_scalar_typeof(x) *)&(x))
> +#define __READ_ONCE(x) (*(const volatile typeof(x) *)&(x))
>
> +#ifdef CONFIG_ALPHA /* smp_read_barrier_depends is a NOP otherwise */
> #define __READ_ONCE_SCALAR(x) \
> ({ \
> __unqual_scalar_typeof(x) __x = __READ_ONCE(x); \
> smp_read_barrier_depends(); \
> - (typeof(x))__x; \
> + __x; \
> })
> +#else
> +#define __READ_ONCE_SCALAR(x) __READ_ONCE(x)
> +#endif

Nice! FWIW, I'm planning to have Alpha override __READ_ONCE_SCALAR()
eventually, so that smp_read_barrier_depends() can disappear forever. I
just bit off more than I could chew for 5.8 :(
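
The rough shape I have in mind is to make the generic definition
overridable and let Alpha supply the barrier itself. A sketch only (the
header placement and the override hook are made up here, nothing final):

/* include/linux/compiler.h (or wherever this ends up): generic version,
 * no barrier required.
 */
#ifndef __READ_ONCE_SCALAR
#define __READ_ONCE_SCALAR(x)					\
({								\
	__unqual_scalar_typeof(x) __x = __READ_ONCE(x);		\
	(typeof(x))__x;						\
})
#endif

/* Hypothetical arch override, e.g. in arch/alpha/include/asm/barrier.h:
 * the same read, plus the dependency barrier that only Alpha needs.
 */
#define __READ_ONCE_SCALAR(x)					\
({								\
	__unqual_scalar_typeof(x) __x = __READ_ONCE(x);		\
	smp_read_barrier_depends();				\
	(typeof(x))__x;						\
})

That keeps the barrier out of the generic code entirely, so only Alpha
pays for it.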

However, '__unqual_scalar_typeof()' is still useful for
load-acquire/store-release on arm64, so IMO we still need a better
solution to the build-time regression. I'm not fond of picking random C11
features to accomplish that, but I also don't have any better ideas...
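
For anybody following along, the _Generic trick amounts to mapping each
scalar type to an unqualified rvalue of the same type and falling back to
plain typeof() for everything else. Something like this (a sketch only;
Marco's actual patch may well differ):

#define __scalar_type_to_expr_cases(type)			\
	unsigned type:	(unsigned type)0,			\
	signed type:	(signed type)0

#define __unqual_scalar_typeof(x) typeof(			\
	_Generic((x),						\
		 char:	(char)0,				\
		 __scalar_type_to_expr_cases(char),		\
		 __scalar_type_to_expr_cases(short),		\
		 __scalar_type_to_expr_cases(int),		\
		 __scalar_type_to_expr_cases(long),		\
		 __scalar_type_to_expr_cases(long long),	\
		 default: (x)))

_Generic performs lvalue conversion on its controlling expression, so the
qualifiers are dropped for free in the scalar cases; plain 'char' needs
its own case because it is a distinct type from both 'signed char' and
'unsigned char'.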

Is there any mileage in the clever trick from Rasmus?

https://lore.kernel.org/r/6cbc8ae1-8eb1-a5a0-a584-2081fca1c4aa@xxxxxxxxxxxxxxxxxx

Will