Re: READ_ONCE() + STACKPROTECTOR_STRONG == :/ (was Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-2 tag (topic/kasan-bitops))

From: Arnd Bergmann
Date: Mon Dec 16 2019 - 07:06:47 EST


On Mon, Dec 16, 2019 at 11:28 AM Will Deacon <will@xxxxxxxxxx> wrote:
> On Fri, Dec 13, 2019 at 02:17:08PM +0100, Arnd Bergmann wrote:
> > On Thu, Dec 12, 2019 at 9:50 PM Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > On Thu, Dec 12, 2019 at 11:34 AM Will Deacon <will@xxxxxxxxxx> wrote:
> > > > The root of my concern in all of this, and what started me looking at it in
> > > > the first place, is the interaction with 'typeof()'. Inheriting 'volatile'
> > > > for a pointer means that local variables in macros declared using typeof()
> > > > suddenly start generating *hideous* code, particularly when pointless stack
> > > > spills get stackprotector all excited.
> > >
> > > Yeah, removing volatile can be a bit annoying.
> > >
> > > For the particular case of the bitops, though, it's not an issue.
> > > Since you know the type there, you can just cast it.
> > >
> > > And if we had the rule that READ_ONCE() was an arithmetic type, you could do
> > >
> > > typeof(0+(*p)) __var;
> > >
> > > since you might as well get the integer promotion anyway (on the
> > > non-volatile result).
> > >
> > > But that doesn't work with structures or unions, of course.
> > >
> > > I'm not entirely sure we have READ_ONCE() with a struct. I do know we
> > > have it with 64-bit entities on 32-bit machines, but that's ok with
> > > the "0+" trick.
> >
> > I'll have my randconfig builder look for instances; so far I have found
> > one, see below. My feeling is that it would be better to at least
> > enforce a size of 1, 2, 4 or 8 bytes, to avoid cases where someone
> > thinks the access is atomic but it falls back on a memcpy.
>
> I've been using something similar built on compiletime_assert_atomic_type()
> and I spotted another instance in the xdp code (xskq_validate_desc()) which
> tries to READ_ONCE() on a 128-bit descriptor, although a /very/ quick read
> of the code suggests that this probably can't be concurrently modified if
> the ring indexes are synchronised properly.

That's the only other one I found. I have not checked how many are structs
that are the size of a normal u8/u16/u32/u64, or whether there are types that
have a lower alignment than their size, such as a __u16[2] that might
span a page boundary.
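A minimal userspace sketch of the two ideas in this subthread, assuming GCC or Clang (statement expressions and __typeof__ are GNU extensions): Linus's typeof(0+(*p)) trick, which gives the local copy a non-volatile, integer-promoted type, and a compile-time check that only accepts 1-, 2-, 4- or 8-byte objects. The macro name read_once_checked() and the demo_flag variable are hypothetical; the sketch uses C11 static_assert rather than the kernel's compiletime_assert() helpers, and it does not try to catch the under-aligned case such as a __u16[2], whose size passes the check.

```c
#include <assert.h>	/* static_assert (C11) */

/*
 * Hypothetical READ_ONCE()-like macro, NOT the kernel's implementation:
 *  - rejects objects that are not 1, 2, 4 or 8 bytes at compile time, so a
 *    struct or a 128-bit descriptor cannot quietly turn into a memcpy();
 *  - declares the local copy with __typeof__(0 + (x)), which is a
 *    non-volatile, integer-promoted type (struct/union types won't compile).
 */
#define read_once_checked(x) ({						\
	static_assert(sizeof(x) == 1 || sizeof(x) == 2 ||		\
		      sizeof(x) == 4 || sizeof(x) == 8,			\
		      "read_once_checked: size must be 1, 2, 4 or 8");	\
	__typeof__(0 + (x)) read_once_val_ =				\
		*(const volatile __typeof__(x) *)&(x);			\
	read_once_val_;							\
})

/* Compile-time illustration of the typeof() difference (hypothetical). */
static volatile unsigned char demo_flag;
static_assert(_Generic((__typeof__(demo_flag) *)0,
		       volatile unsigned char *: 1, default: 0),
	      "plain typeof() inherits volatile");
static_assert(_Generic((__typeof__(0 + demo_flag) *)0,
		       int *: 1, default: 0),
	      "typeof(0 + x) drops volatile and promotes to int");
```

Passing a 16-byte struct to read_once_checked() fails to build, which is the kind of "falls back on a memcpy" case the size check is meant to flag; the price of the "0+" form is that small types come back promoted to int.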

> However, enabling this for 32-bit ARM is total carnage; as Linus mentioned,
> a whole bunch of code appears to be relying on atomic 64-bit access of
> READ_ONCE(); the perf ring buffer, io_uring, the scheduler, pm_runtime,
> cpuidle, ... :(
>
> Unfortunately, at least some of these *do* look like bugs, but I can't see
> how we can fix them, not least because the first two are user ABI afaict. It
> may also be that in practice we get 2x32-bit stores, and that works out fine
> when storing a 32-bit virtual address. I'm not sure what (if anything) the
> compiler guarantees in these cases.

Would it help if 32-bit architectures use atomic64_read() and atomic64_set()
to implement a 64-bit READ_ONCE()/WRITE_ONCE(), or would that make it
worse in other ways?
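To make the atomic64 idea concrete outside the kernel, here is a hedged userspace sketch that uses GCC's __atomic_load_n()/__atomic_store_n() builtins on a plain uint64_t as a stand-in for atomic64_read()/atomic64_set(). The helper names read_once_u64() and write_once_u64() are hypothetical. Unlike a plain 64-bit load on a 32-bit target, which the compiler may split into two 32-bit accesses, these are single-copy atomic; on targets without a suitable native instruction, GCC may instead emit a call into libatomic (link with -latomic), loosely comparable to a lock-based fallback.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers: single-copy-atomic 64-bit accesses with no ordering
 * beyond that, roughly what READ_ONCE()/WRITE_ONCE() promise.
 * __ATOMIC_RELAXED is used because neither macro implies a memory barrier.
 */
static inline uint64_t read_once_u64(const uint64_t *p)
{
	/* 64-bit atomics generally require natural (8-byte) alignment. */
	assert(((uintptr_t)p & 7) == 0);
	return __atomic_load_n(p, __ATOMIC_RELAXED);
}

static inline void write_once_u64(uint64_t *p, uint64_t val)
{
	assert(((uintptr_t)p & 7) == 0);
	__atomic_store_n(p, val, __ATOMIC_RELAXED);
}
```

This only helps if every concurrent access to the variable goes through such helpers; a plain 64-bit store elsewhere can still tear into two 32-bit halves.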

On mips32, riscv32 and some minor 32-bit architectures with SMP support
(xtensa, csky, hexagon, openrisc, parisc32, sparc32 and ppc32 AFAICT) this
ends up using a spinlock for GENERIC_ATOMIC64, but at least ARMv6+,
i586+ and most ARC should be fine.
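The spinlock-based fallback can be pictured roughly as below: a small array of locks hashed by address, so that every 64-bit read or write of a given variable serializes on the same lock. This is a userspace approximation in the spirit of the kernel's lib/atomic64.c, not a copy of it; NR_ATOMIC64_LOCKS, the hash and all function names are assumptions, and the kernel version also disables interrupts while holding the lock. It illustrates why a READ_ONCE() built on atomic64_read() would cost a lock round trip per access on those architectures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_ATOMIC64_LOCKS 16	/* assumed; must be a power of two */
static_assert((NR_ATOMIC64_LOCKS & (NR_ATOMIC64_LOCKS - 1)) == 0,
	      "lock count must be a power of two");

/* Static atomics are zero-initialized, so every lock starts unlocked. */
static atomic_bool atomic64_locks[NR_ATOMIC64_LOCKS];

static atomic_bool *lock_for(const void *addr)
{
	/* Hypothetical hash: drop the low 3 bits, then mask. */
	return &atomic64_locks[((uintptr_t)addr >> 3) & (NR_ATOMIC64_LOCKS - 1)];
}

static void lock_acquire(atomic_bool *l)
{
	while (atomic_exchange_explicit(l, true, memory_order_acquire))
		;	/* spin until the previous holder releases it */
}

static void lock_release(atomic_bool *l)
{
	atomic_store_explicit(l, false, memory_order_release);
}

static uint64_t locked_read_u64(const uint64_t *p)
{
	atomic_bool *l = lock_for(p);
	uint64_t val;

	lock_acquire(l);
	val = *p;	/* may be two 32-bit loads, but no writer can interleave */
	lock_release(l);
	return val;
}

static void locked_write_u64(uint64_t *p, uint64_t val)
{
	atomic_bool *l = lock_for(p);

	lock_acquire(l);
	*p = val;
	lock_release(l);
}
```

The key property is that readers take the lock too: a reader that skipped it could observe a value half-written by locked_write_u64().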

(Side note: the ARMv7 implementation is suboptimal for ARMv7VE+
if LPAE is disabled. I think we really need to add Kconfig options for
ARMv7VE and 32-bit ARMv8 to improve this and things like integer divide.)

Arnd