RE: GCC, unaligned access and UB in the Linux kernel
From: David Laight
Date: Wed May 05 2021 - 06:41:54 EST
From: Vladislav K. Valtchev
> Sent: 04 May 2021 19:08
>
> Hi everyone,
>
> I noticed that in the Linux kernel we have a define called
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS that's used in a fair amount of places
> with the following purpose: when it's set, unaligned access is supported by the
> CPU so we can do it directly, otherwise fall-back to some logic where a byte at
> the time is read/written. For example, check the implementation of
> do_strncpy_from_user():
> https://elixir.bootlin.com/linux/latest/source/lib/strncpy_from_user.c#L15
>
>
> That's nice and seems to work today as expected, but there's one problem:
> unaligned access is UB according to the ISO C standard, ...
The C standard isn't relevant.
...
>
> Mitigations
> ------------
> In my understanding, the simplest way to force GCC to emit a single MOV
> instruction with unaligned access, without risking any kind of UB, is to use
> __builtin_memcpy(). It works great, but it requires fixing the code in many
> places. Also, maybe using get_unaligned()/put_unaligned() is the right thing to
> do? The problem is that the put_unaligned_* inline functions don't use
> __builtin_memcpy() and are defined like:
>
> static __always_inline void put_unaligned_le32(u32 val, void *p)
> {
> *((__le32 *)p) = cpu_to_le32(val);
> }
>
> So, still UB. To make the compiler happy, maybe we should make them use
> __builtin_memcpy()?
That doesn't help you at all.
If the compiler is given any hint that the address is aligned
it will use potentially misaligned memory accesses.
So if the above was:
*(int __attribute__((packed)) *)p = val;
and the caller had:
int *p = maybe_unaligned();
put_unaligned_le32(0, p);
Then gcc will generate a 32bit write even on sparc.
__builtin_memcpy() will expand to exactly the same 32bit write.
This is nothing new - at least 20 years.
Basically, the C standard only allows you to cast a pointer
to 'void *' and then back to its original type.
GCC makes use of this for various optimisations and will
track the alignment through (void *) casts.
I believe that, even for the simd instructions, gcc will pick
the 'misalign supporting' version if the data is marked
with the relevant alignment.
Oh - and the loop vectorisation code is a pile of crap.
You almost never want it - look at the code it generates
for a loop you go round a few times.
It you are doing 10000 iteractions you'll need to unroll
it yourself to get decent performance.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)