RE: [PATCH 4.19 36/55] drivers/net/b44: Change to non-atomic bit operations on pwol_mask

From: David Laight
Date: Fri Jan 31 2020 - 08:45:48 EST


From: Pavel Machek
> Sent: 31 January 2020 12:58
>
> On Thu 2020-01-30 19:39:17, Greg Kroah-Hartman wrote:
> > From: Fenghua Yu <fenghua.yu@xxxxxxxxx>
> >
> > [ Upstream commit f11421ba4af706cb4f5703de34fa77fba8472776 ]
>
> This is not suitable for stable. It does not fix anything. It prepares
> for theoretical bug that author claims might be introduced to BIOS in
> future... I doubt it, even BIOS authors boot their machines from time
> to time.
>
> > Atomic operations that span cache lines are super-expensive on x86
> > (not just to the current processor, but also to other processes as all
> > memory operations are blocked until the operation completes). Upcoming
> > x86 processors have a switch to cause such operations to generate a #AC
> > trap. It is expected that some real time systems will enable this mode
> > in BIOS.
>
> And I wonder if this is even good idea for mainline. x86 architecture
> is here for long time, and I doubt Intel is going to break it like
> this. Do you have documentation pointer?

The fact that locked operations that cross cache line boundaries work
at all is because of compatibility with very old processors (which
always locked the bus).

> > In preparation for this, it is necessary to fix code that may execute
> > atomic instructions with operands that cross cachelines because the #AC
> > trap will crash the kernel.
>
> How does single bit operation "cross cacheline"? How is this going to
> impact non-x86 architectures?

The cpu 'bit' instructions used always access a full 'word' of memory
at a 'word' offset from the specified base address'
With a 64bit bit offset the 'word' is 64 bits, so if the base address
of the array isn't 8 byte aligned the cpu does a misaligned RMW cycle.

Non-x86 architectures probably either:
1) Fault on the mis-aligned transfer.
2) Ignore the 'lock'.
3) Use a software 'array of mutex' to emulate locked bit updates.
4) Any random combination of the above.

> > Since "pwol_mask" is local and never exposed to concurrency, there is
> > no need to set bits in pwol_mask using atomic operations.
> >
> > Directly operate on the byte which contains the bit instead of using
> > __set_bit() to avoid any big endian concern due to type cast to
> > unsigned long in __set_bit().
>
> What concerns? Is __set_bit() now useless and are we going to open-code
> it everywhere? Is set_bit() now unusable on x86?

Both set_bit() and __set_bit() are defined to work on bitmaps
that are defined as 'long[]'.
They are not there because people are too lazy to write foo |= 1 << n.

...
> > memset(ppattern + offset, 0xff, magicsync);
> > - for (j = 0; j < magicsync; j++)
> > - set_bit(len++, (unsigned long *) pmask);
> > + for (j = 0; j < magicsync; j++) {
> > + pmask[len >> 3] |= BIT(len & 7);
> > + len++;
> > + }
> >
> > for (j = 0; j < B44_MAX_PATTERNS; j++) {
> > if ((B44_PATTERN_SIZE - len) >= ETH_ALEN)
> > @@ -1532,7 +1534,8 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
> > for (k = 0; k< ethaddr_bytes; k++) {
> > ppattern[offset + magicsync +
> > (j * ETH_ALEN) + k] = macaddr[k];
> > - set_bit(len++, (unsigned long *) pmask);
> > + pmask[len >> 3] |= BIT(len & 7);

In this case I believe the pmask[] is passed to hardware.
It is very much an array of bytes initialised in a specific way.

set_bit() (and __set_bit()) would do completely the wrong thing
on BE systems.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)