Re: [PATCH] net/ncsi: Always use unicast source MAC address

From: Peter Delevoryas
Date: Thu Dec 15 2022 - 20:08:17 EST




> On Dec 13, 2022, at 8:41 AM, Alexander H Duyck <alexander.duyck@xxxxxxxxx> wrote:
>
> On Mon, 2022-12-12 at 16:47 -0800, Peter Delevoryas wrote:
>> I use QEMU for development, and I noticed that NC-SI packets get dropped by
>> the Linux software bridge[1] because we use a broadcast source MAC address
>> for the first few NC-SI packets.
>
> Normally NC-SI packets should never be seen by a bridge.

True, and it’s good to keep this in context. I’m trying to make this change
to support simulation environments, but any change in NC-SI could easily
result in the out-of-band network connection to BMC’s in real data centers
failing to come up, which can be really bad and usually impossible to
recover remotely.

> Isn't NC-SI
> really supposed to just be between the BMC and the NIC firmware?

Yep

> Depending on your setup it might make more sense to use something like
> macvtap or a socket connection to just bypass the need for the bridge
> entirely.

For unicast, yes, but I want to test multiple NIC’s sharing an RMII
link and verifying the broadcast behavior, and the failover behavior
when an RX or TX channel goes down.

The multicast UDP socket backend _does_ work, but I was getting some
recirculation problems or some kind of buffering thing. I managed
to get tap0 + tap1 + br0 working faster.

>
>> The spec requires that the destination MAC address is FF:FF:FF:FF:FF:FF,
>> but it doesn't require anything about the source MAC address as far as I
>> know. From testing on a few different NC-SI NIC's (Broadcom 57502, Nvidia
>> CX4, CX6) I don't think it matters to the network card. I mean, Meta has
>> been using this in mass production with millions of BMC's [2].
>>
>> In general, I think it's probably just a good idea to use a unicast MAC.
>
> I'm not sure I agree there. What is the initial value of the address?

Ok so, to be honest, I thought that the BMC’s FTGMAC100 peripherals
came with addresses provisioned from the factory, and that we were just
discarding that value and using an address provisioned through the NIC,
because I hadn’t really dug into the FTGMAC100 datasheet fully. I see now
that the MAC address register I thought was a read-only manufacturing
value is actually 8 different MAC address r/w registers for filtering.
*facepalm*

It suddenly makes a lot more sense why all these OEM Get MAC Address
commands exist: the BMC chip doesn’t come with any MAC addresses from
manufacturing. It’s a necessity, not some convenience artifact/etc.

So, tracing some example systems to see what shows up:

One example:
INIT: Entering runlevel: 5
Configuring network interfaces... [ 25.893118] 8021q: adding VLAN 0 to HW filter on device eth0
[ 25.904809] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 25.917307] ftgmac100 1e660000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19
[ 25.958096] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 25.978124] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 25.990559] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0
[ 26.018180] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.030631] ftgmac100 1e660000.ethernet eth0: NCSI: 'bad' packet ignored for type 0xd0
[ 26.046594] ftgmac100 1e660000.ethernet eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
[ 26.168109] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.198101] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.238237] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.272011] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.308155] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.320504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
done.
[ 26.408094] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.438100] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.450537] ftgmac100 1e660000.ethernet eth0: NCSI: bcm_gmac16 MAC RE:DA:CT:ED:HE:HE
Starting random number generator[ 26.472388] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
daemon[ 26.518241] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.559504] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
.
[ 26.608229] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
Setup dhclient for IPv6... done.
[ 26.681879] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.730523] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.808191] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff
[ 26.855689] NCSI: source=ff:ff:ff:ff:ff:ff dest=ff:ff:ff:ff:ff:ff

Oddly, due to that code you mentioned, all NC-SI packets are using
a broadcast source MAC address, even after the Get MAC Address sequence
gets the MAC provisioned for the BMC from the Broadcom NIC.

root@bmc-oob:~# ifconfig
eth0 Link encap:Ethernet HWaddr RE:DA:CT:ED:HE:HE
inet addr:XXXXXXX Bcast:XXXXXXXX Mask:XXXXXXXX
inet6 addr: XXXXXXXX Scope:Global
inet6 addr: XXXXXXXX Scope:Link
inet6 addr: XXXXXXXX Scope:Global
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2965 errors:0 dropped:0 overruns:0 frame:0
TX packets:637 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:872759 (852.3 KiB) TX bytes:59936 (58.5 KiB)
Interrupt:19

But, that’s a system using the 5.0 kernel with lots of old hacks
on top. A system using a 5.15 kernel with this change included:

INIT: Entering runlevel: 5
Configuring network interfaces... [ 6.596537] 8021q: adding VLAN 0 to HW filter on device eth0
[ 6.609264] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.622913] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19
[ 6.641447] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.662543] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.680454] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.694114] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
[ 6.715722] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
done.
[ 6.741372] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.741451] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.768714] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
Starting random [ 6.782599] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
number generator[ 6.799321] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
daemon[ 6.815680] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.831388] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
.
[ 6.846921] ftgmac100 1e690000.ftgmac eth0: NCSI: Network controller supports NC-SI 1.1, querying MAC address through OEM(0x8119) command
Setup dhclient for IPv6... done.
[ 6.908921] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
reloading rsyslo[ 6.933085] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff

So, this BMC already had the provisioned MAC address somehow,
even before the Nvidia Get MAC Address command towards the bottom.

Adding tracing to ftgmac100:

[ 2.018672] ftgmac100_initial_mac
[ 2.026090] Read MAC address from FTGMAC100 register: RE:DA:CT:ED:AD:DR
[ 2.040771] ftgmac100 1e690000.ftgmac: Read MAC address RE:DA:CT:ED:AD:DR from chip
[ 2.057774] ftgmac100 1e690000.ftgmac: Using NCSI interface
[ 2.070957] ftgmac100 1e690000.ftgmac eth0: irq 33, mapped at (ptrval)

Now, after rewriting the FTGMAC100 register to fa:ce:b0:0c:20:22 and rebooting:

root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008 32 0x0000face
root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e690008
0x0000FACE
root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c 32 0xb00c2022
root@dhcp6-2620-10d-c0b9-4b08-0-0-0-e4e:~# devmem 0x1e69000c
0xB00C2022

[ 2.001304] ftgmac100_initial_mac
[ 2.008727] Read MAC address from FTGMAC100 register: fa:ce:b0:0c:20:22
[ 2.023373] ftgmac100 1e690000.ftgmac: Read MAC address fa:ce:b0:0c:20:22 from chip
[ 2.040367] ftgmac100 1e690000.ftgmac: Using NCSI interface

[ 6.581239] ftgmac100_reset_mac
[ 6.589193] ftgmac100_reset_mac
[ 6.596727] 8021q: adding VLAN 0 to HW filter on device eth0
[ 6.609462] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
[ 6.623117] ftgmac100 1e690000.ftgmac eth0: NCSI: Handler for packet type 0x82 returned -19
[ 6.641647] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
[ 6.662398] NCSI: source=fa:ce:b0:0c:20:22 dest=ff:ff:ff:ff:ff:ff
[ 6.680380] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.694000] ftgmac100 1e690000.ftgmac eth0: NCSI: transmit cmd 0x50 ncsi_oem_sma_mlx success during probe
[ 6.715700] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff
[ 6.729528] NCSI: source=RE:DA:CT:ED:AD:DR dest=ff:ff:ff:ff:ff:ff

So, it looks like whatever is initialized in ftgmac100_initial_mac becomes
the address we use for the NCSI queries initially.

The Aspeed datasheet says the FTGMAC100 MAC address registers are initialized to zero,
and in that case the ftgmac100 driver initializes it to something random
with eth_hw_addr_random().

So, I mean correct me if I’m wrong, but I think it all seems fine?

On a hard power cycle (instead of just resetting the ARM cores, which doesn’t seem to
have reset the peripherals), maybe it would actually be zero, and get initialized
to the random value. I’ll test that, need to do some more debug image building to do it
remotely.

> If I am not mistaken the gma_flag is used to indicate that the MAC
> address has been acquired isn't it?

That’s correct.

> If using the broadcast is an issue
> the maybe an all 0's MAC address might be more appropriate.

Possibly yeah, although that would also be dropped by the Linux bridge lol,
so it wouldn’t solve my simulation problem.

> My main
> concern would be that the dev_addr is not initialized for those first
> few messages so you may be leaking information.
>
>> This might have the effect of causing the NIC to learn 2 MAC addresses from
>> an NC-SI link if the BMC uses OEM Get MAC Address commands to change its
>> initial MAC address, but it shouldn't really matter. Who knows if NIC's
>> even have MAC learning enabled from the out-of-band BMC link, lol.
>>
>> [1]: https://tinyurl.com/4933mhaj
>> [2]: https://tinyurl.com/mr3tyadb
>
> The thing is the OpenBMC approach initializes the value themselves to
> broadcast[3]. As a result the two code bases are essentially doing the
> same thing since mac_addr is defaulted to the broadcast address when
> the ncsi interface is registered.

That’s a very good point, thanks for pointing that out, I hadn’t
even noticed that!

Anyways, let me know what you think of the traces I added above.
Sorry for the delay, I’ve just been busy with some other stuff,
but I do really actually care about upstreaming this (and several
other NC-SI changes I’ll submit after this one, which are unrelated
but more useful).

Thanks,
Peter

>
> [3]: tinyurl.com/mr3cxf3b
>
>>
>> Signed-off-by: Peter Delevoryas <peter@xxxxxxx>
>> ---
>> net/ncsi/ncsi-cmd.c | 10 +---------
>> 1 file changed, 1 insertion(+), 9 deletions(-)
>>
>> diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
>> index dda8b76b7798..fd090156cf0d 100644
>> --- a/net/ncsi/ncsi-cmd.c
>> +++ b/net/ncsi/ncsi-cmd.c
>> @@ -377,15 +377,7 @@ int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)
>> eh = skb_push(nr->cmd, sizeof(*eh));
>> eh->h_proto = htons(ETH_P_NCSI);
>> eth_broadcast_addr(eh->h_dest);
>> -
>> - /* If mac address received from device then use it for
>> - * source address as unicast address else use broadcast
>> - * address as source address
>> - */
>> - if (nca->ndp->gma_flag == 1)
>> - memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
>> - else
>> - eth_broadcast_addr(eh->h_source);
>> + memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
>>
>> /* Start the timer for the request that might not have
>> * corresponding response. Given NCSI is an internal
>