RE: [PATCH] bonding:avoid repeated display of same link status change

From: Manish Kumar Singh
Date: Fri Oct 26 2018 - 02:50:14 EST




> -----Original Message-----
> From: Michal Kubecek [mailto:mkubecek@xxxxxxx]
> Sent: 25 ààààààà 2018 14:59
> To: Manish Kumar Singh
> Cc: Eric Dumazet; Mahesh Bandewar (àààà ààààààà); linux-netdev; Jay
> Vosburgh; Veaceslav Falico; Andy Gospodarek; David S. Miller; linux-
> kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] bonding:avoid repeated display of same link status
> change
>
> On Thu, Oct 25, 2018 at 02:21:05AM -0700, Manish Kumar Singh wrote:
> > > From: Michal Kubecek [mailto:mkubecek@xxxxxxx]
> > > IMHO it does not. AFAICS multiple instances of bond_mii_monitor()
> cannot
> > > run simultaneously for the same bond so that there doesn't seem to be
> > > anything to collide with. (And if they could, we would need to test and
> > > set the flag atomically in bond_miimon_inspect().)
> > >
> > Yes, Michal, we are inline with your understanding.
> > when the -original- patch was posted to upstream there was no
> > -synchronization- nor -racing- addressing code was in read/write of this
> > added filed, as we -never- saw need for either.
> >
> > -only- writer of the added field is bond_mii_monitor.
> > -only- reader of the added field is bond_miimon_inspect.
> > -this writer & reader -never- can run concurrently.
> > -writer invokes the reader.
> >
> > hence, imo uint_8 rtnl_needed is all what is needed; with
> bond_mii_monitor doing rtnl_needed = 1; and bond_miimon_inspect doing
> if rtnl_needed.
> >
> > here is the gravity of the situation with multiple customers whose names
> including machine names redacted:
> >
> > 4353 May 31 02:38:57 hostname kernel: ixgbe 0000:03:00.0: removed PHC
> on p2p1
> > 4354 May 31 02:38:57 hostname kernel: public: link status down for active
> interface p2p1, disabling it in 100 ms
> > 4355 May 31 02:38:57 hostname kernel: public: link status down for active
> interface p2p1, disabling it in 100 ms
> > 4356 May 31 02:38:57 hostname kernel: public: link status definitely down
> for interface p2p1, disabling it
> > 4357 May 31 02:38:57 hostname kernel: public: making interface p2p2 the
> new active one
> > 4358 May 31 02:38:59 hostname kernel: ixgbe 0000:03:00.0: registered PHC
> device on p2p1
> > 4359 May 31 02:39:00 hostname kernel: ixgbe 0000:03:00.0 p2p1: NIC Link is
> Up 10 Gbps, Flow Control: RX/TX
> > 4360 May 31 02:39:00 hostname kernel: public: link status up for interface
> p2p1, enabling it in 200 ms
> > 4361 May 31 02:39:00 hostname kernel: public: link status definitely up for
> interface p2p1, 10000 Mbps full duplex
> > 4362 May 31 02:45:37 hostname journal: Missed 217723 kernel messages
> > 4363 May 31 02:45:37 hostname kernel: public: link status down for active
> interface p2p2, disabling it in 100 ms
> > ---------------------
> > 11000+ APPROX SAME REPEATED MESSAGES in second
> > ---------------------
> > 15877 May 31 02:45:37 hostname kernel: public: link status down for active
> interface p2p2, disabling it in 100 ms
> > 15878 May 31 02:45:37 hostname kernel: public: link status definitely down
> for interface p2p2, disabling it
> > 15879 May 31 02:45:37 hostname kernel: public: making interface p2p1 the
> new active one
>
> When I was replying, I didn't know this was a v2 and I haven't seen the
> v1 discussion. I have read it since and I think I understand Eric's
> point now. The thing is that just adding e.g. u8 is OK as it is now.
> However, someone could later add another u8 next to it which would also
> be perfectly OK on its own but reads/writes to these two could collide
> between each other.
>
> And as pointed out by a colleague, even having atomic_t and u8 flag in
> one 64-bit word could be a problem on architectures which cannot do an
> atomic read/write from/to a 32-bit word (sparc seems to be one).

Thanks Michal for explaining it, now we understand the problem what Eric was referring to in v1 of the patch.
I could think of fixing it in 3 ways, Please suggest which one would be safe and optimal fix:

1. Use type unit64_t for rtnl_needed .
2. Use type atomic64_t for rtnl_needed and atomic64_set/read.
3. Use type uint64_t for rtnl_needed with spinlock protection.

I think option 3 would be overkill keeping in mind the frequency of bond_mii_monitor.

Thanks,
Manish
>
> Michal Kubecek