RE: Re: [PATCH] ena: Speed up initialization 90x by reducing poll delays

From: Jubran, Samih
Date: Wed Mar 11 2020 - 09:25:00 EST


Hi Josh,

Thanks for taking the time to write this patch. While testing it I hit a bug whose root cause I haven't pinpointed yet, but it looks to me like a race in the netlink infrastructure.

Here is the bug scenario:
1. created a c5.24xlarge instance in AWS in the N. Virginia region using the default Amazon Linux 2 AMI
2. applied your patch on top of net-next v5.2 and installed the kernel (currently I can only boot net-next v5.2; later versions of net-next hit errors during boot)
3. ran "rmmod ena && insmod ena.ko" twice

Result:
The interface is not in up state

Expected result:
The interface should be in up state

What I know so far:
* ena_probe() seems to finish with no errors whatsoever
* adding prints or delays to ena_probe() makes the bug disappear or become less likely, depending on how much delay I add
* ena_up() is not called at all when the bug occurs, so it seems related to netlink not invoking dev_open()

Have you run into anything like this? Do you have any idea what might be causing it?

> -----Original Message-----
> From: linux-kernel-owner@xxxxxxxxxxxxxxx <linux-kernel-owner@xxxxxxxxxxxxxxx> On Behalf Of Machulsky, Zorik <zorik@xxxxxxxxxx>
> Sent: Tuesday, March 3, 2020 2:54 AM
> To: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> Cc: Belgazal, Netanel <netanel@xxxxxxxxxx>; Kiyanovski, Arthur
> <akiyano@xxxxxxxxxx>; Tzalik, Guy <gtzalik@xxxxxxxxxx>; Bshara, Saeed
> <saeedb@xxxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] ena: Speed up initialization 90x by reducing poll delays
>
>
>
> On 3/2/20, 4:40 PM, "Josh Triplett" <josh@xxxxxxxxxxxxxxxx> wrote:
>
>
> On Mon, Mar 02, 2020 at 11:16:32PM +0000, Machulsky, Zorik wrote:
> >
> > On 2/28/20, 4:29 PM, "Josh Triplett" <josh@xxxxxxxxxxxxxxxx> wrote:
> >
> > Before initializing completion queue interrupts, the ena driver uses
> > polling to wait for responses on the admin command queue. The ena driver
> > waits 5ms between polls, but the hardware has generally finished long
> > before that. Reduce the poll time to 10us.
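To illustrate why the poll interval, rather than the hardware, dominates this wait, here is a small user-space model (not ena driver code). The names fake_admin_cmd, cmd_done and poll_fixed, and the idea of a command that completes a fixed number of microseconds after submission, are hypothetical. Virtual time is advanced by the delay instead of actually sleeping, so the model only shows how the observed completion time gets rounded up to a multiple of the poll interval.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device: the admin command completes
 * ready_at_us microseconds after it was submitted at time 0. */
struct fake_admin_cmd {
	uint64_t ready_at_us;
};

static bool cmd_done(const struct fake_admin_cmd *cmd, uint64_t now_us)
{
	return now_us >= cmd->ready_at_us;
}

/*
 * Poll with a fixed delay between checks. Time is virtual: instead of
 * sleeping, now_us is advanced by delay_us. Returns the time at which the
 * poller notices completion, or UINT64_MAX if timeout_us is reached first.
 */
static uint64_t poll_fixed(const struct fake_admin_cmd *cmd,
			   uint64_t delay_us, uint64_t timeout_us)
{
	uint64_t now_us = 0;

	while (!cmd_done(cmd, now_us)) {
		if (now_us >= timeout_us)
			return UINT64_MAX;
		now_us += delay_us; /* models one sleep between polls */
	}
	return now_us;
}
```

For a command that the model finishes after 20us (an assumed figure, not a measurement), poll_fixed() reports completion at 5000us with a 5000us delay but at 20us with a 10us delay.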
> >
> > On a c5.12xlarge, this improves ena initialization time from 173.6ms to
> > 1.920ms, an improvement of more than 90x. This improves server boot time
> > and time to network bringup.
> >
> > Thanks Josh,
> > We agree that the polling rate should be increased, but we prefer not to do it
> > aggressively and blindly. For example, a linear backoff approach might be a
> > better choice. Please let us rework this patch a little and bring it back for
> > review. Thanks!
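A linear backoff of the kind mentioned here could look roughly like the following sketch. It uses a virtual-time model as a stand-in for the device; struct backoff_cfg, poll_linear_backoff and the values in the usage note are hypothetical and are not taken from the ena driver or from any reworked patch. The delay starts small so fast completions are noticed quickly, then grows by a fixed step (up to a cap) so that slow commands do not cause a flood of wakeups.

```c
#include <stdint.h>

/* Hypothetical tuning knobs; not values from the ena driver. */
struct backoff_cfg {
	uint64_t initial_us; /* delay before the second poll */
	uint64_t step_us;    /* added to the delay after every unsuccessful poll */
	uint64_t max_us;     /* upper bound on a single delay */
	uint64_t timeout_us; /* give up once this much virtual time has passed */
};

/*
 * ready_at_us stands in for the moment the device posts the completion.
 * Time is virtual (no real sleeping). Returns the time at which completion
 * is observed, or UINT64_MAX on timeout; *polls counts completion checks.
 */
static uint64_t poll_linear_backoff(uint64_t ready_at_us,
				    const struct backoff_cfg *cfg,
				    unsigned int *polls)
{
	uint64_t now_us = 0;
	uint64_t delay_us = cfg->initial_us;

	*polls = 0;
	for (;;) {
		(*polls)++;
		if (now_us >= ready_at_us)
			return now_us;
		if (now_us >= cfg->timeout_us)
			return UINT64_MAX;
		now_us += delay_us;           /* models one sleep */
		delay_us += cfg->step_us;     /* linear growth */
		if (delay_us > cfg->max_us)
			delay_us = cfg->max_us;   /* cap a single delay */
	}
}
```

With initial_us = 10, step_us = 10 and max_us = 5000, a command ready at 20us is observed at 30us after 3 polls, and one ready at 100000us is observed at 100110us after 142 polls, versus about 10,000 polls with a fixed 10us delay.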
>
> That's fine, as long as it has the same net improvement on boot time.
>
> I'd appreciate the opportunity to test any alternate approach you might
> have.
>
> (Also, as long as you're working on this, you might wish to make a
> similar change to the EFA driver, and to the FreeBSD drivers.)
>
> Absolutely! Already forwarded this to the owners of these drivers. Thanks!
>
> > Before:
> > [ 0.531722] calling ena_init+0x0/0x63 @ 1
> > [ 0.531722] ena: Elastic Network Adapter (ENA) v2.1.0K
> > [ 0.531751] ena 0000:00:05.0: Elastic Network Adapter (ENA) v2.1.0K
> > [ 0.531946] PCI Interrupt Link [LNKD] enabled at IRQ 11
> > [ 0.547425] ena: ena device version: 0.10
> > [ 0.547427] ena: ena controller version: 0.0.1 implementation version 1
> > [ 0.709497] ena 0000:00:05.0: Elastic Network Adapter (ENA) found at mem febf4000, mac addr 06:c4:22:0e:dc:da, Placement policy: Low Latency
> > [ 0.709508] initcall ena_init+0x0/0x63 returned 0 after 173616 usecs
> >
> > After:
> > [ 0.526965] calling ena_init+0x0/0x63 @ 1
> > [ 0.526966] ena: Elastic Network Adapter (ENA) v2.1.0K
> > [ 0.527056] ena 0000:00:05.0: Elastic Network Adapter (ENA) v2.1.0K
> > [ 0.527196] PCI Interrupt Link [LNKD] enabled at IRQ 11
> > [ 0.527211] ena: ena device version: 0.10
> > [ 0.527212] ena: ena controller version: 0.0.1 implementation version 1
> > [ 0.528925] ena 0000:00:05.0: Elastic Network Adapter (ENA) found at mem febf4000, mac addr 06:c4:22:0e:dc:da, Placement policy: Low Latency
> > [ 0.528934] initcall ena_init+0x0/0x63 returned 0 after 1920 usecs
>
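The 90x figure follows from the two "initcall ... returned 0 after N usecs" lines in the logs above (the "calling ... @ 1" / "initcall ... returned" pairs look like the kernel's initcall_debug output). A small C helper can pull the duration out of such a line and compute the ratio; it assumes the line format shown in these logs with the email quoting ("> > ") already stripped, and initcall_usecs and speedup are hypothetical names.

```c
#include <stdio.h>

/*
 * Extract N from a line like
 *   "[ 0.709508] initcall ena_init+0x0/0x63 returned 0 after 173616 usecs".
 * Returns -1 if the line does not match that format.
 */
static long initcall_usecs(const char *line)
{
	double ts;
	int ret;
	long usecs;

	/* %*s skips the "ena_init+0x0/0x63" symbol without storing it. */
	if (sscanf(line, " [ %lf] initcall %*s returned %d after %ld usecs",
		   &ts, &ret, &usecs) != 3)
		return -1;
	return usecs;
}

/* Ratio of the old initcall duration to the new one. */
static double speedup(long before_us, long after_us)
{
	return (double)before_us / (double)after_us;
}
```

For the two logs, initcall_usecs() yields 173616 and 1920, and speedup(173616, 1920) is about 90.4.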