RE: [Intel-wired-lan] [PATCH iwl-next v5] igb: Retrieve Tx timestamp from BH workqueue
From: Rinitha, SX
Date: Tue Apr 21 2026 - 02:39:10 EST
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@xxxxxxxxxx> On Behalf Of Kurt Kanzenbach
> Sent: 05 March 2026 15:56
> To: Nguyen, Anthony L <anthony.l.nguyen@xxxxxxxxx>; Kitszel, Przemyslaw <przemyslaw.kitszel@xxxxxxxxx>
> Cc: Paul Menzel <pmenzel@xxxxxxxxxxxxx>; Vadim Fedorenko <vadim.fedorenko@xxxxxxxxx>; Gomes, Vinicius <vinicius.gomes@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; Richard Cochran <richardcochran@xxxxxxxxx>; Kurt Kanzenbach <kurt@xxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; Loktionov, Aleksandr <aleksandr.loktionov@xxxxxxxxx>; Andrew Lunn <andrew+netdev@xxxxxxx>; Eric Dumazet <edumazet@xxxxxxxxxx>; intel-wired-lan@xxxxxxxxxxxxxxxx; Keller, Jacob E <jacob.e.keller@xxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Paolo Abeni <pabeni@xxxxxxxxxx>; David S. Miller <davem@xxxxxxxxxxxxx>; Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Subject: [Intel-wired-lan] [PATCH iwl-next v5] igb: Retrieve Tx timestamp from BH workqueue
>
> Retrieve Tx timestamp from system BH instead of regular system workqueue.
>
> The current implementation uses schedule_work() which is executed by the system work queue and kworkers to retrieve Tx timestamps. This increases latency and can lead to timeouts in case of heavy system load. i210 is often used in industrial systems, where timestamp timeouts can be fatal.
>
> Therefore, switch to the system BH workqueues which are executed in softirq context shortly after the IRQ handler returns.
>
>Tested between Intel i210 and i350 with ptp4l gPTP profile:
>
> |ptp4l[30.405]: rms 4 max 7 freq +12825 +/- 3 delay 247 +/- 0
> |ptp4l[31.406]: rms 2 max 3 freq +12829 +/- 3 delay 248 +/- 0
>|ptp4l[32.406]: rms 3 max 3 freq +12827 +/- 3 delay 248 +/- 0
> |ptp4l[33.406]: rms 2 max 3 freq +12827 +/- 3 delay 248 +/- 0
> |ptp4l[34.407]: rms 3 max 6 freq +12825 +/- 4 delay 248 +/- 0
> |ptp4l[35.407]: rms 3 max 6 freq +12822 +/- 4 delay 246 +/- 0
> |ptp4l[36.407]: rms 7 max 10 freq +12812 +/- 5 delay 248 +/- 0
> |ptp4l[37.408]: rms 5 max 8 freq +12808 +/- 3 delay 248 +/- 0
>
> Furthermore, Miroslav Lichvar tested with ntpperf and chrony on Intel i350:
>
> Without the patch:
>
> | | responses | response time (ns)
> |rate clients | lost invalid basic xleave | min mean max stddev
> |150000 15000 0.00% 0.00% 0.00% 100.00% +4188 +36475 +193328 16179
> |157500 15750 0.02% 0.00% 0.02% 99.96% +6373 +42969 +683894 22682
> |165375 16384 0.03% 0.00% 0.00% 99.97% +7911 +43960 +692471 24454
> |173643 16384 0.06% 0.00% 0.00% 99.94% +8323 +45627 +707240 28452
> |182325 16384 0.06% 0.00% 0.00% 99.94% +8404 +47292 +722524 26936
> |191441 16384 0.00% 0.00% 0.00% 100.00% +8930 +51738 +223727 14272
> |201013 16384 0.05% 0.00% 0.00% 99.95% +9634 +53696 +776445 23783
> |211063 16384 0.00% 0.00% 0.00% 100.00% +14393 +54558 +329546 20473
> |221616 16384 2.59% 0.00% 0.05% 97.36% +23924 +321205 +518192 21838
> |232696 16384 7.00% 0.00% 0.10% 92.90% +33396 +337709 +575661 21017
> |244330 16384 10.82% 0.00% 0.15% 89.03% +34188 +340248 +556237 20880
> |
> |With the patch:
> |150000 15000 5.11% 0.00% 0.00% 94.88% +4426 +460642 +640884 83746
> |157500 15750 11.54% 0.00% 0.26% 88.20% +14434 +543656 +738355 30349
> |165375 16384 15.61% 0.00% 0.31% 84.08% +35822 +515304 +833859 25596
> |173643 16384 19.58% 0.00% 0.37% 80.05% +20762 +568962 +900100 28118
> |182325 16384 23.46% 0.00% 0.42% 76.13% +41829 +547974 +804170 27890
> |191441 16384 27.23% 0.00% 0.46% 72.31% +15182 +557920 +798212 28868
> |201013 16384 30.51% 0.00% 0.49% 69.00% +15980 +560764 +805576 29979
> |211063 16384 0.06% 0.00% 0.00% 99.94% +12668 +80487 +410555 62182
> |221616 16384 2.94% 0.00% 0.05% 97.00% +21587 +342769 +517566 23359
> |232696 16384 6.94% 0.00% 0.10% 92.96% +16581 +336068 +484574 18453
> |244330 16384 11.45% 0.00% 0.14% 88.41% +23608 +345023 +564130 19177
>
> There are some minor differences at lower rates, but no performance regressions at higher ones.
>
> Reviewed-by: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@xxxxxxxxx>
> Signed-off-by: Kurt Kanzenbach <kurt@xxxxxxxxxxxxx>
> ---
> Changes in v5:
> - Adjust changelog wording (Aleksandr Loktionov)
> - Include measurement numbers in changelog (Paul Menzel)
> - Link to v4: https://patch.msgid.link/20260303-igb_irq_ts-v4-1-cbae7f127061@xxxxxxxxxxxxx
>
> Changes in v4:
> - Use BH workqueue (tasklet) instead of doing timestamping in IRQ path (Jakub Kicinski)
> - Link to v3: https://patch.msgid.link/20260205-igb_irq_ts-v3-1-2efc7bc4b885@xxxxxxxxxxxxx
>
> Changes in v3:
> - Switch back to IRQ, but for i210 only
> - Keep kworker for all other NICs like i350 (Miroslav)
> - Link to v2: https://lore.kernel.org/r/20250822-igb_irq_ts-v2-1-1ac37078a7a4@xxxxxxxxxxxxx
>
> Changes in v2:
> - Switch from IRQ to PTP aux worker due to NTP performance regression (Miroslav)
> - Link to v1: https://lore.kernel.org/r/20250815-igb_irq_ts-v1-1-8c6fc0353422@xxxxxxxxxxxxx
> ---
> drivers/net/ethernet/intel/igb/igb_main.c | 4 ++-- drivers/net/ethernet/intel/igb/igb_ptp.c | 2 +-
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
Tested-by: Rinitha S <sx.rinitha@xxxxxxxxx> (A Contingent worker at Intel)