ice driver not loading with 256 CPUs?

From: Holger Kiehl
Date: Sat Oct 15 2022 - 08:20:25 EST


Hello,

I have an AMD system with two sockets (each with an EPYC 7763 64-core CPU),
256 CPUs in total, and a 4-port Intel E810 NIC. During boot I get the
following error:

Oct 15 10:53:35 hermes kernel: ice 0000:e2:00.1: The DDP package was successfully loaded: ICE OS Default Package version 1.3.26.0
Oct 15 10:53:35 hermes kernel: ice 0000:e2:00.1: not enough device MSI-X vectors. requested = 260, available = 252
Oct 15 10:53:35 hermes kernel: ice 0000:e2:00.1: ice_init_interrupt_scheme failed: -34
Oct 15 10:53:35 hermes kernel: ice: probe of 0000:e2:00.1 failed with error -5

I get this error with both the default Alma9 kernel and, as shown above,
the kernel.org 6.0.2 kernel. Looking at the code (ice_ena_msix_range() in
drivers/net/ethernet/intel/ice/ice_main.c, starting at line 3928), I assume
this would not be a problem with fewer CPUs.

Any idea how I can get this working?

Thanks,
Holger


e2:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-C for SFP (rev 02)
	Subsystem: Intel Corporation Ethernet 25G 4P E810-XXV Adapter
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 226
	NUMA node: 1
	IOMMU group: 136
	Region 0: Memory at 4fea8000000 (64-bit, prefetchable) [size=32M]
	Region 3: Memory at 4feaa030000 (64-bit, prefetchable) [size=64K]
	Expansion ROM at b7100000 [disabled] [size=1M]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
		Address: 0000000000000000 Data: 0000
		Masking: 00000000 Pending: 00000000
	Capabilities: [70] MSI-X: Enable- Count=512 Masked-
		Vector table: BAR=3 offset=00000000
		PBA: BAR=3 offset=00008000
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
			MaxPayload 512 bytes, MaxReadReq 512 bytes
		DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
		LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 16GT/s (ok), Width x16 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR-
			10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
			EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			FRS- TPHComp- ExtTPHComp-
			AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- LTR- OBFF Disabled,
			AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
			EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [e0] Vital Product Data
		Product Name: E810-XXV 25GbE Controller
		Read-only fields:
			[V0] Vendor specific: FFV20.5.13\x00
			[PN] Part number: VK88G
			[MN] Manufacture ID: 1028
			[V1] Vendor specific: DSV1028VPDR.VER2.2
			[V3] Vendor specific: DTINIC
			[V4] Vendor specific: DCM1001FFFFFF2101FFFFFF3201FFFFFF4301FFFFFF
			[V5] Vendor specific: NPY4
			[V6] Vendor specific: PMTD
			[V7] Vendor specific: NMVIntel Corp
			[V8] Vendor specific: L1D0
			[V9] Vendor specific: LNK164163
			[RV] Reserved: checksum good, 2 byte(s) reserved
		Read/write fields:
			[Y0] System specific: CCF1
		End
	Capabilities: [100 v2] Advanced Error Reporting
		UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
		CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
		AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap: MFVC- ACS-, Next Function: 1
		ARICtl: MFVC- ACS-, Function Group: 0
	Capabilities: [150 v1] Device Serial Number 40-a6-b7-ff-ff-84-ef-ec
	Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
		IOVCap: Migration-, Interrupt Message Number: 000
		IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
		IOVSta: Migration-
		Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
		VF offset: 8, stride: 1, Device ID: 1889
		Supported Page Size: 00000553, System Page Size: 00000001
		Region 0: Memory at 0000000000000000 (64-bit, prefetchable)
		Region 3: Memory at 0000000000000000 (64-bit, prefetchable)
		VF Migration: offset: 00000000, BIR: 0
	Capabilities: [1a0 v1] Transaction Processing Hints
		Device specific mode supported
		No steering table available
	Capabilities: [1b0 v1] Access Control Services
		ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [1d0 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [200 v1] Data Link Feature <?>
	Capabilities: [210 v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [250 v1] Lane Margining at the Receiver <?>
	Kernel modules: ice