Re: [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT

From: Sebastian Andrzej Siewior
Date: Thu Jan 09 2025 - 12:45:28 EST


On 2025-01-09 13:46:47 [-0300], Wander Lairson Costa wrote:
> > If the issue is indeed the use of threaded interrupts then the fix
> > should not be limited to PREEMPT_RT only.
> >
> Although I was not aware of this scenario, the patch should work for it as well,
> as I am forcing it to run in interrupt context. I will test it to confirm.

If I remember correctly there were "ifdef preempt_rt" things in it.

> > > > - What causes the failure? I see you reworked into two parts to behave
> > > > similar to what happens without threaded interrupts. There is still no
> > > > explanation for it. Is there a timing limit or was there another
> > > > register operation which removed the mailbox message?
> > > >
> > >
> > > I explained the root cause of the issue in the last commit. Maybe I should
> > > have added the explanation to the cover letter as well. Anyway, here is a
> > > partial verbatim copy of it:
> > >
> > > "During testing of SR-IOV, Red Hat QE encountered an issue where the
> > > ip link up command intermittently fails for the igbvf interfaces when
> > > using the PREEMPT_RT variant. Investigation revealed that
> > > e1000_write_posted_mbx returns an error due to the lack of an ACK
> > > from e1000_poll_for_ack.
> >
> > That ACK would have come if it would poll longer?
> >
> No, the interrupt wouldn't be serviced while polling.

Hmm.
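
To make the scenario in the exchange above concrete, here is a minimal
userspace model of it. This is only a sketch under stated assumptions, not
the igb/igbvf code: the struct, the function names and the poll count are
all hypothetical, and the "handler" is an ordinary function call rather than
a real interrupt. It only shows why polling longer would not help if the
handler that produces the ACK cannot run until the poller has given up.

```c
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the driver. */
struct mbx_model {
	bool irq_pending;	/* device raised its interrupt */
	bool ack;		/* set only once the handler has run */
};

/* Stand-in for the interrupt handler: running it is what yields the ACK. */
static void model_irq_handler(struct mbx_model *m)
{
	if (m->irq_pending) {
		m->irq_pending = false;
		m->ack = true;
	}
}

/*
 * Stand-in for a posted mailbox write followed by a bounded ACK poll.
 *
 * handler_runs_inline == true:  the handler can run between polls
 *                               (models a hard-IRQ handler).
 * handler_runs_inline == false: the handler is deferred until the caller
 *                               returns (models a threaded handler that
 *                               cannot preempt the polling context).
 *
 * Returns 0 when the ACK was seen, -1 when the poll budget ran out.
 */
static int model_write_posted(struct mbx_model *m, bool handler_runs_inline,
			      int max_polls)
{
	m->ack = false;
	m->irq_pending = true;	/* writing the message triggers the IRQ */

	for (int i = 0; i < max_polls; i++) {
		if (handler_runs_inline)
			model_irq_handler(m);
		if (m->ack)
			return 0;
	}
	return -1;
}
```

With the inline variant the call returns 0 on the first poll. With the
deferred variant it returns -1 no matter how large max_polls is, and the ACK
only shows up once model_irq_handler() is called afterwards, which is too
late for the caller.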

> > > The underlying issue arises from the fact that IRQs are threaded by
> > > default under PREEMPT_RT. While the exact hardware details are not
> > > available, it appears that the IRQ handled by igb_msix_other must
> > > be processed before e1000_poll_for_ack times out. However,
> > > e1000_write_posted_mbx is called with preemption disabled, leading
> > > to a scenario where the IRQ is serviced only after the failure of
> > > e1000_write_posted_mbx."
> >
> > Where is this disabled preemption coming from? This should be one of the
> > ops.write_posted() calls, right? I've been looking around and don't see
> > anything obvious.
>
> I don't remember if I found the answer by looking at the code or by
> looking at the ftrace flags.
> I am currently on sick leave with covid. I can check it when I come back.

Don't worry, get better first. I'm kind of off myself. I'm not sure if I
have the hardware needed to set this up so I can look at it…

Sebastian