Re: [PATCH 1/1] iommu: Avoid races around default domain allocations

From: Robin Murphy
Date: Wed Feb 07 2024 - 19:05:32 EST


On 2024-02-07 2:56 pm, Jason Gunthorpe wrote:
On Wed, Feb 07, 2024 at 07:56:25PM +0530, Nikhil V wrote:


On 2/1/2024 9:53 PM, Jason Gunthorpe wrote:
On Mon, Jan 29, 2024 at 01:29:12PM +0530, Nikhil V wrote:

Gentle ping to have your valuable feedback. This fix is helping us
downstream; without it we see a bunch of kernel crashes.

What are you expecting here? This was fixed in Linus's tree some time
ago now

Are you asking for the stable team to put something weird in 6.1? I
don't think they generally do that?

Jason


Hi @Jason,

Considering that the issue is reported on 6.1, which is an __LTS kernel__,
any suggestion to fix this issue cleanly would help us a lot. The right thing
here would have been to propagate the changes from 6.6 (as for any
stability issue), but considering how intrusive they are, is it even
possible?

Just to be open about the reproducibility of the issue, a bunch of these
crashes have been reported, both internally and by customers.

I think you need to talk to the stable maintainers, not the iommu
upstream folks. I don't know their policy well.

Frankly, I'd suggest just proposing the necessary (and tested)
upstream patches to 6.1, however large they are, and see what Greg and
Sasha say. This is the usual working model they have, as I understand
it.

To be blunt, hell no. Stable is far enough from its namesake already; the ongoing bordering-on-ridiculous brokenness of your mainline changes where each "fix" keeps affecting something else is a massive NAK to backporting any of it, let alone 43+ patches to achieve a 2-line fix.

Nikhil, if this is truly sufficient to resolve the issues you see (AFAICS things end up serialised by the group mutex so probably should be robust enough), then I'm OK with you proposing it as a dedicated stable-only fix, as an "equivalent" patch per Option 3 of stable-kernel-rules.rst - I reckon your commit message is already pretty good with regard to the final point there, but I'll be happy to help argue the case if necessary. Just one point - is it genuinely not relevant to 5.15 and earlier or is it just the case that 6.1 is the oldest thing you're actively testing? (Apologies, I've already forgotten where things were that far back).

That said, I also don't think there would be any harm in applying this to mainline as a belt-and-braces thing either, if it helps make a backport easier and Joerg doesn't mind. There's already a bunch of stuff I'll be cleaning up once the underlying issue behind all of this is properly fixed, so adding a couple more lines of code to that list is no big deal as far as I'm concerned.

Thanks,
Robin.