Re: [PATCH v3 6/6] pid: drop irq disablement around pidmap_lock
From: David Laight
Date: Sun Feb 02 2025 - 08:55:15 EST
On Sat, 1 Feb 2025 22:00:06 +0000
Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> On Sat, Feb 01, 2025 at 09:51:05PM +0000, David Laight wrote:
> > I'm not sure what you mean.
> > Disabling interrupts isn't as cheap as it ought to be, but probably isn't
> > that bad.
>
> Time it. You'll see.
The best scheme I've seen is to just increment a per-cpu value.
Let the interrupt happen, notice it isn't allowed and return with
interrupts disabled.
Then re-issue the interrupt when the count is decremented to zero.
Easy with level sensitive interrupts.
But I don't think Linux ever uses that scheme.
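A rough sketch of that scheme, written as a single-threaded userspace model rather than real interrupt entry code. Every name here (struct cpu_state, soft_irq_disable(), try_deliver() and so on) is made up for illustration and is not a Linux API; the model assumes a level-sensitive line that stays asserted until the handler runs.

```c
#include <stdbool.h>

/* Hypothetical model: one instance stands in for one cpu's state. */
struct cpu_state {
	unsigned int soft_disable;	/* per-cpu "interrupts not allowed" count */
	bool hw_masked;			/* hardware interrupt enable is off */
	bool line_asserted;		/* level-sensitive line still asserted */
	unsigned int handled;		/* number of times the handler ran */
};

/* Models the interrupt entry path. */
static void try_deliver(struct cpu_state *c)
{
	if (!c->line_asserted || c->hw_masked)
		return;
	if (c->soft_disable) {
		/* Not allowed right now: return with interrupts disabled. */
		c->hw_masked = true;
		return;
	}
	c->handled++;
	c->line_asserted = false;	/* the handler quiesces the device */
}

/* Models the device asserting its interrupt line. */
static void raise_irq(struct cpu_state *c)
{
	c->line_asserted = true;
	try_deliver(c);
}

/* The cheap path: just bump the per-cpu count. */
static void soft_irq_disable(struct cpu_state *c)
{
	c->soft_disable++;
}

static void soft_irq_enable(struct cpu_state *c)
{
	if (--c->soft_disable == 0 && c->hw_masked) {
		/* Unmask; the line is still asserted so the interrupt fires again. */
		c->hw_masked = false;
		try_deliver(c);
	}
}
```

The point of the design is that the common case (no interrupt arrives inside the critical section) costs only an increment and a decrement of a per-cpu variable.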
> > > So while this is indeed a tradeoff, as I understand the sane default
> > > is to *not* disable interrupts unless necessary.
> >
> > I beg to differ.
>
> You're wrong. It is utterly standard to take spinlocks without
> disabling IRQs. We do it all over the kernel. If you think that needs
> to change, then make your case, don't throw a driveby review.
>
> And I don't mean by arguing. Make a change, measure the difference.
The analysis was done on some userspace code that basically does:
	for (;;) {
		pthread_mutex_lock(&lock);
		item = get_head(list);	/* remove and return the list head */
		pthread_mutex_unlock(&lock);
		if (!item)
			break;		/* list empty, lock already dropped */
		process(item);
	}
For the test there were about 10000 items on the list and 30 threads
processing it (that was the target of the tests).
The entire list needs to be processed in 10ms (RTP audio).
There was a bit more code with the mutex held, but only 100 or so
instructions.
Mostly it works fine, some threads get delayed by interrupts (etc) but
the other threads carry on working and all the items get processed.
However sometimes an interrupt happens while the mutex is held.
In that case the other 29 threads get stuck waiting for the mutex.
No progress is made until the interrupt completes, and the processing
then overruns the 10ms period.
While this is a userspace test, the same thing will happen with
spin locks in the kernel.
In userspace you can't disable interrupts, but for kernel spinlocks
you can.
The problem is likely to show up as unexpected latency affecting
code with a hot mutex that is only held for short periods while
running a lot of network traffic.
That is also latency that affects all cpus at the same time.
The interrupt itself will always cause latency to one cpu.
Note that I also had to enable RFS, threaded NAPI and move the NAPI
threads to RT priorities to avoid lost packets.
The fix was to replace the linked list with an array and use atomic
increment to get the index of the item to process.
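A minimal sketch of that kind of fix, assuming C11 atomics and an array that is fully populated before the worker threads start. The names (struct work_array, claim_next(), worker(), mark_done()) are hypothetical and this is not the code from the original test.

```c
#include <stdatomic.h>
#include <stddef.h>

struct item {			/* hypothetical item type */
	int payload;
	int done;
};

struct work_array {
	struct item *items;
	size_t count;
	atomic_size_t next;	/* index of the next unclaimed item */
};

/* Claim the next unprocessed item, or NULL once the array is exhausted. */
static struct item *claim_next(struct work_array *w)
{
	size_t idx = atomic_fetch_add_explicit(&w->next, 1,
					       memory_order_relaxed);

	return idx < w->count ? &w->items[idx] : NULL;
}

/* Stand-in for the real per-item work. */
static void mark_done(struct item *it)
{
	it->done = 1;
}

/* Per-thread loop; returns how many items this thread processed. */
static size_t worker(struct work_array *w, void (*process)(struct item *))
{
	struct item *it;
	size_t n = 0;

	while ((it = claim_next(w)) != NULL) {
		process(it);
		n++;
	}
	return n;
}
```

Since no lock is held while claiming an item, a thread that is interrupted right after the atomic increment delays only the one item it has claimed; the other threads keep pulling work.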
David