Re: [PATCH 0/8] Various io_uring micro-optimizations (reducing lock contention)
From: Jens Axboe
Date: Wed Jan 29 2025 - 12:46:41 EST
On 1/29/25 10:39 AM, Max Kellermann wrote:
> On Wed, Jan 29, 2025 at 6:19 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>> The other patches look pretty straightforward to me. Only thing that
>> has me puzzled a bit is why you have so much io-wq activity with your
>> application; in general I'd expect 0 activity there. But then I saw the
>> forced ASYNC flag, and it makes sense. In general, forcing that isn't a
>> great idea, but for a benchmark for io-wq it certainly makes sense.
>
> I was experimenting with io_uring and wanted to see how much
> performance I can squeeze out of my web server running
> single-threaded. The overhead of io_uring_submit() grew very large,
> because the "send" operation would do a lot of synchronous work in the
> kernel. I tried SQPOLL but it was actually a big performance
> regression; this just shifted my CPU usage to epoll_wait(). Forcing
> ASYNC gave me large throughput improvements (moving the submission
> overhead to iowq), but then the iowq lock contention was the next
> limit, thus this patch series.
>
> I'm still experimenting, and I will certainly revisit SQPOLL to learn
> more about why it didn't help and how to fix it.
Why are you combining it with epoll in the first place? It's a lot more
efficient to wait on one or more events in io_uring_enter() than to
go back to a serialized one-event-per-notification model by using epoll
to wait on completions on the io_uring side.
--
Jens Axboe