Re: [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect.

From: Toke Høiland-Jørgensen
Date: Thu Jan 18 2024 - 06:58:29 EST


Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> writes:

> On 2024-01-17 17:37:29 [+0100], Toke Høiland-Jørgensen wrote:
>> This is all back-of-the-envelope calculations, of course. Having some
>> actual numbers to look at would be great; I don't suppose you have a
>> setup where you can run xdp-bench and see how your patches affect the
>> throughput?
>
> No but I probably could set it up.

That would be great! Feel free to ping me if you need any pointers to
how we usually do the perf measurements :)

>> I chatted with Jesper about this, and he had an idea not too far from
>> this: split up the XDP and regular stack processing in two stages, each
>> with their individual batching. So whereas right now we're doing
>> something like:
>>
>> run_napi()
>> bh_disable()
>> for pkt in budget:
>> act = run_xdp(pkt)
>> if (act == XDP_PASS)
>> run_netstack(pkt) // this is the expensive bit
>> bh_enable()
>>
>> We could instead do:
>>
>> run_napi()
>> bh_disable()
>> for pkt in budget:
>> act = run_xdp(pkt)
>> if (act == XDP_PASS)
>> add_to_list(pkt, to_stack_list)
>> bh_enable()
>> // sched point
>> bh_disable()
>> for pkt in to_stack_list:
>> run_netstack(pkt)
>> bh_enable()
>>
>>
>> This would limit the batching that blocks everything to only the XDP
>> processing itself, which should limit the maximum time spent in the
>> blocking state significantly compared to what we have today. The caveat
>> being that rearranging things like this is potentially a pretty major
>> refactoring task that needs to touch all the drivers (even if some of
>> the logic can be moved into the core code in the process). So not really
>> sure if this approach is feasible, TBH.
>
> This does not work because bh_disable() does not disable scheduling.
> Scheduling may happen. bh_disable() acquires a lock which is currently
> the only synchronisation point between two say network driver doing
> NAPI. And this what I want to get rid of.
> Regarding expensive bit as in XDP_PASS: This doesn't need locking as per
> proposal, just the REDIRECT piece.

Right, well s/bh_disable()/lock()/; my main point was splitting up the
processing so that the XDP processing itself and the stack activation on
XDP_PASS is not interleaved. This will make it possible to hold the lock
around the whole XDP batch, not just individual packets, and so retain
the performance we gain from amortising expensive operations over
multiple packets.

-Toke