Re: [PATCH net-next v25 00/13] Device Memory TCP

From: Yunsheng Lin
Date: Tue Sep 10 2024 - 06:59:49 EST


On 2024/9/10 0:54, Mina Almasry wrote:
> On Mon, Sep 9, 2024 at 4:21 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>>
>> On 2024/9/9 13:43, Mina Almasry wrote:
>>
>>>
>>> Perf - page-pool benchmark:
>>> ---------------------------
>>>
>>> bench_page_pool_simple.ko tests with and without these changes:
>>> https://pastebin.com/raw/ncHDwAbn
>>>
>>> AFAIK the number that really matters in the perf tests is the
>>> 'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
>>> cycles without the changes but there is some 1 cycle noise in some
>>> results.
>>>
>>> With the patches this regresses to 9 cycles with the changes but there
>>> is 1 cycle noise occasionally running this test repeatedly.
>>>
>>> Lastly I tried disable the static_branch_unlikely() in
>>> netmem_is_net_iov() check. To my surprise disabling the
>>> static_branch_unlikely() check reduces the fast path back to 8 cycles,
>>> but the 1 cycle noise remains.
>>
>> Sorry for the late report, as I was adding a testing page_pool ko basing
>> on [1] to avoid introducing performance regression when fixing the bug in
>> [2].
>> I used it to test the performance impact of devmem patchset for page_pool
>> too, it seems there might be some noticable performance impact quite stably
>> for the below testcases, about 5%~16% performance degradation as below in
>> the arm64 system:
>>
>
> Correct me if I'm wrong here, but on the surface here it seems that
> you're re-reporting a known issue. Consensus seems to be that it's a
> non-issue.
>
> In v6 I reported that the bench_page_pool_simple.ko test reports a 1
> cycle regression with these patches, from 8->9 cycles. That is roughly
> consistent with the 5-15% you're reporting.

>From the description above in the cover letter, I thought the performance
data using the out of tree testing ko is not stable enough to justify the
performance impact.

>
> I root caused the reason for the regression to be the
> netmem_is_net_iov() check in the fast path. I removed this regression
> in v7 (see the change log) by conditionally compiling the check in
> that function.
>
> In v8, Pavel/Jens/David pushed back on the ifdef check. See this
> entire thread, but in particular this response from Jens:

It seemed the main objection is about how to enable this feature
for the io_uring?

And it seemed that you had added the CONFIG_NET_DEVMEM for this
devmem thing, why not use it for that?

>
> https://lore.kernel.org/lkml/11f52113-7b67-4b45-ba1d-29b070050cec@xxxxxxxxx/
>
> Seems consensus that it's 'not really worth it in this scenario'.

I was only reading through the above thread, it didn't seemed to
reach to consensus as Jesper pointed out the performance impact
for the XDP DROP case in the same thread.

https://lore.kernel.org/lkml/779b9542-4170-483a-af54-ca0dd471f774@xxxxxxxxxx/

>