Re: [PATCH 08/13] fs: add read support for RWF_UNCACHED

From: Stefan Metzmacher
Date: Mon Nov 11 2024 - 08:05:49 EST

Next message: Jeongjun Park: "Re: [PATCH] Remove unused function parameter in __smc_diag_dump"
Previous message: Roger Quadros: "Re: [PATCH net 1/2] net: ti: icssg-prueth: Fix firmware load sequence."
In reply to: Jens Axboe: "Re: [PATCH 08/13] fs: add read support for RWF_UNCACHED"
Next in thread: Jens Axboe: "Re: [PATCH 08/13] fs: add read support for RWF_UNCACHED"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Jens,

If the same test case is run with RWF_UNCACHED set for the buffered read,
the output looks as follows:

Reading bs 65536, uncached 1
1s: 153144MB/sec
2s: 156760MB/sec
3s: 158110MB/sec
4s: 158009MB/sec
5s: 158043MB/sec
6s: 157638MB/sec
7s: 157999MB/sec
8s: 158024MB/sec
9s: 157764MB/sec
10s: 157477MB/sec
11s: 157417MB/sec
12s: 157455MB/sec
13s: 157233MB/sec
14s: 156692MB/sec

which is just chugging along at ~155GB/sec of read performance. Looking
at top, we see:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 7961 root      20   0  267004      0      0 S  3180   0.0   5:37.95 uncached
 8024 axboe     20   0   14292   4096      0 R   1.0   0.0   0:00.13 top

where just the test app is using CPU, no reclaim is taking place outside
of the main thread. Not only is performance 65% better, it's also using
half the CPU to do it.
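
For readers following along, here is a rough sketch of what a buffered read with RWF_UNCACHED could look like from userspace via preadv2(2). It is not the test app used above. The flag value 0x80 is an assumption taken from the patch series and is not in released uapi headers, and read_uncached() is a hypothetical helper name. The sketch falls back to a plain buffered read when the kernel or filesystem rejects the flag.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumption: value proposed by the patch series, not a released uapi constant. */
#ifndef RWF_UNCACHED
#define RWF_UNCACHED 0x00000080
#endif

/*
 * Hypothetical helper: read up to len bytes at offset off through the page
 * cache, asking the kernel to drop the pages once the copy is done.
 * If the flag is rejected, retry as an ordinary buffered read.
 * *used_uncached is set to 1 if the flagged read succeeded, 0 otherwise.
 */
static ssize_t read_uncached(int fd, void *buf, size_t len, off_t off,
                             int *used_uncached)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    ssize_t ret;

    ret = preadv2(fd, &iov, 1, off, RWF_UNCACHED);
    if (ret >= 0) {
        *used_uncached = 1;
        return ret;
    }

    /* Anything other than "flag not supported" is a real error. */
    if (errno != EOPNOTSUPP && errno != EINVAL && errno != ENOSYS)
        return -1;

    *used_uncached = 0;
    return preadv2(fd, &iov, 1, off, 0);
}
```

A caller would loop over the file in 64k chunks (bs 65536 in the output above) and check used_uncached once to confirm the flag was actually honoured.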

Do you have numbers for similar code using O_DIRECT, just to
see the impact of the memcpy from the page cache to the userspace
buffer?
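
To make the comparison concrete, an O_DIRECT variant of the same read might look roughly like the sketch below. This is only an illustration under stated assumptions: read_direct() is a hypothetical helper, and the 4096-byte alignment (DIO_ALIGN) is a guess that happens to satisfy most block devices, not something taken from the test app. The buffer, length and offset all have to respect that alignment, and the helper falls back to a buffered read on filesystems that refuse O_DIRECT.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 4096 covers the logical block size of the device in use. */
#define DIO_ALIGN 4096

/*
 * Hypothetical helper: read len bytes at offset off from path, bypassing
 * the page cache with O_DIRECT so no page-cache-to-user memcpy happens.
 * buf must be DIO_ALIGN-aligned (e.g. from posix_memalign), and len and
 * off must be multiples of DIO_ALIGN.
 * If the direct open or read fails, retry as a plain buffered read.
 * *used_direct is set to 1 if the O_DIRECT read succeeded, 0 otherwise.
 */
static ssize_t read_direct(const char *path, void *buf, size_t len,
                           off_t off, int *used_direct)
{
    ssize_t ret = -1;
    int fd;

    *used_direct = 0;

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd >= 0) {
        ret = pread(fd, buf, len, off);
        if (ret >= 0)
            *used_direct = 1;
        close(fd);
    }

    if (ret < 0) {
        /* Filesystem does not support O_DIRECT here: use the page cache. */
        fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;
        ret = pread(fd, buf, len, off);
        close(fd);
    }
    return ret;
}
```

Running the same 64k read loop through both paths on the same files would separate the cost of the copy from the cost of page cache management.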

Thanks!
metze
