On Tue, Jul 30, 2024 at 08:04:04PM +0800, Huan Yang wrote:
On 2024/7/30 17:05, Huan Yang wrote:
Ok that's almost the throughput of your patch set, which I think is close
On 2024/7/30 16:56, Daniel Vetter wrote:
I just tested with it. The steps are:
That's great. Let me rephrase that, and please correct me if I'm wrong.
On Tue, Jul 30, 2024 at 03:57:44PM +0800, Huan Yang wrote:
UDMA-BUF step:
1. memfd_create
2. open file(buffer/direct)
3. udmabuf create
4. mmap memfd
5. read file into memfd vaddr

Yeah this is really slow and the worst way to do it. You absolutely want
to start _all_ the io before you start creating the dma-buf, ideally with
everything running in parallel. But just starting the direct I/O with
async and then creating the udmabuf should be a lot faster and avoid
UDMA-BUF step:
1. memfd_create
2. mmap memfd
3. open file(buffer/direct)
4. start thread to async read
5. udmabuf create
With this, it can improve
UDMA-BUF step:
1. memfd_create
2. mmap memfd
3. open file(buffer/direct)
4. start thread to async read
5. udmabuf create
6. join wait
Reading a 3G file, all steps together cost 1,527,103,431 ns. It's great.
enough. The remaining difference is probably just the mmap overhead, not
sure whether/how we can do direct i/o to an fd directly ... in principle
it's possible for any file that uses the standard pagecache.
-Sima