Re: [PATCH v2 0/5] Introduce DMA_HEAP_ALLOC_AND_READ_FILE heap flag

From: Huan Yang
Date: Wed Jul 31 2024 - 22:54:12 EST

Next message: Peng Fan: "RE: [PATCH 01/14] arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges"
Previous message: Cristian Ciocaltea: "[PATCH v2 2/3] drm/rockchip: Explicitly include bits header"
In reply to: Daniel Vetter: "Re: [PATCH v2 0/5] Introduce DMA_HEAP_ALLOC_AND_READ_FILE heap flag"

On 2024/8/1 4:46, Daniel Vetter wrote:

On Tue, Jul 30, 2024 at 08:04:04PM +0800, Huan Yang wrote:

On 2024/7/30 17:05, Huan Yang wrote:

On 2024/7/30 16:56, Daniel Vetter wrote:


On Tue, Jul 30, 2024 at 03:57:44PM +0800, Huan Yang wrote:

UDMA-BUF step:
   1. memfd_create
   2. open file(buffer/direct)
   3. udmabuf create
   4. mmap memfd
   5. read file into memfd vaddr

Yeah this is really slow and the worst way to do it. You absolutely want
to start _all_ the io before you start creating the dma-buf, ideally with
everything running in parallel. But just starting the direct I/O with
async and then creating the udmabuf should be a lot faster and avoid

That's great. Let me rephrase that, and please correct me if I'm wrong.

UDMA-BUF step:
1. memfd_create
2. mmap memfd
3. open file(buffer/direct)
4. start thread to async read
5. udmabuf create

With this, can improve

I just tested it. The steps are:

UDMA-BUF step:
1. memfd_create
2. mmap memfd
3. open file(buffer/direct)
4. start thread to async read
5. udmabuf create
6. join wait
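
The six steps above can be sketched roughly as follows. This is only a minimal userspace illustration, not the test program used for the numbers in this thread: the names `read_job`, `udmabuf_load`, `read_worker` and `load_file_into_udmabuf` are hypothetical, the file is opened buffered (O_DIRECT would add its own alignment constraints), and `size` is assumed to be a multiple of the page size, which udmabuf requires. The memfd also gets F_SEAL_SHRINK, which udmabuf requires.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/udmabuf.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper types; none of these names come from the thread. */
struct read_job {
    int file_fd;   /* source file */
    char *dst;     /* address of the mmap'ed memfd */
    size_t len;    /* bytes to read */
    ssize_t done;  /* bytes actually read, set by the worker */
};

struct udmabuf_load {
    int memfd;          /* backing memfd, -1 on failure */
    int dmabuf_fd;      /* udmabuf fd, -1 if creation failed or is unavailable */
    void *map;          /* mapping of the memfd, MAP_FAILED on failure */
    ssize_t bytes_read; /* bytes read by the worker, -1 if it never ran */
};

/* Worker thread: read the file into the memfd mapping. */
static void *read_worker(void *arg)
{
    struct read_job *job = arg;
    size_t off = 0;

    while (off < job->len) {
        ssize_t n = pread(job->file_fd, job->dst + off, job->len - off, (off_t)off);
        if (n <= 0)
            break; /* EOF or error: report what was read so far */
        off += (size_t)n;
    }
    job->done = (ssize_t)off;
    return NULL;
}

/* The caller owns the returned fds and mapping and must release them. */
struct udmabuf_load load_file_into_udmabuf(const char *path, size_t size)
{
    struct udmabuf_load r = { -1, -1, MAP_FAILED, -1 };
    struct read_job job;
    pthread_t tid;
    int file_fd, dev_fd;

    assert(path != NULL && size > 0);

    /* 1. memfd_create (sized, and sealed against shrinking for udmabuf) */
    r.memfd = memfd_create("udmabuf-src", MFD_ALLOW_SEALING);
    if (r.memfd < 0)
        return r;
    if (ftruncate(r.memfd, (off_t)size) < 0 ||
        fcntl(r.memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
        return r;

    /* 2. mmap memfd */
    r.map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, r.memfd, 0);
    if (r.map == MAP_FAILED)
        return r;

    /* 3. open file (buffered in this sketch) */
    file_fd = open(path, O_RDONLY);
    if (file_fd < 0)
        return r;

    /* 4. start thread to async read */
    job = (struct read_job){ file_fd, r.map, size, -1 };
    if (pthread_create(&tid, NULL, read_worker, &job) != 0) {
        close(file_fd);
        return r;
    }

    /* 5. udmabuf create, overlapping with the read in flight */
    dev_fd = open("/dev/udmabuf", O_RDWR);
    if (dev_fd >= 0) {
        struct udmabuf_create create = {
            .memfd = (__u32)r.memfd,
            .flags = UDMABUF_FLAGS_CLOEXEC,
            .offset = 0,
            .size = size,
        };
        r.dmabuf_fd = ioctl(dev_fd, UDMABUF_CREATE, &create);
        close(dev_fd);
    }

    /* 6. join wait */
    pthread_join(tid, NULL);
    r.bytes_read = job.done;
    close(file_fd);
    return r;
}
```

The point of the ordering is that step 5 runs while the read from step 4 is still in flight, so the udmabuf creation cost is hidden behind the I/O instead of being added to it.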

Reading a 3G file took 1,527,103,431 ns for all steps combined, which is great.

Ok that's almost the throughput of your patch set, which I think is close
enough. The remaining difference is probably just the mmap overhead, not
sure whether/how we can do direct i/o to an fd directly ... in principle
it's possible for any file that uses the standard pagecache.

Yes. For mmap, IMO, we now get all the folios and pin them, which means every pfn is already known when the udmabuf is created.

So I think mapping through page faults does not help save memory and only increases the mmap access cost (it may save a little page-table memory).

I'd like to send a patchset that removes it and makes the code better suited to folio operations (and also removes the unpin list), together with some fix patches.

I'll send it once my testing shows it works well.

As for doing the direct I/O on an fd, maybe use sendfile or copy_file_range?

sendfile is based on a pipe buffer, and its performance was low when I tested it.

copy_file_range does not work because the source and destination are not on the same file system.
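
For reference, the two fd-to-fd paths mentioned above look roughly like this from userspace. This is only a sketch under my own assumptions: `copy_fd_to_fd` and `copy_file_into_memfd` are hypothetical names, and the fallback policy (try copy_file_range first, switch to sendfile on EXDEV and similar errors) is an illustration, not something proposed in this thread. Both calls use the fds' own file positions here, so switching from one to the other needs no extra seek.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy up to len bytes from in_fd to out_fd using the
 * current file positions. Returns the number of bytes copied, or -1 on error.
 */
ssize_t copy_fd_to_fd(int in_fd, int out_fd, size_t len)
{
    size_t total = 0;
    int use_sendfile = 0;

    while (total < len) {
        ssize_t n;

        if (!use_sendfile) {
            n = copy_file_range(in_fd, NULL, out_fd, NULL, len - total, 0);
            if (n < 0 && (errno == EXDEV || errno == EINVAL ||
                          errno == ENOSYS || errno == EOPNOTSUPP)) {
                /* e.g. different file systems: fall back to sendfile */
                use_sendfile = 1;
                continue;
            }
        } else {
            n = sendfile(out_fd, in_fd, NULL, len - total);
        }

        if (n < 0)
            return -1;
        if (n == 0)
            break; /* EOF on the source */
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Hypothetical helper: returns a memfd holding the first len bytes of path, or -1. */
int copy_file_into_memfd(const char *path, size_t len)
{
    int in_fd, memfd;
    ssize_t copied;

    assert(path != NULL);

    in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    memfd = memfd_create("copy-dst", MFD_CLOEXEC);
    if (memfd < 0) {
        close(in_fd);
        return -1;
    }

    copied = copy_fd_to_fd(in_fd, memfd, len);
    close(in_fd);
    if (copied < 0 || (size_t)copied != len) {
        close(memfd);
        return -1;
    }
    return memfd;
}
```

This avoids the mmap of the memfd entirely, but whether it is any faster depends on which path the kernel ends up taking; the sketch says nothing about performance.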

So I can't find another way to do it. Can someone give some suggestions?

-Sima
