Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

From: Christoph Hellwig
Date: Wed Jun 26 2019 - 02:57:44 EST

On Tue, Jun 25, 2019 at 01:54:21PM -0600, Logan Gunthorpe wrote:
> Well whether it's dma_addr_t, phys_addr_t, pfn_t the result isn't all
> that different. You still need roughly the same 'if' hooks for any
> backed memory that isn't in the linear mapping and you can't get a
> kernel mapping for directly.
> It wouldn't be too hard to do a similar patch set that uses something
> like phys_addr_t instead and have a request and queue flag for support
> of non-mappable memory. But you'll end up with very similar 'if' hooks
> and we'd have to clean up all bio-using drivers that access the struct
> pages directly.

We'll need to clean that mess up anyway, and I've been chugging
along doing some of that. A lot of drivers still assume no highmem,
so we need to convert them over to something that kmaps anyway. If
we get the abstraction right, that will actually help with converting
over to a better representation.
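As a rough userspace illustration of converting a driver "over to
something that kmaps", the sketch below copies one segment's data
while holding a temporary mapping instead of assuming the page is
permanently addressable. fake_kmap()/fake_kunmap() are hypothetical
stand-ins for kmap_atomic()/kunmap_atomic(), and
bvec_copy_from_sketch() is an illustrative helper, not an existing
kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model: a "page" is just a 4 KiB buffer. */
#define SKETCH_PAGE_SIZE 4096

struct page {
	unsigned char data[SKETCH_PAGE_SIZE];
};

/* Simplified bio_vec with the current page-based layout. */
struct bio_vec {
	struct page *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

/*
 * Hypothetical stand-ins for kmap_atomic()/kunmap_atomic(). In the
 * kernel a highmem page needs a temporary mapping; in this model every
 * page is already addressable, so "mapping" just returns its address.
 */
static void *fake_kmap(struct page *page)
{
	return page->data;
}

static void fake_kunmap(void *addr)
{
	(void)addr;
}

/*
 * Hypothetical helper: copy the data described by one segment into
 * buf, mapping the page only for the duration of the copy rather than
 * relying on page_address().
 */
static void bvec_copy_from_sketch(const struct bio_vec *bv, void *buf)
{
	unsigned char *kaddr;

	assert(bv->bv_offset + bv->bv_len <= SKETCH_PAGE_SIZE);
	kaddr = fake_kmap(bv->bv_page);
	memcpy(buf, kaddr + bv->bv_offset, bv->bv_len);
	fake_kunmap(kaddr);
}
```

The point of routing all access through one such helper is that only
the helper has to change once bio_vec stops carrying a struct page.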

> Though, we'd also still have the problem of how to recognize when the
> address points to P2PDMA and needs to be translated to the bus offset.
> The map-first inversion was what helped here because the driver
> submitting the requests had all the information. Though it could be
> another request flag and indicating non-mappable memory could be a flag

That assumes the whole request uses the same kind of memory, which is
a simplifying assumption. My idea was that if we had our new bio_vec
like this:

struct bio_vec {
	phys_addr_t	paddr;	// 64-bit on 64-bit systems
	unsigned int	len;
};

we have a hole behind len where we could store flags. Preferably
that would be optional, based on a P2P or other magic memory types
config option, so that 32-bit systems with a 32-bit phys_addr_t
actually benefit from the smaller and better packed structure.
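To make the padding argument concrete, here is a minimal layout
sketch. It assumes a 32-bit len (that is what leaves a hole on 64-bit
systems) and models phys_addr_t as unsigned long, i.e. 64-bit on LP64
and 32-bit on ILP32, ignoring configurations such as 32-bit PAE. The
flags member, BVEC_SKETCH_P2PDMA and bvec_is_p2pdma() are
hypothetical names for illustration only. A per-segment flag like
this avoids assuming that every segment of a request is the same kind
of memory.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch assumption: 64-bit on LP64, 32-bit on ILP32. */
typedef unsigned long phys_addr_t;

/* Proposed layout without any flags. */
struct bio_vec_plain {
	phys_addr_t paddr;
	unsigned int len;
};

/*
 * Hypothetical variant storing per-segment flags in the padding behind
 * len. In the kernel this member would only exist when a config option
 * for P2P or other special memory types is enabled.
 */
struct bio_vec_flagged {
	phys_addr_t paddr;
	unsigned int len;
	unsigned int flags;
};

/* Hypothetical flag marking a segment as P2PDMA memory. */
#define BVEC_SKETCH_P2PDMA 0x1u

static int bvec_is_p2pdma(const struct bio_vec_flagged *bv)
{
	return (bv->flags & BVEC_SKETCH_P2PDMA) != 0;
}
```

On a typical LP64 ABI both structures are 16 bytes, so the flags come
for free. On a 32-bit system with a 32-bit phys_addr_t the plain
layout is 8 bytes and the flagged one 12, which is why making the
member config-dependent pays off there.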

> If you think any of the above ideas sound workable I'd be happy to try
> to code up another prototype.

It sounds workable. Some of the first steps are cleanups independent
of what the bio_vec is eventually going to look like. That is,
making the DMA-API internals work on phys_addr_t, which also unifies
the map_resource implementation with map_page. I plan to do that
relatively soon. The next is sorting out access to bio data by
virtual address. All of these need nice kmap helpers that avoid too
much open coding. I was going to look into that next, mostly to kill
the block layer bounce buffering code. Similar things will also be
needed at the scatterlist level, I think. After that we need more
audits of how bv_page is still used. Something like a bv_phys()
helper that does "page_to_phys(bv->bv_page) + bv->bv_offset" might
come in handy, for example.
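A userspace sketch of the bv_phys() helper described above, together
with map_page and map_resource expressed as thin wrappers around a
single phys_addr_t-based mapping core. The page model (pfn <<
PAGE_SHIFT), the fixed bus offset, and the sketch_map_*() names are
all assumptions for illustration. A real DMA mapping would go through
the IOMMU or swiotlb, and these are not the actual dma_map_ops
signatures.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model: page_to_phys() is pfn << PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12

typedef uint64_t phys_addr_t;
typedef uint64_t dma_addr_t;

struct page {
	unsigned long pfn;
};

struct bio_vec {
	struct page *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

static phys_addr_t page_to_phys(const struct page *page)
{
	return (phys_addr_t)page->pfn << SKETCH_PAGE_SHIFT;
}

/* The suggested helper: physical address of the segment's data. */
static phys_addr_t bv_phys(const struct bio_vec *bv)
{
	return page_to_phys(bv->bv_page) + bv->bv_offset;
}

/*
 * Hypothetical phys_addr_t-based mapping core, assuming a direct
 * mapping with a fixed bus offset.
 */
static dma_addr_t sketch_map_phys(phys_addr_t paddr, uint64_t bus_offset)
{
	return (dma_addr_t)(paddr + bus_offset);
}

/* With a phys-based core, map_page and map_resource become trivial. */
static dma_addr_t sketch_map_page(const struct page *page,
				  unsigned long offset, uint64_t bus_offset)
{
	return sketch_map_phys(page_to_phys(page) + offset, bus_offset);
}

static dma_addr_t sketch_map_resource(phys_addr_t paddr, uint64_t bus_offset)
{
	return sketch_map_phys(paddr, bus_offset);
}
```

Once callers go through bv_phys() rather than touching bv_page, the
page pointer can later be replaced by a phys_addr_t without auditing
every driver again.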