Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

From: Ingo Molnar
Date: Thu May 07 2015 - 05:02:30 EST

* Dan Williams <dan.j.williams@xxxxxxxxx> wrote:

> > What is the primary thing that is driving this need? Do we have a
> > very concrete example?
>
> My pet concrete example is covered by __pfn_t. Referencing
> persistent memory in an md/dm hierarchical storage configuration.
> Setting aside the thrash to get existing block users to do
> "bvec_set_page(page)" instead of "bvec->page = page", the onus is on
> that md/dm implementation and backing storage device driver to
> operate on __pfn_t. That use case is simple because there is no use
> of page locking or refcounting in that path, just dma_map_page() and
> kmap_atomic(). The more difficult use case is precisely what Al
> picked up on, O_DIRECT and RDMA. This patchset does nothing to
> address those use cases outside of not needing a struct page when
> they eventually craft a bio.

So why not do a dual approach?

There are code paths where the 'pfn' of a persistent device is mostly
used as a sector_t equivalent of terabytes of storage, not as an index
of a memory object.

It's not an address into a cache, it's an index into a huge storage
space - which happens to be (flash) RAM. For such paths, using pfn_t
seems natural, and using struct page * is a strained (not to mention
expensive) model.

For more complex facilities, where persistent memory is used as a
memory object, especially where the underlying device is true,
infinitely writable RAM (not flash), treating it as a memory zone or
setting up dynamic struct pages would be the natural approach (with
the inevitable cost of setup/teardown in the latter case).

I'd say that for anything where the dynamic struct page is torn down
unconditionally after a single use completes, the natural API is
probably pfn_t, not struct page. Any synchronization is already
handled at the block request layer, and it's really storage op
synchronization, not memory access synchronization.

For anything more complex, that maps any of this storage to
user-space or exposes it to higher-level struct page based APIs,
etc., where references matter and it's more of a cache with
potentially multiple users than an IO space, the natural API is
struct page.

I'd say that this particular series mostly addresses the 'pfn as
sector_t' side of the equation, where persistent memory is IO space,
not memory space, and for that side pfn_t is the more natural and
thus also the cheaper/faster approach.

Linus probably disagrees? :-)

Thanks,

Ingo