Re: [PATCH v11 21/21] brd: Rename XIP to DAX

From: Dave Chinner
Date: Tue Mar 24 2015 - 23:25:29 EST

Next message: Varka Bhadram: "Re: [PATCH v4 3/3] leds: Add ktd2692 flash LED driver"
Previous message: Andi Kleen: "Re: [PATCH 1/2] Support PCU power metrics in turbostat"
In reply to: Matt Mullins: "Re: [PATCH v11 21/21] brd: Rename XIP to DAX"
Next in thread: Matthew Wilcox: "Should implementations of ->direct_access be allowed to sleep?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Mar 24, 2015 at 11:50:47AM -0700, Matt Mullins wrote:
> On Thu, Sep 25, 2014 at 04:33:38PM -0400, Matthew Wilcox wrote:
> > --- a/drivers/block/brd.c
> > +++ b/drivers/block/brd.c
> > @@ -97,13 +97,13 @@ static struct page *brd_insert_page(struct brd_device *brd, sector_t sector)
> > * Must use NOIO because we don't want to recurse back into the
> > * block or filesystem layers from page reclaim.
> > *
> > - * Cannot support XIP and highmem, because our ->direct_access
> > - * routine for XIP must return memory that is always addressable.
> > - * If XIP was reworked to use pfns and kmap throughout, this
> > + * Cannot support DAX and highmem, because our ->direct_access
> > + * routine for DAX must return memory that is always addressable.
> > + * If DAX was reworked to use pfns and kmap throughout, this
> > * restriction might be able to be lifted.
> > */
> > gfp_flags = GFP_NOIO | __GFP_ZERO;
> > -#ifndef CONFIG_BLK_DEV_XIP
> > +#ifndef CONFIG_BLK_DEV_RAM_DAX
> > gfp_flags |= __GFP_HIGHMEM;
> > #endif
> > page = alloc_page(gfp_flags);
>
> We're also developing a user of direct_access, and we ended up with some
> questions about the sleeping guarantees of the direct_access API.
>
> Since brd is currently the only (x86) implementation of DAX in Linus's tree,
> I've been testing against that. We noticed that the brd implementation of DAX
> can call into alloc_page() with __GFP_WAIT if we call direct_access() on a page
> that has not yet been allocated. This is compounded by the fact that brd does
> not support size > PAGE_SIZE (and thus I call bdev_direct_access() on each use),
> though the limitation makes sense -- I shouldn't expect the brd driver to be
> able to allocate a gigabyte of contiguous memory.
>
> The potential sleeping behavior was somewhat surprising to me, as I would expect
> the NV-DIMM device implementation to simply offset the pfn at which the device
> is located rather than perform a memory allocation. What are the guaranteed
> and/or expected contexts from which direct_access() can be safely called?

I'll defer to whatever Willy and others say, but my understanding
is that .direct_access has the same semantics as submitting an IO,
i.e. the intent of .direct_access is to set up direct access to the
memory and then return a pfn you can use to access it. Hence the
operations performed are backing-device dependent.

Hence for some devices it might simply be an offset->pfn calculation
and immediately return, others might have to play mapping games
(maybe talk to an iommu?) and others, like brd, may have to allocate
backing store from some separate storage pool before access can be
granted. Expect that .direct_access can sleep, and you'll be fine.
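
To make the brd case concrete, here is a rough userspace toy model of
"allocate the backing page on first access". It is not kernel code and
not the brd implementation; toy_ramdisk, toy_direct_access and the
sizes are made up purely for illustration, and calloc() stands in for
the page allocation that may sleep in the kernel.

```c
/* Userspace toy model only; every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096
#define TOY_NPAGES    8

struct toy_ramdisk {
	void *pages[TOY_NPAGES];	/* NULL until first touched */
};

/*
 * Return the backing memory for page 'index', allocating it on first use.
 * '*allocated' reports whether this call had to allocate; in the kernel
 * that allocation is the step that may sleep.
 */
static void *toy_direct_access(struct toy_ramdisk *rd, size_t index,
			       bool *allocated)
{
	assert(index < TOY_NPAGES);
	*allocated = false;
	if (!rd->pages[index]) {
		rd->pages[index] = calloc(1, TOY_PAGE_SIZE);
		if (!rd->pages[index])
			return NULL;	/* allocation failed */
		*allocated = true;
	}
	return rd->pages[index];
}
```

The first call for a given index allocates; later calls for the same
index only do a lookup. A caller cannot tell in advance which case it
will hit, which is why it has to assume the call may sleep.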

> While I can easily punt this usage to a work queue (that's what we already do
> for devices where we need to submit a bio), part of our desire to use
> direct_access is to avoid additional latency.

brd is intended for testing purposes only because it isn't
persistent. However, we need something we can develop against and
most of us don't have real hardware - that's what brd+dax is for.

If you want brd+dax to act like NVDIMM based persistent memory,
populate all the backing pages in the ram disk before running your
tests by writing zeros to the entire block device. Then the backing
store will be fully allocated, and .direct_access will not need to
allocate again until you flush the backing store...
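
A minimal userspace sketch of that pre-population step, assuming the
target can be opened for writing and that lseek(SEEK_END) reports its
size (true for regular files, and for block devices on Linux). The
function name prefill_with_zeros and the 1MB chunk size are my own
choices here, not anything brd requires.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE (1 << 20)	/* arbitrary write size for this sketch */

/* Write zeros over every byte of 'path'; return bytes written, -1 on error. */
static off_t prefill_with_zeros(const char *path)
{
	static char zeros[CHUNK_SIZE];	/* static storage is zero-initialised */
	off_t size, done = 0;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;

	/* Find the size of the target, then rewind to the start. */
	size = lseek(fd, 0, SEEK_END);
	if (size < 0 || lseek(fd, 0, SEEK_SET) < 0)
		goto fail;

	while (done < size) {
		size_t want = (size - done) < CHUNK_SIZE ?
			      (size_t)(size - done) : CHUNK_SIZE;
		ssize_t n = write(fd, zeros, want);

		if (n <= 0)
			goto fail;
		done += n;	/* short writes are retried by the loop */
	}
	assert(done == size);

	if (fsync(fd) < 0)
		goto fail;
	close(fd);
	return done;
fail:
	close(fd);
	return -1;
}
```

You would call it as something like prefill_with_zeros("/dev/ram0")
as root before starting the test run; the device name depends on your
setup.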

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx