Should implementations of ->direct_access be allowed to sleep?

From: Matthew Wilcox
Date: Thu Mar 26 2015 - 13:09:49 EST


On Tue, Mar 24, 2015 at 11:50:47AM -0700, Matt Mullins wrote:
> We're also developing a user of direct_access, and we ended up with some
> questions about the sleeping guarantees of the direct_access API.

That's a great question. Since DAX is always allowed to sleep when it
calls into bdev_direct_access(), I hadn't thought about it (DAX is
basically called to handle page faults and do I/O, both of which are
expected to sleep).

> Since brd is currently the only (x86) implementation of DAX in Linus's tree,
> I've been testing against that. We noticed that the brd implementation of DAX
> can call into alloc_page() with __GFP_WAIT if we call direct_access() on a page
> that has not yet been allocated. This is compounded by the fact that brd does
> not support size > PAGE_SIZE (and thus I call bdev_direct_access() on each use),
> though the limitation makes sense -- I shouldn't expect the brd driver to be
> able to allocate a gigabyte of contiguous memory.
>
> The potential sleeping behavior was somewhat surprising to me, as I would expect
> the NV-DIMM device implementation to simply offset the pfn at which the device
> is located rather than perform a memory allocation. What are the guaranteed
> and/or expected contexts from which direct_access() can be safely called?

Yes, for 'real' NV-DIMM devices, as you can see by the ones in tree,
as well as the pmem driver that Ross has been posting, it's a simple
piece of arithmetic. The question is whether we should make all users
of ->direct_access accommodate brd, or whether we should change brd so
that it doesn't sleep.
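
To make "a simple piece of arithmetic" concrete, here is a minimal
userspace sketch of the translation. It is a model, not the in-tree
code: struct model_pmem, model_direct_access() and the MODEL_*
constants are hypothetical names invented for illustration, and
4 KiB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SECTOR_SHIFT 9   /* 512-byte sectors */
#define MODEL_PAGE_SHIFT   12  /* assumption: 4 KiB pages */

/* Hypothetical stand-in for a persistent-memory block device. */
struct model_pmem {
	uint64_t phys_base;  /* physical address where the device starts */
	uint64_t size;       /* device size in bytes */
};

/*
 * Translate a sector into a page frame number using arithmetic only.
 * Nothing here allocates or blocks, so it is safe in any context.
 * Returns the number of bytes available from that offset to the end
 * of the device, or -1 if the sector is out of range.
 */
static long long model_direct_access(const struct model_pmem *dev,
				     uint64_t sector, uint64_t *pfn)
{
	uint64_t offset;

	assert(dev != NULL && pfn != NULL);

	offset = sector << MODEL_SECTOR_SHIFT;
	if (offset >= dev->size)
		return -1;

	*pfn = (dev->phys_base + offset) >> MODEL_PAGE_SHIFT;
	return (long long)(dev->size - offset);
}
```

For a model device based at physical address 0x100000000, sector 8 is
byte offset 4096, which lands on the second page of the device.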

I'm leaning towards the latter. But I'm not sure what GFP flags to
recommend that brd use ... GFP_NOWAIT | __GFP_ZERO, perhaps?
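
The consequence of a non-sleeping allocation is that it can fail, and
every caller has to cope with that. The userspace sketch below models
only that contract. It is not brd's code: struct model_brd,
model_alloc_nowait_zeroed() and model_brd_get_page() are hypothetical
names, calloc() stands in for a zeroing page allocation, and the
memory_available flag is an invented knob for simulating pressure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096  /* assumption: 4 KiB pages */
#define MODEL_NPAGES    16    /* tiny model device */

/* Hypothetical stand-in for a RAM-backed block device. */
struct model_brd {
	void *pages[MODEL_NPAGES];  /* NULL until first touched */
};

/*
 * Models an allocation that never waits: it either returns a zeroed
 * page immediately or gives up and returns NULL.
 */
static void *model_alloc_nowait_zeroed(int memory_available)
{
	if (!memory_available)
		return NULL;
	return calloc(1, MODEL_PAGE_SIZE);
}

/*
 * Look up the page backing index idx, allocating it on first use.
 * Returns NULL if idx is out of range or the allocation failed; the
 * caller must handle that instead of relying on the call to block.
 */
static void *model_brd_get_page(struct model_brd *brd, size_t idx,
				int memory_available)
{
	assert(brd != NULL);

	if (idx >= MODEL_NPAGES)
		return NULL;
	if (!brd->pages[idx])
		brd->pages[idx] = model_alloc_nowait_zeroed(memory_available);
	return brd->pages[idx];
}
```

A page that is already present is returned even under pressure; only
the first touch of a page can fail.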

> If it would make more sense for us to test against (for example) the pmem or an
> mtd-block driver instead, as you've discussed with Mathieu Desnoyers, then I'd
> be happy to work with those in our environment as well.

I mostly use Ross's pmem driver for my testing.