Re: [PATCH 2/2] dax: fix bdev NULL pointer dereferences
From: Jared Hulbert
Date: Tue Feb 02 2016 - 16:46:16 EST
On Tue, Feb 2, 2016 at 8:51 AM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> On Tue, Feb 2, 2016 at 12:05 AM, Jared Hulbert <jaredeh@xxxxxxxxx> wrote:
> [..]
>> Well... as CONFIG_BLOCK was not required with filemap_xip.c for a
>> decade. This CONFIG_BLOCK dependency is a result of an incremental
>> feature from a certain point of view ;)
>>
>> The obvious 'driver' is physical RAM without a particular driver.
>> Remember please I'm talking about embedded. RAM measured in MiB and
>> funky one off hardware etc. In the embedded world there are lots of
>> ways that persistent memory has been supported in device specific ways
>> without the new fancypants NFIT and Intel instructions, so frankly
>> they don't fit in the PMEM stuff. Maybe they could be supported in
>> PMEM but not without effort to bring embedded players to the table.
>
> Not sure what you're trying to say here. An ACPI NFIT only feeds the
> generic libnvdimm device model. You don't need NFIT to get pmem.

Right... I'm just not seeing how the libnvdimm device model fits, is
relevant, or is useful to a persistent SRAM in embedded. Therefore I
don't see how some of these users would have a driver.

>> The other drivers are the MTD drivers, probably as read-only for now.
>> But the paradigm there isn't so different from what PMEM looks like
>> with asymmetric read/write capabilities.
>>
>> The filesystem I'm concerned with is AXFS
>> (https://www.kernel.org/doc/ols/2008/ols2008v1-pages-211-218.pdf).
>> Which I've been planning on trying to merge again due to a recent
>> resurgence of interest. The device model for AXFS is... weird. It
>> can use one or two devices at a time of any mix of NOR MTD, NAND MTD,
>> block, and unmanaged physical memory. It's a terribly useful model
>> for embedded. Anyway AXFS is readonly so hacking in a read only
>> dax_fault_nodev() and dax_file_read() would work fine, looks easy
>> enough. But... it would be cool if similar small embedded focused RW
>> filesystems were enabled.
>
> Are those also out of tree?

Of course. Merging embedded filesystems is like merging regular
filesystems except 98% of you reviewers don't want it merged.

>> I don't expect you to taint DAX with design requirements for this
>> stuff that it wasn't built for, nobody ends up happy in that case.
>> However, if enabling the filesystem to manage the bdev_direct_access()
>> interactions solves some of the "alternate device" problems you are
>> discussing here, then there is a chance we can accommodate both.
>> Sometimes that works.
>>
>> So... Forget CONFIG_BLOCK=n entirely, I didn't want that to be the
>> focus anyway. Does it help to support the weirder XFS and btrfs
>> device models to enable the filesystem to handle the
>> bdev_direct_access() stuff?
>
> It's not clear that it does. We just clarified with xfs and ext4 that
> we can rely on get_blocks(). That solves the immediate concern with
> multi-device filesystems.

IMO you're making DAX more complex by overly coupling to the bdev and
I think it could bite you later. I submit this rework of the radix
tree and confusion about where to get the real bdev as evidence. I'm
guessing that it won't be the last time. It's unnecessary to couple
it like this, and in fact is not how the vfs has been layered in the
past.

The trouble with vfs work has been that it straddles the line between
mm and block. Unfortunately that line is a dark chasm with ill-defined
boundaries. DAX is even more exciting because it's trying to duct
tape the filesystem even closer to the mm system. One could argue it's
actually in some respects enabling the filesystem to bypass the mm
code. On top of that, DAX is designed to enable block-based
filesystems to use RAM-like devices.

Bolting the block device interface onto NVDIMM is a brilliant hack
and the right design choice, but it's still a hack. The upside is it
enables the reuse of all this glorious legacy filesystem code, which
does a pretty amazing job of handling what the pmem device
applications need, considering it was designed to manage data on
platters of slow spinning rust. What would DAX look like if it were
developed with a filesystem purpose-built for pmem?

To look at the downside, consider dax_fault(). It's called on a
fault to a user memory map, and uses the filesystem's get_block() to
look up a sector so you can ask a block device to convert it to an
address on a DIMM. Come on, that's awkward. Everything around
dax_fault() is dripping with memory semantic interfaces: the
dax_fault() calls are fundamentally about memory, the pmem calls are
memory, the hardware is memory, and yet it directly calls
bdev_direct_access(). It's out of place.
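
To make that round trip concrete, here is a rough userspace mock of the
path as I read it. It is only a sketch: every mock_* name, the 4 KiB
block size, and the flat block map are made up for illustration, and
none of these are the kernel's real signatures. A caller hands
mock_dax_fault() a file offset and gets back an address inside the
fake DIMM, after detouring through a sector number.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9                      /* 512-byte sectors */
#define BLOCK_SHIFT  12                     /* 4 KiB fs blocks, assumed */
#define BLOCK_SIZE   ((uint64_t)1 << BLOCK_SHIFT)

/* Stand-in for a pmem-backed block device: just a slab of memory. */
struct mock_bdev {
    unsigned char *base;                    /* start of the fake DIMM */
    size_t size;                            /* bytes */
};

/* Stand-in for the filesystem: file block N lives at device block map[N]. */
struct mock_inode {
    const struct mock_bdev *bdev;
    const uint64_t *map;
    size_t nr_blocks;
};

/* Step 1: a memory-side caller asks the filesystem for a *sector*. */
int mock_get_block(const struct mock_inode *inode, uint64_t file_block,
                   uint64_t *sector)
{
    if (file_block >= inode->nr_blocks)
        return -1;
    *sector = inode->map[file_block] << (BLOCK_SHIFT - SECTOR_SHIFT);
    return 0;
}

/* Step 2: the block side turns that sector back into a memory address. */
int mock_bdev_direct_access(const struct mock_bdev *bdev, uint64_t sector,
                            void **kaddr)
{
    uint64_t off = sector << SECTOR_SHIFT;

    if (off + BLOCK_SIZE > bdev->size)
        return -1;
    *kaddr = bdev->base + off;
    return 0;
}

/* The fault path: memory in, memory out, block layer in the middle. */
void *mock_dax_fault(const struct mock_inode *inode, uint64_t file_offset)
{
    uint64_t sector;
    void *kaddr;

    if (mock_get_block(inode, file_offset >> BLOCK_SHIFT, &sector))
        return NULL;
    if (mock_bdev_direct_access(inode->bdev, sector, &kaddr))
        return NULL;
    return (unsigned char *)kaddr + (file_offset & (BLOCK_SIZE - 1));
}
```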

The legacy vfs/mm code didn't have this layering problem either. Even
filemap_fault(), which dax_fault() is modeled after, doesn't call any
bdev methods directly; when it needs something it asks the filesystem
with a ->readpage(). The precedent is that you ask the filesystem
for what you need. Look at the get_bdev() thing you've concluded you
need. It _almost_ makes my point. I just happen to be of the opinion
that you don't actually want or need the bdev, you want the pfn/kaddr
so you can flush or map or memcpy().
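
For what it's worth, here is roughly the shape I have in mind, again
as a userspace mock and not a proposed patch. The ->direct_access()
hook, struct fs_dax_ops, and both backends are hypothetical names I
invented for this sketch; the 4 KiB block size is assumed. The point
is only that the generic fault code asks the filesystem for an address
and never sees a sector or a bdev, so a filesystem on unmanaged RAM and
one on a block device can sit behind the same call.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SHIFT 12                      /* 4 KiB blocks, assumed */
#define BLOCK_SIZE  ((uint64_t)1 << BLOCK_SHIFT)

struct fs_inode;

/* Hypothetical filesystem hook: file offset in, mapped address out. */
struct fs_dax_ops {
    void *(*direct_access)(const struct fs_inode *inode, uint64_t off);
};

struct fs_inode {
    const struct fs_dax_ops *ops;
    void *fs_private;                       /* backing store, opaque to DAX */
};

/* Backend A: unmanaged physical RAM, laid out linearly, no bdev anywhere. */
struct ram_backing {
    unsigned char *base;
    size_t size;
};

void *ram_direct_access(const struct fs_inode *inode, uint64_t off)
{
    const struct ram_backing *ram = inode->fs_private;

    return off < ram->size ? ram->base + off : NULL;
}

/* Backend B: block-mapped filesystem; the block math stays in here. */
struct blk_backing {
    unsigned char *base;
    size_t size;
    const uint64_t *map;                    /* file block -> device block */
    size_t nr_blocks;
};

void *blk_direct_access(const struct fs_inode *inode, uint64_t off)
{
    const struct blk_backing *blk = inode->fs_private;
    uint64_t file_block = off >> BLOCK_SHIFT;
    uint64_t dev_off;

    if (file_block >= blk->nr_blocks)
        return NULL;
    dev_off = blk->map[file_block] << BLOCK_SHIFT;
    if (dev_off + BLOCK_SIZE > blk->size)
        return NULL;
    return blk->base + dev_off + (off & (BLOCK_SIZE - 1));
}

const struct fs_dax_ops ram_dax_ops = { .direct_access = ram_direct_access };
const struct fs_dax_ops blk_dax_ops = { .direct_access = blk_direct_access };

/* Generic fault path: it only ever asks the filesystem for an address. */
void *mock_dax_fault_nodev(const struct fs_inode *inode, uint64_t off)
{
    if (!inode->ops || !inode->ops->direct_access)
        return NULL;
    return inode->ops->direct_access(inode, off);
}
```

Whether the real hook should hand back a kaddr, a pfn, or both is an
open question; the mock returns a plain pointer only to keep it short.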