Re: [PATCH v2 2/7] dax: change bdev_dax_supported() to support boolean returns

From: Dan Williams
Date: Mon Jun 04 2018 - 19:40:33 EST


On Sun, Jun 3, 2018 at 6:48 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> On Sun, Jun 3, 2018 at 5:25 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> On Mon, Jun 04, 2018 at 08:20:38AM +1000, Dave Chinner wrote:
>>> On Thu, May 31, 2018 at 09:02:52PM -0700, Dan Williams wrote:
>>> > On Thu, May 31, 2018 at 7:24 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>> > > On Thu, May 31, 2018 at 06:57:33PM -0700, Dan Williams wrote:
>>> > >> > FWIW, XFS+DAX used to just work on this setup (I hadn't even
>>> > >> > installed ndctl until this morning!) but after changing the kernel
>>> > >> > it no longer works. That would make it a regression, yes?
>>>
>>> [....]
>>>
>>> > >> I suspect your kernel does not have CONFIG_ZONE_DEVICE enabled which
>>> > >> has the following dependencies:
>>> > >>
>>> > >> depends on MEMORY_HOTPLUG
>>> > >> depends on MEMORY_HOTREMOVE
>>> > >> depends on SPARSEMEM_VMEMMAP
>>> > >
>>> > > Filesystem DAX now has a dependency on memory hotplug?
>>>
>>> [....]
>>>
>>> > > OK, works now I've found the magic config incantations to turn
>>> > > everything I now need on.
>>>
>>> By enabling these options, my test VM now has a ~30s pause in the
>>> boot very soon after the nvdimm subsystem is initialised.
>>>
>>> [ 1.523718] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>>> [ 1.550353] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
>>> [ 1.552175] Non-volatile memory driver v1.3
>>> [ 2.332045] tsc: Refined TSC clocksource calibration: 2199.909 MHz
>>> [ 2.333280] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1fb5dcd4620, max_idle_ns: 440795264143 ns
>>> [ 37.217453] brd: module loaded
>>> [ 37.225423] loop: module loaded
>>> [ 37.228441] virtio_blk virtio2: [vda] 10485760 512-byte logical blocks (5.37 GB/5.00 GiB)
>>> [ 37.245418] virtio_blk virtio3: [vdb] 146800640 512-byte logical blocks (75.2 GB/70.0 GiB)
>>> [ 37.255794] virtio_blk virtio4: [vdc] 1073741824000 512-byte logical blocks (550 TB/500 TiB)
>>> [ 37.265403] nd_pmem namespace1.0: unable to guarantee persistence of writes
>>> [ 37.265618] nd_pmem namespace0.0: unable to guarantee persistence of writes
>>>
>>> The system does not appear to be consuming CPU, but it is blocking
>>> NMIs so I can't get a CPU trace. For a VM that I rely on booting in
>>> a few seconds because I reboot it tens of times a day, this is a
>>> problem....
>>
>> And when I turn on KASAN, the kernel fails to boot to a login prompt
>> because:
>
> What's your qemu and kernel command line? I'll take a look at this first
> thing tomorrow.

I was able to reproduce this crash by just turning on KASAN...
investigating. It would still help to have your config for our own
regression testing purposes; it makes sense for us to prioritize
"Dave's test config", similar to the priority of not breaking Linus'
laptop.