Re: [RFC PATCH] dax: add badblocks check to Device DAX

From: Dave Jiang
Date: Wed May 03 2017 - 17:56:35 EST


On 05/03/2017 02:48 PM, Dan Williams wrote:
> On Wed, May 3, 2017 at 11:46 AM, Kani, Toshimitsu <toshi.kani@xxxxxxx> wrote:
>> On Wed, 2017-05-03 at 09:30 -0700, Dan Williams wrote:
>>> On Wed, May 3, 2017 at 9:09 AM, Kani, Toshimitsu <toshi.kani@xxxxxxx>
>>> wrote:
>>>> On Wed, 2017-05-03 at 08:52 -0700, Dan Williams wrote:
>>>>> On Wed, May 3, 2017 at 8:31 AM, Toshi Kani <toshi.kani@xxxxxxx>
>>>>> wrote:
>>>>>> This is a RFC patch for seeking suggestions. It adds support
>>>>>> of badblocks check in Device DAX by using region-level
>>>>>> badblocks list. This patch is only briefly tested.
>>>>>>
>>>>>> device_dax is a well-isolated self-contained module as it calls
>>>>>> alloc_dax() with dev_dax, which is private to device_dax. For
>>>>>> checking badblocks, it needs to call dax_pmem to check with
>>>>>> region-level badblocks.
>>>>>>
>>>>>> This patch attempts to keep device_dax self-contained. It adds
>>>>>> check_error() to dax_operations, and dax_check_error() as a
>>>>>> stub with *dev_dax and *dev pointers to convey it to
>>>>>> dax_pmem. I am wondering if this is the right direction, or we
>>>>>> should change the modularity to let dax_pmem call alloc_dax()
>>>>>> with its dax_pmem (or I completely missed something).
>>>>>
>>>>> The problem is that device-dax guarantees a given fault
>>>>> granularity. To make that guarantee we can't fallback from 1G or
>>>>> 2M mappings due to an error. We also can't reasonably go the
>>>>> other way and fail mappings that contain a badblock because that
>>>>> would change the blast radius of a media error to the fault size.
>>>>
>>>> Does it mean we expect users to have CPUs with MCE recovery for
>>>> Device DAX? Can we add an attributes like allow error-check &
>>>> fall-back?
>>>
>>> Yes, without MCE recovery device-dax mappings that consume errors
>>> will reboot. If an application needs the kernel protection it should
>>> be using filesystem-dax.
>>
>> Understood. Are we going to provide sysfs "badblocks" for Device DAX
>> as it is also needed for ndctl clear-error?
>
> No, I had started that way, but badblocks really needs write(2) or
> fallocate(PUNCH_HOLE) support for clearing errors. Since we don't want
> to support write(2) and were NAKd from supporting fallocate() the only
> interface that was left was sending clear-error-DSM ioctls directly to
> the nvdimm bus. Since that is a very libnvdimm specific interface it
> made sense to then add badblocks at the libnvdimm-region level. The
> "ndctl clear-error" command is there to do the translation of error
> offsets in user space and supersedes the need for the kernel to carry
> a badblocks file for device-dax.
>

Toshi, I'm also working on ndctl list-errors in relations to dev dax so
that you get a list of badblocks that are fixed up for dev dax.