Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices

From: Mike Snitzer
Date: Tue Jun 14 2016 - 21:47:24 EST


On Tue, Jun 14 2016 at 4:19pm -0400,
Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:

> Mike Snitzer <snitzer@xxxxxxxxxx> writes:
>
> > On Tue, Jun 14 2016 at 9:50am -0400,
> > Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> >
> >> "Kani, Toshimitsu" <toshi.kani@xxxxxxx> writes:
> >>
> >> >> I had dm-linear and md-raid0 support on my list of things to look at,
> >> >> did you have raid0 in your plans?
> >> >
> >> > Yes, I hope to extend further and raid0 is a good candidate.
> >>
> >> dm-flakey would allow more xfstests test cases to run. I'd say that's
> >> more important than linear or raid0. ;-)
> >
> > Regardless of which target(s) grow DAX support, the most pressing initial
> > concern is getting the DM device stacking correct, and verifying that
> > IO that crosses pmem device boundaries is being properly split by DM
> > core (via drivers/md/dm.c:__split_and_process_non_flush()'s call to
> > max_io_len).
>
> That was a tongue-in-cheek comment. You're reading way too much into
> it.
>
> >> Also, the next step in this work is to then decide how to determine on
> >> what numa node an LBA resides. We had discussed this at a prior
> >> plumbers conference, and I think the consensus was to use xattrs.
> >> Toshi, do you also plan to do that work?
> >
> > How does the associated NUMA node relate to this? Does the
> > DM request_queue need to be set up to only allocate from the NUMA node
> > the pmem device is attached to? I recently added support for this to
> > DM. But there will likely be some code needed to propagate the NUMA node
> > id accordingly.
>
> I assume you mean allocate memory (the volatile kind). That should work
> the same between pmem and regular block devices, no?

This is the commit I made to train DM to be NUMA node aware:
115485e83f497fdf9b4 ("dm: add 'dm_numa_node' module parameter")

As is, the DM code is focused on memory allocations. But I think blk-mq
may use the NUMA node via tag_set->numa_node. But that is moot
given pmem is bio-based, right?

Steps could be taken to make all threads DM creates for a given device
get pinned to the specified NUMA node too.
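
On the splitting concern quoted earlier in the thread, the behaviour to
verify can be modelled in userspace roughly as below. This is only a
sketch under stated assumptions: it is not the dm.c code, it models only
the "do not cross a target boundary" part of what max_io_len does, and
every name in it (toy_target, toy_find_target, toy_max_io_len,
toy_split_io) is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical stand-in for one table entry: a range of the DM device. */
struct toy_target {
	sector_t begin;	/* first sector covered by this target */
	sector_t len;	/* number of sectors covered */
};

/* Return the target containing 'sector', or NULL if none does. */
static const struct toy_target *
toy_find_target(const struct toy_target *t, size_t n, sector_t sector)
{
	for (size_t i = 0; i < n; i++)
		if (sector >= t[i].begin && sector - t[i].begin < t[i].len)
			return &t[i];
	return NULL;
}

/* Sectors left in 'ti' starting at 'sector' (boundary limit only). */
static sector_t toy_max_io_len(const struct toy_target *ti, sector_t sector)
{
	return ti->len - (sector - ti->begin);
}

/*
 * Split the IO [sector, sector + nr) so that no piece crosses a target
 * boundary. Piece lengths are stored in piece_len[]. Returns the number
 * of pieces, or -1 if the IO runs past the table or needs more than
 * max_pieces pieces.
 */
static int toy_split_io(const struct toy_target *t, size_t n,
			sector_t sector, sector_t nr,
			sector_t *piece_len, int max_pieces)
{
	int count = 0;

	while (nr > 0) {
		const struct toy_target *ti = toy_find_target(t, n, sector);
		sector_t len;

		if (!ti || count == max_pieces)
			return -1;

		len = toy_max_io_len(ti, sector);
		if (len > nr)
			len = nr;

		piece_len[count++] = len;
		sector += len;
		nr -= len;
	}
	return count;
}
```

With two back-to-back targets covering sectors 0-99 and 100-149, a
30-sector IO starting at sector 90 should come back as two pieces of 10
and 20 sectors. That is the kind of result a DAX stacking test would
want to see from the real code path.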