Re: [PATCH v6 18/21] nd_btt: atomic sector updates

From: Dan Williams
Date: Thu Jun 11 2015 - 16:22:55 EST


On Thu, Jun 11, 2015 at 1:11 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> From: Vishal Verma <vishal.l.verma@xxxxxxxxxxxxxxx>
>
> BTT stands for Block Translation Table, and is a way to provide power
> fail sector atomicity semantics for block devices that have the ability
> to perform byte granularity IO. It relies on the capability of libnvdimm
> namespace devices to do byte aligned IO.
>
> The BTT works as a stacked block device, and reserves a chunk of space
> from the backing device for its accounting metadata. It is a bio-based
> driver because all IO is done synchronously, and there is no queuing or
> asynchronous completions at either the device or the driver level.
>
> The BTT uses 'lanes' to index into various 'on-disk' data structures,
> and lanes also act as a synchronization mechanism in case there are more
> CPUs than available lanes. We compared two lane lock strategies. In
> the first, we kept an atomic counter that tracked the last lane used,
> and 'our' lane was determined by atomically incrementing it. That way,
> for the nr_cpus > nr_lanes case, theoretically, no CPU would be blocked
> waiting for a lane. The other strategy was to take the number of the
> CPU we're scheduled on and hash it to a lane number. Theoretically,
> this could block an IO that could've otherwise run using a different,
> free lane. But some fio workloads showed that the direct cpu -> lane
> hash performed faster than tracking 'last lane' - my reasoning is that
> the cache thrash caused by moving the atomic variable made that
> approach slower than simply waiting out the in-progress IO. This
> supports the conclusion that the driver can be a very simple bio-based
> one that does synchronous IOs instead of queuing.
>
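
For readers following along, the two lane selection strategies compared
above can be sketched roughly as below. This is a userspace model, not
code from the patch: the names last_lane, lane_from_counter() and
lane_from_cpu() are hypothetical, and a plain modulo stands in for the
"hash" as an assumption. The lock that serializes users of the same lane
is left out.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Illustrative model only; none of these names come from the patch.
 */

/* Strategy 1: one shared counter remembers the last lane handed out. */
static atomic_uint last_lane;

static unsigned int lane_from_counter(unsigned int nr_lanes)
{
	assert(nr_lanes > 0);
	/*
	 * Every caller, on every CPU, writes the same variable, so its
	 * cache line keeps moving between CPUs.
	 */
	return atomic_fetch_add(&last_lane, 1) % nr_lanes;
}

/* Strategy 2: map the current CPU number straight to a lane. */
static unsigned int lane_from_cpu(unsigned int cpu, unsigned int nr_lanes)
{
	assert(nr_lanes > 0);
	/*
	 * No shared write here. Two CPUs that map to the same lane must
	 * take turns, even if another lane happens to be free.
	 * Modulo is an assumed stand-in for the hash.
	 */
	return cpu % nr_lanes;
}
```

With 4 lanes, lane_from_cpu() sends CPUs 1 and 5 to the same lane, which
is the "could block an IO" case described above, while
lane_from_counter() avoids that at the cost of an atomic
read-modify-write on shared state for every IO.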

Copy/paste error: the following 2 paragraphs are from the previous
version of this patch and will be deleted when pushing upstream.

> BTT stands for Block Translation Table, and is a way to provide power
> fail sector atomicity semantics for block devices that have the ability
> to perform byte granularity IO. It relies on the ->rw_bytes() capability
> of libnd namespace devices.
>
> The BTT works as a stacked block device, and reserves a chunk of space
> from the backing device for its accounting metadata.