Re: [RFC 2/2] KVM: add virtio-pmem driver
From: Dan Williams
Date: Mon Oct 16 2017 - 11:58:45 EST
On Mon, Oct 16, 2017 at 7:47 AM, Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
> On Fri, Oct 13, 2017 at 06:48:15AM -0400, Pankaj Gupta wrote:
>> > On Thu, Oct 12, 2017 at 09:20:26PM +0530, Pankaj Gupta wrote:
>> > > +static blk_qc_t virtio_pmem_make_request(struct request_queue *q,
>> > > + struct bio *bio)
>> > > +{
>> > > + blk_status_t rc = 0;
>> > > + struct bio_vec bvec;
>> > > + struct bvec_iter iter;
>> > > + struct virtio_pmem *pmem = q->queuedata;
>> > > +
>> > > + if (bio->bi_opf & REQ_FLUSH)
>> > > + //todo host flush command
>> >
>> > This detail is critical to the device design. What is the plan?
>>
>> Yes, this is a good point.
>>
>> I was thinking of the guest sending a flush command to QEMU, which
>> would do an fsync on the file fd.
>
> Previously there was discussion about fsyncing a specific file range
> instead of the whole file. This could perform better in cases where
> only a subset of dirty pages need to be flushed.
>
> One possibility is to design the virtio interface to communicate ranges
> but the emulation code simply fsyncs the fd for the time being. Later
> on, if the necessary kernel and userspace interfaces are added, we can
> make use of the interface.
Range-based flushing is not a natural storage cache management
mechanism. Typically, all that is available is a full write-cache
flush, and upper layers would need to be customized for range-based
flushing.
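
To make the trade-off concrete, here is a rough userspace sketch of the
"communicate ranges, but fsync the whole fd for now" idea from the quoted
text. Everything in it is an assumption for illustration only: the
struct pmem_flush_req layout and the pmem_handle_flush() helper are
hypothetical names and are not part of the posted patch or of QEMU.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical request layout, NOT taken from the patch series. */
struct pmem_flush_req {
	uint64_t offset;	/* start of the dirty range, in bytes */
	uint64_t len;		/* length of the dirty range, in bytes */
};

/*
 * Hypothetical host-side handler. The range is carried in the request
 * but deliberately ignored: the whole backing file is flushed with
 * fsync(), which is the only mechanism assumed to be available.
 * Returns 0 on success or a negative errno value on failure.
 */
int pmem_handle_flush(int backing_fd, const struct pmem_flush_req *req)
{
	(void)req;	/* unused until a range-based host interface exists */

	if (fsync(backing_fd) < 0)
		return -errno;
	return 0;
}
```

With this shape the guest-visible format would not have to change if a
range-based flush ever became available on the host, but the guest
still pays for a full flush on every request until then.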
>> If we do an async flush and move the task to a wait queue until we
>> receive the flush-complete reply from the host, we can allow other
>> tasks to execute on the current CPU.
>>
>> Any suggestions you have or anything I am not foreseeing here?
>
> My main thought about this patch series is whether pmem should be a
> virtio-blk feature bit instead of a whole new device. There is quite a
> bit of overlap between the two.
I'd be open to that... there are already provisions in the pmem driver
for platforms where CPU caches are flushed on power loss, so a virtio
mode for this shared-memory case seems reasonable.